<p>Robert Dougherty-Bliss, <em>A place, perchance, to dream</em> (https://rwdb.xyz/feed.xml)</p>

<p><strong>Admission to Candidacy</strong> (2021-05-18, https://rwdb.xyz/admission-to-candidacy)</p>

<p>On Monday, 2021 May 17, I passed my oral qualifying exam at Rutgers University.
This means that I have <em>advanced to candidacy</em>. In other words, I have cleared
the necessary hurdles to be a “real” PhD student, and am “ready” to be a
researcher.</p>
<p>Of course, this implicitly means that I was <em>incapable</em> of doing research
before passing my exams. <a href="/publications/gcf">The paper I published with my
advisor</a> didn’t count. <a href="/publications/dyck">The paper I co-authored with my
post-qual friend</a> was phoney. <a href="/publications/beukers">The collaboration my advisor
and I had with a computer algebra wizard </a> was
fraudulent. Even the <a href="/publications/az-recurrences">expository article</a> I wrote
can only be described as <em>pre-candidacy material</em><sup id="fnref:quality" role="doc-noteref"><a href="#fn:quality" class="footnote" rel="footnote">1</a></sup>.</p>
<p>Thankfully, now that I am a “PhD candidate” as opposed to a mere “graduate
student,” the tides will turn. Indeed, with some luck, both the <em>quality</em> and
<em>quantity</em> of my work will improve because I am not <em>wasting my time on
formalities.</em></p>
<p>The purpose of a PhD program is to produce competent researchers. The <em>Big Lie</em>
about qualifying exams is that they support this purpose. In truth, qualifying
exams are mostly <em>redundant</em>. They measure things that the department already
knows. They exist <em>only</em> to place students into high-risk, low-reward
situations for long periods of time, which detracts from their opportunities to
do research.</p>
<p>I propose a modification to the qualifying exam structure, but let’s first see
how they work at Rutgers now.</p>
<p><strong>Written Qualifying Exam.</strong> This is a sequence of three 2.5 hour-long exams on
the three courses you take in your first semester. In each, you are asked five
random questions, and have to answer three correctly (on average) to pass. You
have three attempts, which will take you a full year to exhaust. The written
qualifying exam supposedly measures your technical ability in “basic”
mathematical subjects.</p>
<p>“But wait,” I hear you ask. “The exams are on the courses you took in your
first semester? Didn’t you get grades in those courses? Don’t those already
measure your ability?”</p>
<p>I am sorry to tell you that the current program mostly disregards your
coursework. “Your grades don’t matter” is a common refrain among graduate
students because it is true. Slave over every homework assignment? Go to every
office hour? Get perfect marks on every midterm? Congratulations! You have
achieved nothing. <em>One</em> exam, completed under threat of expulsion, is the
department’s <em>ultimate measure</em> of technical mastery.</p>
<p>Of course, your grade <em>is</em> a measure of your ability. You spent an entire
semester completing homework assignments, taking exams, and interacting with
your professors. The department would <em>already know</em> who puts in work and who
is “technically competent” if it cared to look.</p>
<p>I see two possibilities:</p>
<ol>
<li>
<p>The department is out-of-touch with student assessment or does not trust
professors to evaluate students.</p>
</li>
<li>
<p>The department wants to haze you.</p>
</li>
</ol>
<p>Which do you think it is?</p>
<p><strong>Oral Qualifying Exam.</strong> This is an 80-120 minute long meeting with a
committee of four faculty members. You will be questioned on a syllabus that
you wrote in consultation with your committee. In theory you have two attempts,
but it is exceedingly rare to fail. Supposedly this exam ensures that you have
not “overspecialized” before beginning research.</p>
<p>Unlike the written qualifying exam, there is no “standard” oral exam. Your
experience is highly dependent on your area and advisor. This gives it some
merit: Your syllabus is likely relevant to your proposed research, and the exam
is graded <em>subjectively</em> by your committee members. However, this subjectivity
makes the exam, again, redundant.</p>
<p>Suppose that your committee is confident that you will pass. Perhaps you’ve
been meeting with them regularly to discuss your syllabus. What, in this case,
is the purpose of the exam? Don’t they already know your knowledge? Suppose
instead that your committee is confident that you will <em>not</em> pass. Then why on
Earth are you taking the exam? In both cases, the exam merely makes you
demonstrate what the committee <em>already knows</em> in a high-pressure situation.</p>
<p>The only situation where the exam has <em>true</em> merit is when the committee does
not know if you are ready. I posit that this situation is <em>nearly nonexistent</em>.
If it were a common occurrence, then there would be a significant number of
failed oral qualifying exams. There are not.</p>
<p>The benefit of the oral exam process is having four faculty members sign off on
your skills. This has nothing to do with the <em>exam</em>, and everything to do with
how well the committee knows you.</p>
<p><strong>The Problem.</strong> Qualifying exams <em>take time that you could be doing research</em>.
If you demonstrate mastery of your first-semester courses, then you should
spend your second semester and the subsequent Summer meeting with potential
advisors. Doing anything else is a <em>waste of time</em>, as far as producing
researchers goes.</p>
<p><strong>The Solution.</strong></p>
<p>Here is a proposal:</p>
<ul>
<li>Offer a written qualifying exam to students as soon as they arrive.
<ul>
<li>Pass? Skip relevant first semester required courses.</li>
<li>Fail? Take relevant first semester required courses.</li>
</ul>
</li>
<li>First semester required courses.
<ul>
<li>A? Continue in the program.</li>
<li>Less than an A? You must pass a written qualifying exam by the end of
your first Summer.</li>
</ul>
</li>
<li>Oral exams are replaced with a requirement that four faculty members agree
that you know your syllabus sufficiently well. It is up to them to decide if
a formal exam is necessary.</li>
</ul>
<p>Nothing I’ve said seems specific to Rutgers. I assume that this silly exam
process happens at most mathematics departments. Perhaps the situation is
similar in other STEM areas, and perhaps even in the humanities. All the more
reason to evaluate the evaluators.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:quality" role="doc-endnote">
<p>I don’t think that my outlined contributions are particularly
<em>good</em>. I didn’t prove $P = NP$ or the Riemann hypothesis. But I
did do <em>research</em> in some small form. <a href="#fnref:quality" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>

<p><strong>The NSA polygraph interrogation is hypocritical and abusive</strong> (2021-05-06, https://rwdb.xyz/the-nsa-polygraph-interrogation-is-hypocritical-and-abusive)</p>

<blockquote>
<p>Resolve to be honest at all events; and if, in your own judgement, you cannot
be an honest lawyer, resolve to be honest without being a lawyer. Choose some
other occupation, rather than one in the choosing of which you do, in
advance, consent to be a knave.</p>
<p>— <em>Abraham Lincoln</em>, “Notes on the Practice of Law”</p>
</blockquote>
<p>I recently had the “good fortune” of being selected to participate in the
National Security Agency’s <em>Graduate Mathematics Program</em>, a supposedly
prestigious program to attract young mathematicians. I have, in disgust,
withdrawn my application.</p>
<p>I am embarrassed that I waited so long to make this decision. To repent for my
cowardice, I have resolved to never apply to any position in the intelligence
community, and hereby encourage students and colleagues to do the same.</p>
<p>The mathematical community at large has grappled with its relationship with the
NSA. See:</p>
<ul>
<li>
<p><a href="https://www.ams.org/notices/201406/rnoti-p623.pdf">“Mathematicians Discuss the Snowden
Revelations”</a>;</p>
</li>
<li>
<p>Tom Leinster’s <a href="https://golem.ph.utexas.edu/category/2015/01/the_ams_must_justify_its_suppo.html">“The AMS Must Justify Its Support of the NSA”</a>; and</p>
</li>
<li>
<p>the <em>Just Mathematics Collective</em>’s recent <a href="https://www.justmathematicscollective.net/nsa_statement.html">“Mathematics Beyond Secrecy and
Surveillance”</a>.</p>
</li>
</ul>
<p>However, my gripe is different. I am not anti-NSA. In fact, I felt quite
patriotic about devoting time to the agency. I acknowledge the ethical concerns
surrounding the NSA, but also its strategic necessity. I would have proudly
worked there.</p>
<p>My gripe is that the NSA <em>lies</em> to their employees and applicants, then
requires them to practice <em>doublethink</em> sufficiently well to ignore the lies.</p>
<p>During your security clearance process, should you be so “lucky” to reach this
stage, you will be subjected to a polygraph exam. The so-called “examiner” will
explain that they are there to “help you,” that you are both on the “same
team,” and that everything will go well if you tell them what’s on your mind.</p>
<p>These are lies. The polygraph exam is not an exam, it is an <em>interrogation</em>.
Not only is the examiner not on your side, they almost surely assume that you
are on the <em>other side</em>. They will use standard interrogation techniques to
trick you into confessing your <em>wicked ways</em>.</p>
<p>For example, during one of my polygraph exams, the following exchange occurred:</p>
<blockquote>
<p><strong>Interrogator:</strong> I would like to verify that you are an honest person by asking you
some more questions.</p>
<p><strong>Robert:</strong> Sounds good.</p>
<p><strong>I:</strong> Trustworthiness is a scale from 0 to 10. Are you a 0 or a 10?</p>
<p><strong>R:</strong> I suppose I’m pretty close to a 10.</p>
<p><strong>I:</strong> What does that mean? Are you a 9? So one out of ten times we tell you a
secret, you’ll sell it to the Russians?</p>
<p><strong>R:</strong> OK, when you put it that way, I’m a 10.</p>
<p><strong>I:</strong> Excellent. So, during the exam, I will ask you the following: <em>Have you
ever lied to cover up a mistake?</em></p>
<p><strong>R:</strong> Surely I have at some point.</p>
</blockquote>
<p>At this moment, my interrogator slammed a notepad onto the table and held the
point of his pen over it.</p>
<blockquote>
<p><strong>I:</strong> Someone who lied to cover up a mistake? That kind of person wouldn’t
work here. I’ll ask you again, and if I have to write anything down on this
paper, we’re going to have problems. <em>Have you ever lied to cover up a
mistake?</em></p>
<p><strong>R:</strong> No, I have not.</p>
</blockquote>
<p>The polygraph interrogator (Mike, or Mark, or Adam, or whatever) asked me a
question to which <em>he already knew the answer.</em> Everyone has lied to cover up a
mistake. When I answered truthfully, he responded with anger, suggesting
strongly that I should answer negatively. In other words, the interrogator
<em>wanted me to lie</em>.</p>
<p>Given that you are not a <em>stupid person</em>, you will immediately recognize this
tactic. Do you respond truthfully or lie as instructed? And once you decide,
how do you ignore that you are now playing a game? That the interrogator is
manipulating you? How do you trust <em>anything</em> from their mouth, or anything
that anyone tells you at any point in the security clearance process?</p>
<p>This is but a single moment from my polygraph experience. I sat for a total of
five exams, which totals around twenty hours of interrogation. Each of my five
interrogators proved that every hour, every minute, and every second is an
opportunity for trickery and intimidation. You will be uncomfortable, you will
be tired, and you will probably have strong emotions at some point during the
process. The interrogator does not care. They know that you feel the pressure,
and wouldn’t you feel better if you just told them what was on your mind?</p>
<p>The polygraph is a <em>charade</em>. It is based on pseudo-science, bluffing, and
deception. It does not detect lies; it elicits confessions. It seems that the
only people that would pass are those <em>too stupid</em> to recognize that they are
being toyed with, or those skilled enough to play the game convincingly. I
suspect that it keeps more smart, talented folk <em>out</em> than good, honest folk
<em>in</em>.</p>
<p>It is <em>unconscionable</em> to routinely place anyone in such a catch-22. Therefore
I have adopted a policy that I encourage everyone to follow, no matter your
opinion on the intelligence community: <strong>Refuse polygraph interrogations.</strong></p>

<p><strong>Birthday Dominoes</strong> (2021-03-06, https://rwdb.xyz/birthday-dominoes)</p>

<p>(<em>This post is dedicated to the most important Pisces birthday I know: EK.</em>)</p>
<p><a href="https://en.wikipedia.org/wiki/Dominoes"><em>Dominoes</em></a> is a well-known game that
no one actually knows how to play. A much more accessible game is <em>tiling
dominoes</em>: I give you a grid, and you tell me if you can cover the whole thing
with dominoes.</p>
<p>For example, look at this 2x2 grid:</p>
<p><img src="/images/2x2-dom.jpg" alt="2x2" /></p>
<p>Easy! What about this 4x4 grid?</p>
<p><img src="/images/4x4-dom.jpg" alt="4x4" /></p>
<p>Also easy! What about a 3x3 grid?</p>
<p><img src="/images/3x3-dom.jpg" alt="3x3" /></p>
<p>It can’t be done! Any grid you can cover with dominoes has to have an even
number of squares, but a 3x3 grid has 9. We lose, through no fault of our own.</p>
<p>This pretty much solves grids. You can tile them with dominoes if and only if
they have an even number of squares. But this gives us two natural questions to
ask:</p>
<ol>
<li>
<p>What if we used something other than dominoes?</p>
</li>
<li>
<p>What if we used something other than grids?</p>
</li>
</ol>
<h2 id="something-other-than-dominoes">Something other than dominoes</h2>
<p>Dominoes are pieces with two blocks “snapped” together. What if we used more
than two blocks? These exist, of course, and we call them <em>triominoes</em>,
<em>quadominoes</em>, <em>pentominoes</em>, and in general, <em>polyominoes</em>.</p>
<p>There is only “one” domino—two blocks stuck at their ends—but there are
<em>three</em> triominoes!</p>
<p><img src="/images/triominoes.jpg" alt="triominoes" /></p>
<p>That pesky 3x3 grid that we couldn’t tile with dominoes is a cinch with
triominoes:</p>
<p><img src="/images/3x3-filled.jpg" alt="3x3-filled" /></p>
<p>The most famous polyomino is the
<a href="https://en.wikipedia.org/wiki/Pentomino"><em>pentomino</em></a>, which has five blocks
stuck together. These are commonly used for fun in brain teasers, if you’re
into that sort of thing.</p>
<h2 id="something-other-than-grids">Something other than grids</h2>
<p>The shape of the board is just as important as the number of spaces. For
example, look at this “T” with four spaces:</p>
<p><img src="/images/t.jpg" alt="t" /></p>
<p>Even though there are an even number of spaces, we can’t tile this with
dominoes! Here’s a grid with 9 spaces that we cannot tile with triominoes:</p>
<p><img src="/images/weird-9.jpg" alt="weird 9" /></p>
<p>In general, a board with $n$ spaces is only guaranteed to have a tiling of
monominoes (pieces with 1 block) and a tiling by a single $n$-omino (a piece
with $n$ blocks), both of which are kind of cheating. Board layout plays a big
role.</p>
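<p>A computer can settle any small board by brute force. Here is a minimal
backtracking sketch (Python, not the library used later in this post); the key
observation is that the lexicographically smallest uncovered cell can only be
paired with its right or upper neighbor:</p>

```python
def can_tile_with_dominoes(cells):
    """Decide whether a finite set of unit squares can be exactly covered by dominoes."""
    cells = set(cells)
    if len(cells) % 2:
        return False  # odd number of squares: hopeless
    if not cells:
        return True
    # The lexicographically smallest uncovered cell has no free neighbor below
    # or to its left, so any tiling must pair it with its right or upper neighbor.
    x, y = min(cells)
    for partner in ((x + 1, y), (x, y + 1)):
        if partner in cells and can_tile_with_dominoes(cells - {(x, y), partner}):
            return True
    return False

grid = lambda m, n: {(x, y) for x in range(m) for y in range(n)}
print(can_tile_with_dominoes(grid(2, 2)))                        # → True
print(can_tile_with_dominoes(grid(3, 3)))                        # → False
print(can_tile_with_dominoes({(0, 0), (1, 0), (2, 0), (1, 1)}))  # the "T": → False
```

<p>This is exponential in the worst case, but the forced choice at the smallest
cell keeps it fast on small boards.</p>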
<h2 id="the-heart-board">The heart board</h2>
<p>I propose that we think about tiling <em>the heart board</em>, something I invented
for just this occasion. Here are the first four heart boards:</p>
<p><img src="/images/filled.png" alt="the heart boards" /></p>
<p>It’s a bit hard to make out the “grid” in these pictures, so here are the first
two drawn by hand (kinda):</p>
<p><img src="/images/hand-hearts.jpg" alt="first two hearts" /></p>
<p>The number of blocks in the first four hearts is 10, 43, 96, and 169,
respectively. The hearts follow a pattern that generalizes to arbitrary sizes.
The $n$th heart board is the set of all $(x, y)$ such that</p>
<ul>
<li>$0 \leq x < 2n$</li>
<li>$0 \leq y < 4n$</li>
<li>$x \leq y$</li>
<li>$y \leq 5n - x$</li>
<li>$y \leq x + 3n$,</li>
</ul>
<p>and also the reflection of these points about the $y$-axis.</p>
<p>The $n$th heart board has exactly</p>
\[10 n^2 + 3n - 3\]
<p>spaces in it.</p>
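<p>The defining inequalities translate directly into code. Here is a small
sketch (Python) that builds the boards and checks the count formula against the
sizes listed above:</p>

```python
def heart_board(n):
    """Integer points of the nth heart board: the five inequalities above,
    together with the reflection about the y-axis."""
    half = {(x, y)
            for x in range(2 * n) for y in range(4 * n)
            if x <= y <= min(5 * n - x, x + 3 * n)}
    return half | {(-x, y) for (x, y) in half}

print([len(heart_board(n)) for n in range(1, 5)])  # → [10, 43, 96, 169]
assert all(len(heart_board(n)) == 10 * n**2 + 3 * n - 3 for n in range(1, 5))
```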
<p>There are lots of questions to ask about this board, but let’s settle for just
one: When can the heart board be tiled by dominoes?</p>
<p>The first heart board cannot be tiled with dominoes. That’s easy enough to see
by hand because it only has 10 blocks:</p>
<p><img src="/images/hand-1-try.jpg" alt="trying to color the first heart" /></p>
<p>The second heart board is much bigger at 43 blocks, but this is an odd size so
no tiling by dominoes could exist. In fact, the general size $10 n^2 + 3n - 3$
is odd when $n$ is even, so only the “odd” heart boards could hope to be tiled
by dominoes anyway.</p>
<p>So what about the third heart block, the one in the bottom left of the
computer-generated image above? Can it be tiled using dominoes? It turns out
that <em>no</em>, it cannot be. How do I know this? My computer told me. In fact,
using the <a href="https://github.com/jwg4/polyomino">polyomino library</a>, my computer
told me more:</p>
<table>
<thead>
<tr>
<th style="text-align: right">Heart board number</th>
<th>Domino tiling?</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: right">1</td>
<td>no</td>
</tr>
<tr>
<td style="text-align: right">3</td>
<td>no</td>
</tr>
<tr>
<td style="text-align: right">5</td>
<td>no</td>
</tr>
<tr>
<td style="text-align: right">7</td>
<td>no</td>
</tr>
<tr>
<td style="text-align: right">9</td>
<td>no</td>
</tr>
<tr>
<td style="text-align: right">11</td>
<td>no</td>
</tr>
<tr>
<td style="text-align: right">13</td>
<td>no</td>
</tr>
<tr>
<td style="text-align: right">15</td>
<td>no</td>
</tr>
<tr>
<td style="text-align: right">17</td>
<td>no</td>
</tr>
<tr>
<td style="text-align: right">19</td>
<td>no</td>
</tr>
</tbody>
</table>
<p>It seems like no heart boards can be tiled by dominoes. Is this true?
Amazingly, yes! Not a single heart board can be tiled by dominoes.</p>
<p>The proof of this fact relies on a very simple observation: If you color the
squares of an odd heart board in an alternating fashion, then there are
exactly two more squares of one color than the other. For example, look at the
first heart board:</p>
<p><img src="/images/hand-color.jpg" alt="filled first heart" /></p>
<p>Every domino you put down must cover exactly one of each color. In the above
picture, every domino covers one red and one blue tile. Once you place four
dominoes, there are no blue tiles left, but two red tiles. We can’t cover those
pieces with dominoes! (This is what happened in our attempted tiling above.)</p>
<p>This pattern persists for every odd heart board. Here is a plot of the first
four hearts again, now colored in this alternating way:</p>
<p><img src="/images/filled-alt.png" alt="filled, alternating colored hearts" /></p>
<p>The first and third heart boards above (left column) have exactly two more red
squares than blue squares. The second and fourth (right column) actually have
a <em>bigger</em> difference in the number of squares, but we already knew that they
couldn’t be tiled with dominoes.</p>
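<p>The color counts are easy to reproduce by machine. Here is a self-contained
sketch (Python) that rebuilds the boards from the defining inequalities and
computes the checkerboard imbalance:</p>

```python
def heart_board(n):
    # The nth heart board from its defining inequalities (plus the y-axis mirror).
    half = {(x, y)
            for x in range(2 * n) for y in range(4 * n)
            if x <= y <= min(5 * n - x, x + 3 * n)}
    return half | {(-x, y) for (x, y) in half}

def color_imbalance(n):
    """Difference between the two checkerboard color counts on the nth heart."""
    counts = [0, 0]
    for x, y in heart_board(n):
        counts[(x + y) % 2] += 1
    return abs(counts[0] - counts[1])

print([color_imbalance(n) for n in range(1, 4)])  # → [2, 5, 2]
```

<p>The odd boards show an imbalance of exactly 2, which is all the proof needs.</p>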
<p>This pattern is somewhat tricky to prove, but once you know the idea it’s just
calculations. See <a href="https://math.stackexchange.com/questions/4051519/can-you-tile-a-heart-with-dominoes">my math.SE
question</a>
for details.</p>
<p>We’ve left lots of questions on the table that would be easy to answer. For
example, what’s the proportion of squares that are red versus blue in the even
heart boards? A harder project: Let $L_n$ be a set of lines which intersect the
square $[-n, n] \times [0, n]$ and each other. When is the region “inside”
$L_n$ tileable with dominoes?</p>
<p>I don’t know the answer to any of these offhand, but they sound fun. I hope
that this delivery on my promise of math art taught everyone something new.
Here’s to much more in the future!</p>
<p><img src="/images/filled-alt-big.png" alt="big hearts" /></p>

<p><strong>The Central Limit Theorem from 10,000 feet</strong> (2020-12-10, https://rwdb.xyz/the-central-limit-theorem-from-10-000-feet)</p>

<p>The Central Limit Theorem (CLT) states, roughly, that averaging large samples
produces an approximately normal distribution. More formally, it says (in
a special case) that if $\{X_k\}$ is an iid sequence of random variables with
mean $0$ and variance $1$, then $n^{-1/2} \sum_{k = 1}^n X_k$ converges weakly
to the standard normal distribution.</p>
<p>Every undergrad in the world has probably heard “and proving the central limit
is beyond the scope of this course.” Well, here I am to show a very
<em>high-level</em> proof. Sort of. I won’t give all the technical details, because
I don’t think they matter. What’s amazing about the theorem is that it follows
from relatively “simple” asymptotics.</p>
<p>So, here’s the CLT from 10,000 feet.</p>
<p>“It turns out”<sup id="fnref:levy" role="doc-noteref"><a href="#fn:levy" class="footnote" rel="footnote">1</a></sup> that weak convergence of distributions is more-or-less
equivalent to pointwise convergence of characteristic functions. If</p>
\[L(t) = \lim_n E[e^{itX_n}]\]
<p>exists everywhere and is continuous at $0$, then $L(t)$ is a characteristic
function of some random variable $X$, and the $X_n$ converge weakly to $X$. For
us, this means that $n^{-1/2} \sum_{k = 1}^n X_k$ would converge weakly to
a standard normal if we could show that</p>
\[\lim_n E[e^{it n^{-1/2} \sum_{k = 1}^n X_k}] = e^{-t^2 / 2}.\]
<p>Suppose that the $X_k$ are distributed as $X$. Independence tells us that the
characteristic function of our sum is “nice”:</p>
\[E[e^{it n^{-1/2} \sum_{k = 1}^n X_k}] = E[e^{it n^{-1/2} X}]^n.\]
<p>At this point we would love a closed form. We do know that the
characteristic function of $X / \sqrt{n}$ is the characteristic function of $X$
evaluated at $t / \sqrt{n}$, but $X$ is an arbitrary random variable, so there
is no explicit formula to evaluate. There are no nice answers here. We have to
do some approximations.</p>
<p>Let’s use the following asymptotic expansion of the exponential function:</p>
\[e^{it n^{-1/2} X} = 1 + \frac{it X}{\sqrt{n}} - \frac{t^2 X^2}{2n} + O(n^{-3/2}).\]
<p>Taking expectations yields</p>
\[E[e^{it n^{-1/2} X}] = 1 - \frac{t^2}{2n} + O(n^{-3 / 2}),\]
<p>so the characteristic function of $n^{-1/2} \sum_{k = 1}^n X_k$ is</p>
\[(1 - \frac{t^2}{2n} + O(n^{-3 / 2}))^n.\]
<p>If we take a log and apply its asymptotic expansion, then we get</p>
\[n(-\frac{t^2}{2n} + O(n^{-3 / 2})) = -\frac{t^2}{2} + O(n^{-1/2}).\]
<p>Thus, the log of the characteristic function of $n^{-1/2} \sum_{k = 1}^n X_k$ is</p>
\[-\frac{t^2}{2} + O(n^{-1/2}),\]
<p>which means that it goes to $-t^2 / 2$ as $n \to \infty$, which is the log of
the characteristic function of the standard normal. Done!</p>
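<p>Before tightening the argument, it is easy to watch this convergence
numerically. Here is a sketch (Python) for one concrete choice: $X$ uniform on
$(-\sqrt{3}, \sqrt{3})$, which has mean $0$, variance $1$, and characteristic
function $\sin(\sqrt{3} t) / (\sqrt{3} t)$:</p>

```python
import math

def phi(t):
    # Characteristic function of X ~ Uniform(-sqrt(3), sqrt(3)): mean 0, variance 1.
    return math.sin(math.sqrt(3) * t) / (math.sqrt(3) * t)

t = 1.0
for n in (10, 100, 10_000):
    # Characteristic function of n^{-1/2} (X_1 + ... + X_n) at t.
    print(n, phi(t / math.sqrt(n)) ** n)
print(math.exp(-t**2 / 2))  # the claimed limit, e^{-1/2}
```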
<hr />
<p>Let’s look at some steps in a little more detail. Say, 5,000 feet.</p>
<p>Other than the business about characteristic functions, the only part we
cheated was the asymptotics. I wrote</p>
\[e^{it n^{-1/2} X} = 1 + \frac{it X}{\sqrt{n}} - \frac{t^2 X^2}{2n} + O(n^{-3/2}),\]
<p>dropping the implicit $X^3$ factor in the big-Oh term. This is true if we
regard $X$ as just some constant, but as soon as we take expectations in the
next step, we’re getting into fishy territory. The $X$’s that we dropped may
have been important. To really examine this, we need to know how that big-Oh
term behaves with the $X$’s back in it.</p>
<p>It turns out that the exponential function has nice remainder properties for
certain arguments. If we write</p>
\[R_n(x) = e^{ix} - \sum_{k = 0}^n \frac{(ix)^k}{k!},\]
<p>then</p>
\[|R_n(x)| \leq \min(2 |x|^n / n!,\ |x|^{n + 1} / (n + 1)!).\]
<p>This is not so hard to prove: Note that $R_n’(x) = i R_{n - 1}(x)$, so that</p>
\[R_n(x) = i \int_0^x R_{n - 1}(t)\ dt.\]
<p>Since $R_0(x) = e^{ix} - 1$ is bounded in absolute value by $\min(2, |x|)$,
induction proves the claim.</p>
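<p>The bound itself is easy to sanity-check numerically; here is a quick sketch
(Python) testing it at a few points:</p>

```python
import cmath, math

def R(n, x):
    """Remainder of the degree-n Taylor polynomial of exp(ix)."""
    return cmath.exp(1j * x) - sum((1j * x) ** k / math.factorial(k)
                                   for k in range(n + 1))

for x in (0.1, 1.0, 5.0, 20.0):
    for n in (1, 2, 3):
        bound = min(2 * abs(x) ** n / math.factorial(n),
                    abs(x) ** (n + 1) / math.factorial(n + 1))
        assert abs(R(n, x)) <= bound
print("bound verified on all test points")
```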
<p>Putting all this together, we get</p>
\[e^{i t X} = 1 + it X - \frac{t^2 X^2}{2} + R_2(t X).\]
<p>Expectation will kill the linear term ($E[X] = 0$), and the remainder satisfies</p>
\[|R_2(t X)| \leq \min( t^2 |X|^2, |t|^3 |X|^3 / 6),\]
<p>or</p>
\[|R_2(t X)| \leq t^2 \min( X^2, |t| |X|^3 / 6).\]
<p>The minimum is bounded by the integrable $X^2$, and also goes to $0$ as $t \to
0$. Thus</p>
\[E[R_2(t X)] = o(t^2),\]
<p>meaning that dividing by $t^2$ produces something that tends to $0$ as $t \to
0$. Letting $t \mapsto t / \sqrt{n}$ gives</p>
\[E[e^{it n^{-1/2} X}] = 1 - \frac{t^2}{2n} + o(t^2 / n),\]
<p>so the full characteristic function is</p>
\[(1 - \frac{t^2}{2n} + o(t^2 / n))^n.\]
<p>Taking a log and doing the right simplifications yields a log-characteristic
function of</p>
\[-\frac{t^2}{2} + n o(t^2 / n) \to -\frac{t^2}{2}\]
<p>for all $t$. <em>Now</em> we’re done.</p>
<hr />
<p>So, back to the 10,000 foot view. What did we do?</p>
<ol>
<li>
<p>Note that the characteristic functions had a <em>kind of</em> nice form, but needed
some asymptotic analysis.</p>
</li>
<li>
<p>Do asymptotic analysis assuming that the random variables are unimportant
with respect to $n$, then go back and check this if we need to.</p>
</li>
</ol>
<p>With this in mind, let’s try to prove something else: <em>Poisson distributions
with large means are approximately normal</em>.</p>
<p>More formally, let’s prove this: Let $X_n$ be a sequence of Poisson random
variables with mean $\lambda_n$ where $\lambda_n \to \infty$. I want to show
that</p>
\[G_n = \frac{X_n - \lambda_n}{\sqrt{\lambda_n}}\]
<p>converges weakly to a standard normal distribution.</p>
<p>Fortunately, here the characteristic functions are easy to compute. Let
$\phi_X(t) = E[e^{itX}]$ be the characteristic function of a random variable
$X$. Then,</p>
\[\phi_{G_n}(t) = E[e^{itG_n}] = e^{-it\sqrt{\lambda_n}} \phi_{X_n}(t / \sqrt{\lambda_n}).\]
<p>The characteristic function for a Poisson random variable is well-known:</p>
\[\phi_{X_n}(x) = e^{\lambda_n(e^{ix} - 1)}.\]
<p>So, with $x = t / \sqrt{\lambda_n}$, we get</p>
\[\begin{align*}
\log \phi_{G_n}(t) &= -it \sqrt{\lambda_n} + \log \phi_{X_n}(x) \\
&= -it \sqrt{\lambda_n} + \lambda_n (e^{ix} - 1).
\end{align*}\]
<p>We can do some asymptotics here, since $e^{ix} - 1 = ix - x^2 / 2 + O(x^3)$. This gives</p>
\[\begin{align*}
\log \phi_{G_n}(t) &= -it \sqrt{\lambda_n} + it \sqrt{\lambda_n} - \frac{t^2}{2} + O(t^3 / \sqrt{\lambda_n}) \\
&= -\frac{t^2}{2} + O(t^3 / \sqrt{\lambda_n}).
\end{align*}\]
<p>Letting $n \to \infty$ shows that $G_n$ converges weakly to a standard normal.</p>
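<p>Since the log-characteristic function here is exact, the convergence is easy
to watch numerically. A sketch (Python):</p>

```python
import cmath, math

def log_phi_G(t, lam):
    """Exact log-characteristic function of (X - lam)/sqrt(lam), X ~ Poisson(lam)."""
    x = t / math.sqrt(lam)
    return -1j * t * math.sqrt(lam) + lam * (cmath.exp(1j * x) - 1)

t = 1.0
for lam in (10, 1000, 10**6):
    print(lam, log_phi_G(t, lam))  # approaches -t^2/2 = -0.5 as lam grows
```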
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:levy" role="doc-endnote">
<p>Here’s where many of the technical details are hidden. There is a lot
of functional analysis hidden in the following statement. I don’t
understand it, you don’t understand it, so let’s just drop it and be on our
way. <a href="#fnref:levy" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>

<p><strong>Martingales and Pólya’s Urn</strong> (2020-11-15, https://rwdb.xyz/martingales-and-p%C3%B3lya-s-urn)</p>

<p>Consider an urn which begins with two balls, one white and one black. At each
time $n = 1, 2, \dots$, a ball is chosen uniformly at random and duplicated in
the urn. What happens in the long run?</p>
<p>This is an exercise in Williams’ <em>Probability with Martingales</em>, but I think it
is instructive to see how martingales can be “found” naturally when you look
for them, <em>a lá</em> the method of summation factors taught in <em>Concrete
Mathematics</em>.</p>
<p>Let $B_n$ be the number of black balls chosen by time $n$, so that $B_0 = 0$
and $B_1$ is uniformly distributed on $\{0, 1\}$. The proportion of balls that
are black after time $n$ is</p>
\[M_n = \frac{B_n + 1}{n + 2}.\]
<p>I have suggestively called this “$M_n$” for “Martingale.” Indeed, $M_n$ is
a martingale adapted to the sequence of sigma algebras defined by $F_n
= \sigma(A_1, \dots, A_n)$, where $A_n = [\text{a black ball was chosen at time
$n$}]$. It is not hard to check that this is true, but it seems lucky to me
that <em>exactly</em> what we want to study happens to be a martingale. What if we
were not so lucky? What if we had to <em>define</em> a martingale from scratch?</p>
<p>Let’s look for some motivation from our current problem. To check that $M_n$ is
a martingale, we will need to compute $E_n[B_{n + 1}] = E[B_{n + 1} | F_n]$, so
we may as well do that now:</p>
\[\begin{align*}
E_n[B_{n + 1}] &= B_n + E_n[[\text{black at time $n + 1$}]] \\
&= B_n + \frac{B_n + 1}{n + 2} \\
&= (1 + 1 / (n + 2)) B_n + \frac{1}{n + 2}.
\end{align*}\]
<p>Okay, we now know that $E_n[B_{n + 1}]$ is related to $B_n$ by some type of
equation. Can we see how to define a martingale from here? <em>Yes!</em></p>
<p>In general, suppose that the sequence of random variables $B_n$ satisfies</p>
\[E_n[B_{n + 1}] = a_n B_n + b_n\]
<p>for some constant sequences $a_n$ and $b_n$. Let’s try to <em>make</em> a martingale
by defining $M_n = c_n B_n + d_n$ for some to-be-determined constant sequences
$c_n$ and $d_n$. To make $M_n$ a martingale, we need</p>
\[E_n[M_{n + 1}] = c_{n + 1} (a_n B_n + b_n) + d_{n + 1} = c_n B_n + d_n = M_n.\]
<p>It would suffice to choose $c_n$ such that</p>
\[c_{n + 1} a_n = c_n\]
<p>and $d_n$ such that</p>
\[c_{n + 1} b_n + d_{n + 1} = d_n.\]
<p>But these conditions are easy to satisfy! They give, say,</p>
\[c_{n + 1} = c_1 \prod_{k = 1}^n a_k^{-1}\]
<p>and</p>
\[d_{n + 1} = d_1 - \sum_{k = 1}^n c_{k + 1} b_k.\]
<p>In the case of our exercise, we have $a_n = 1 + 1 / (n + 2)$ and $b_n = 1 / (n + 2)$, since the urn holds $n + 2$ balls at time $n$. If we solve the implied recurrences (being thankful that the sum which occurs in $d_{n + 1}$ telescopes), then we get precisely</p>
\[c_n = d_n = \frac{1}{n + 2},\]
<p>so the $M_n$ we would define is</p>
\[M_n = \frac{1}{n + 2} B_n + \frac{1}{n + 2} = \frac{B_n + 1}{n + 2}.\]
<p>Magic!</p>
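<p>As a sanity check on the algebra, here is a sketch (Python, exact rational
arithmetic) verifying that $c_n = d_n = 1/(n+2)$ satisfies both sufficient
conditions, with $a_n = 1 + 1/(n+2)$ and $b_n = 1/(n+2)$ (the urn holds $n+2$
balls at time $n$):</p>

```python
from fractions import Fraction

def a(n): return 1 + Fraction(1, n + 2)   # E_n[B_{n+1}] = a_n B_n + b_n
def b(n): return Fraction(1, n + 2)
def c(n): return Fraction(1, n + 2)       # claimed martingale coefficients
def d(n): return Fraction(1, n + 2)

# The two sufficient conditions from the derivation above.
for n in range(100):
    assert c(n + 1) * a(n) == c(n)
    assert c(n + 1) * b(n) + d(n + 1) == d(n)
print("both recurrences hold for n < 100")
```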
<hr />
<p>Now that we have this martingale, what does it get us? Well, immediately it
tells us that</p>
\[E[M_n] = E[M_0] = \frac{1}{2},\]
<p>so we should always “expect” there to be a pretty even mix of balls in the urn.</p>
<p>What else? Well, this is a nonnegative (bounded!) martingale, so it has a limit
almost surely. Call that limit $M_\infty = \lim M_n$. What does this limit look
like?</p>
<p>Let’s take a moment generating function approach. Perhaps we can work out
$E[\exp(M_n z)]$, and then take a limit to get $E[\exp(M_\infty z)]$.</p>
<p>Exercise: If $B_n$ is as defined before (the number of black balls chosen by
time $n$), then $B_n$ is uniformly distributed on $\{0, 1, \dots, n\}$.
(Hint: Work out a recurrence using the law of total probability.)</p>
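<p>The exercise is easy to check by computer: iterating the law of total probability gives the exact distribution of $B_n$ (with the convention $B_0 = 0$; equivalently, the number of black balls in the urn, $B_n + 1$, is uniform). This sketch is a sanity check, not the requested proof:</p>

```python
from fractions import Fraction

def black_distribution(n):
    """Exact distribution of B_n, the number of black balls drawn by time n."""
    dist = [Fraction(1)]                      # B_0 = 0 with probability 1
    for m in range(n):
        new = [Fraction(0)] * (m + 2)
        for k, p in enumerate(dist):
            p_black = Fraction(k + 1, m + 2)  # B_m + 1 black among m + 2 balls
            new[k] += p * (1 - p_black)       # drew white
            new[k + 1] += p * p_black         # drew black
        dist = new
    return dist

assert black_distribution(5) == [Fraction(1, 6)] * 6  # uniform!
```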
<p>Using the above exercise, we can write</p>
\[E[\exp(M_n z)] = \sum_{k = 0}^{n} \exp\left( \frac{(k + 1) z}{n + 2} \right) \frac{1}{n + 1}.\]
<p>This is just a geometric sum, a task fit for a computer. Maple readily spits
out</p>
\[E[\exp(M_n z)] = \frac{1}{n + 1} \frac{e^{z} - e^{z / (n + 2)}}{e^{z / (n + 2)} - 1}.\]
<p>Maple can also take the limit for us (but it is not so hard by hand in this
case) to show that</p>
\[\lim_n E[\exp(M_n z)] = \frac{e^z - 1}{z}.\]
<p>Well! This is intensely interesting. The function on the right is the moment
generating function for the <em>uniform distribution on $(0, 1)$!</em> Could this be
true? Could the limit of a discrete process be a continuous one? Look at these
graphs:</p>
<p><img src="/images/polya.png" alt="Urn histogram" />
<img src="/images/polya-1.png" alt="Urn paths" /></p>
<p>Why, it <em>must</em> be true! The graphs confirm it!</p>
<p>To assuage the hardened hearts of doubters, we should justify that</p>
\[\lim_n E[\exp(M_n z)] = E[\exp(M_\infty z)].\]
<p>This is not hard:</p>
\[|\exp(M_n z)| = \exp(M_n \operatorname{Re} z) \leq e\]
<p>for $|z| < 1$, say. So we can apply the dominated convergence theorem to get
our moment generating function</p>
\[E[\exp(M_\infty z)] = \frac{e^z - 1}{z},\]
<p>which shows once and for all that $M_\infty$ is in fact uniform on $(0, 1)$.
Wow!</p>
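<p>For good measure, the convergence of the moment generating functions is easy to check numerically; since $M_n = (B_n + 1)/(n + 2)$ takes the values $j/(n + 2)$ for $j = 1, \dots, n + 1$, each with probability $1/(n + 1)$, we can just sum:</p>

```python
import math

def mgf(n, z):
    # E[exp(M_n z)]: M_n takes values j/(n + 2), j = 1..n+1, uniformly
    return sum(math.exp(j * z / (n + 2)) for j in range(1, n + 2)) / (n + 1)

z = 1.7
# agrees with (e^z - 1)/z, the MGF of the uniform distribution on (0, 1)
assert abs(mgf(10**5, z) - (math.exp(z) - 1) / z) < 1e-4
```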
<hr />
<p>I’ve been thinking a lot about martingales for a probability class I’m taking
this semester. I still don’t quite <em>get</em> them, but I’m trying to do more and
more exercises to understand why we care about them. I <em>have</em> seen some good
examples so far, but I’m just now getting to writing down some of my favorites.</p>Robert Dougherty-Blissrobert.w.bliss@gmail.comConsider an urn which begins with two balls, one white and one black. At each time $n = 1, 2, \dots$, a ball is chosen uniformly at random and duplicated in the urn. What happens in the long run?An experimental approach to the drunk passenger problem2020-07-24T00:00:00-04:002020-07-24T00:00:00-04:00https://rwdb.xyz/the-drunk-passenger-problem<p>An apparently common question in
<a href="https://en.wikipedia.org/wiki/Quantitative_analysis_(finance)">quant</a>
interviews is as follows:</p>
<blockquote>
<p>Suppose that 100 passengers are boarding an airplane, all with assigned
seats. The first passenger is drunk, and sits in a random seat. Each following
passenger sits in their assigned seat, if it is available, and in a random seat
otherwise. What is the probability that the last passenger sits in their
assigned seat?</p>
</blockquote>
<p>I just wanted to quickly write up what I thought were four nifty proofs,
including one “experimental.”</p>
<p>First, you have to observe the fundamental recurrence. Let $p(n)$ be the
probability that the last passenger sits in the correct seat given that there
are $n$ passengers. It’s easy to check that $p(1) = 1$ and $p(2) = p(3) = 1/2$.</p>
<p>Suppose that there are $n$ passengers. If the drunk sits in his own seat, then
the probability of success is $1$; if he sits in the <em>last</em> passenger’s
seat, it is $0$. If the drunk sits in the $k$th person’s seat with $1 < k < n$,
then the problem has been reduced to a drunk passenger problem with a total of
$n - k + 1$ passengers. (After the drunk sits, passengers $2$ through $k - 1$
sit, then passenger $k$ is so irate that they drink themselves into a stupor,
with the drunk’s seat playing the role of their assigned seat.) Averaging over
the $n$ equally likely choices, and using $p(1) = 1$ to absorb the success
term, we get</p>
\[p(n) = \frac{1}{n} \left( 1 + \sum_{2 \leq k < n} p(k) \right) = \frac{1}{n} \sum_{1 \leq k < n} p(k).\]
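<p>Before doing anything clever, the recurrence is easy to iterate exactly with Python’s <code class="language-plaintext highlighter-rouge">fractions</code> (a quick empirical check, which already gives away the answer):</p>

```python
from fractions import Fraction

def p_values(n):
    # iterate p(m) = (1/m) * sum_{1 <= k < m} p(k), starting from p(1) = 1
    vals = [Fraction(1)]
    for m in range(2, n + 1):
        vals.append(sum(vals) / m)
    return vals

vals = p_values(100)
assert all(v == Fraction(1, 2) for v in vals[1:])  # constant from p(2) on
```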
<p>We now have a few ways to go about this.</p>
<h2 id="guess-and-check">Guess-and-check</h2>
<p>It’s easy to see that $p(2) = p(3) = 1 / 2$, so maybe the sequence is just
constant after $n = 2$? In principle we would want to compute a few more terms,
but without a computer this gets hard to do in your head. We can jump straight
to induction: assuming $p(k) = 1 / 2$ for $2 \leq k \leq n$,</p>
\[\sum_{1 \leq k < n + 1} p(k) = 1 + \frac{n - 1}{2} = \frac{n + 1}{2}.\]
<p>Dividing by $n + 1$ gives $p(n + 1) = 1 / 2$, so $p(n)$ is constant for $n \geq 2$.</p>
<h2 id="smarter-recurrences">Smarter recurrences</h2>
<p>There’s a standard trick to get rid of summations in recurrences, and in this
case it works exceptionally well. We can replace $n$ with $n + 1$ in the
fundamental recurrence, giving the following two equations:</p>
\[\begin{align*}
p(n) &= \frac{1}{n} \sum_{1 \leq k < n} p(k) \\
p(n + 1) &= \frac{1}{n + 1} \sum_{1 \leq k < n + 1} p(k).
\end{align*}\]
<p>Having the denominators on the right-hand side is a little awkward, so let’s
multiply them to the other side, then subtract the two equations:</p>
\[(n + 1)p(n + 1) - n p(n) = p(n).\]
<p>Simplifying yields</p>
\[(n + 1)(p(n + 1) - p(n)) = 0,\]
<p>and since $n + 1$ is nonzero, we see that $p(n)$ is constant for $n \geq 2$.</p>
<h2 id="true-guess-and-check"><em>True</em> guess-and-check</h2>
<p>The previous method works so well because $p(n + 1) = p(n)$ falls out, and the
first guess-and-check method works because the induction is easy. However,
I claim that the <em>true</em> guess and check method is even easier!</p>
<p>We know, <em>a priori</em>, that the sequence $p(n)$ defined by $p(2) = 1 / 2$ and</p>
\[p(n) = \frac{1}{n} \left( 1 + \sum_{2 \leq k < n} p(k) \right)\]
<p>satisfies a linear recurrence with polynomial coefficients. (We just showed how
to prove such a thing in the previous method. Even without getting supremely
lucky, the result still holds.) Therefore, since $p(n) = 1 / 2$ for the first
dozen terms or so (exercise!), it follows<sup id="fnref:follows" role="doc-noteref"><a href="#fn:follows" class="footnote" rel="footnote">1</a></sup> that $p(n) = 1 / 2$ for all
$n \geq 2$.</p>
<h2 id="generating-functions">Generating functions</h2>
<p>This is an almost completely different way to do it.</p>
<p>Let $P(z) = \sum_{k \geq 2} p(k) z^k$. The fundamental recurrence can be
rewritten as</p>
\[(n + 1) p(n) = 1 + \sum_{2 \leq k \leq n} p(k)\]
<p>for $n \geq 2$, and it is an easy application of well-known
generatingfunctionology that this implies</p>
\[zP'(z) + P(z) = \frac{z^2}{1 - z} + \frac{P(z)}{1 - z}.\]
<p>This is a linear differential equation, with general solution given by</p>
\[P(z) = \frac{c + z^2 / 2}{1 - z}\]
<p>for some constant $c$. It is easy to check that $P’‘(0) = 1$, and this gives $c
= 0$. Thus</p>
\[P(z) = \frac{z^2}{2(1 - z)} = \frac{z^2}{2} + \frac{z^3}{2} + \frac{z^4}{2}
+ \cdots.\]
<p>We should note that this approach is almost entirely mechanical. Computers can
derive the differential equation and then solve it, almost unaided by humans.</p>
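<p>For instance, sympy happily verifies the claimed solution and its expansion (a verification sketch rather than the derivation):</p>

```python
import sympy as sp

z = sp.symbols("z")
P = z**2 / (2*(1 - z))

# the linear differential equation satisfied by the generating function
lhs = z*sp.diff(P, z) + P
rhs = z**2/(1 - z) + P/(1 - z)
assert sp.simplify(lhs - rhs) == 0

# and its series really is z^2/2 + z^3/2 + z^4/2 + ...
series = sp.series(P, z, 0, 6).removeO()
assert sp.simplify(series - sum(z**k for k in range(2, 6))/2) == 0
```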
<h2 id="conclusions">Conclusions</h2>
<p>I don’t know what this problem has to do with the stock market, but it sure is
fun!</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:follows" role="doc-endnote">
<p>To be completely rigorous, we need to say, “and the order of the
recurrence is no larger than $K$,” and then check $K$ terms, but at a glance we
know that $K$ is certainly no more than ten, or whatever. <a href="#fnref:follows" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Robert Dougherty-Blissrobert.w.bliss@gmail.comAn apparently common question in quant interviews is as follows:A small binomial sum2020-06-24T00:00:00-04:002020-06-24T00:00:00-04:00https://rwdb.xyz/a-small-binomial-sum<p>I recently came across an interesting binomial sum in <a href="https://math.stackexchange.com/questions/3729998">this Math.SE
question</a>. As usual,
Zeilberger’s algorithm makes short work of it, but the “human” approach reveals
a small miracle.</p>
<p>We want to evaluate</p>
\[R_n(z) = \sum_k (-1)^k {2k \choose k} {k \choose n - k} z^k.\]
<p>Our first step is to use the identity ${2k \choose k} = (-4)^k {-1/2 \choose
k}$ and write</p>
\[R_n(z) = \sum_k {-1/2 \choose k} {k \choose n - k} (4z)^k.\]
<p>There might be a nice, direct way to evaluate this sum, but I don’t know it.
Instead, let’s apply the <em>snake oil method</em>. Let $R(x) = \sum_{n \geq 0} R_n(z)
x^n$. By expanding $R_n(z)$ and interchanging the order of summation, it is
easy to show that</p>
\[R(x) = \frac{1}{\sqrt{1 + 4zx(1 + x)}}.\]
<p>The radicand in the denominator is a quadratic polynomial in $x$ with
discriminant $16z(z - 1)$. The square root disappears precisely when $z = 0$ or
$z = 1$! If $z = 0$ the sum is trivially $1$, and if $z = 1$ we get</p>
\[R(x) = \frac{1}{1 + 2x},\]
<p>so</p>
\[R_n(1) = \sum_k 4^k {-1/2 \choose k} {k \choose n - k} = (-2)^n.\]
<p>In other words, the sum given has a nice, exponential solution in <em>only</em> the
case the Math.SE question asked. Remarkable!</p>
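<p>Brute force happily confirms the evaluation (sympy keeps the binomials exact):</p>

```python
import sympy as sp

def R(n, z):
    # binomial(k, n - k) vanishes unless 0 <= n - k <= k, so k <= n suffices
    return sum((-1)**k * sp.binomial(2*k, k) * sp.binomial(k, n - k) * z**k
               for k in range(n + 1))

assert all(R(n, 1) == (-2)**n for n in range(12))
```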
<h2 id="reflections">Reflections</h2>
<p>Zeilberger’s algorithm produces a recurrence for $R_n(z)$ straight away. Running
<code class="language-plaintext highlighter-rouge">ZeilbergerRecurrence(f, n, k, R, 0..n) assuming n::posint</code> where $f$ is the
summand of $R_n(z)$ yields</p>
\[(n + 2) R_{n + 2}(z) + 2z(2n + 3) R_{n + 1}(z) + 4z(n + 1) R_n(z) = 0.\]
<p>So, in general, $R_n(z)$ is holonomic in $n$ and needs no more than three terms
in its defining recurrence. Of course, if $z = 1$, then <code class="language-plaintext highlighter-rouge">ZeilbergerRecurrence</code>
instead yields</p>
\[F(n + 1) + 2 F(n) = 0,\]
<p>so it sometimes simplifies. Maple fails to solve the general recurrence when
$z$ is symbolic.</p>
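<p>Maple’s recurrence can at least be spot-checked in sympy with symbolic $z$ (a sanity check for small $n$, not a proof):</p>

```python
import sympy as sp

z = sp.symbols("z")

def R(n):
    # direct evaluation of R_n(z); the summand vanishes for k > n
    return sum((-1)**k * sp.binomial(2*k, k) * sp.binomial(k, n - k) * z**k
               for k in range(n + 1))

for n in range(8):
    lhs = (n + 2)*R(n + 2) + 2*z*(2*n + 3)*R(n + 1) + 4*z*(n + 1)*R(n)
    assert sp.expand(lhs) == 0
```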
<p>If we were good enough with complex functions, then we could work out
asymptotics, but in any event this is another avenue to explore.</p>Robert Dougherty-Blissrobert.w.bliss@gmail.comI recently came across an interesting binomial sum in this Math.SE question. As usual, Zeilberger’s algorithm makes short work of it, but the “human” approach reveals a small miracle.An ansatz approach to the zeta function2020-05-08T00:00:00-04:002020-05-08T00:00:00-04:00https://rwdb.xyz/an-ansatz-approach-to-the-zeta-function<p>The evaluation of Riemann’s zeta function</p>
\[\zeta(s) = \sum_{k \geq 1} \frac{1}{k^s}\]
<p>at even integers is well-known, going back to the venerable Leonhard Euler and
his answer to the <a href="https://en.wikipedia.org/wiki/Basel_problem">Basel problem</a>,
namely</p>
\[\zeta(2) = \frac{\pi^2}{6}.\]
<p>The Basel problem stumped even the great Bernoullis, and Euler’s solution
required the ingenuity and insight found throughout his work. Nearly three
hundred years later, is this easier to answer? I claim that not only is
it easy, but that it is <em>entirely routine</em> to both <em>guess</em> the answer and to
<em>prove</em> it, provided that you allow some basic Fourier analysis. Let’s see how.</p>
<p>Given a suitably nice function $f$ on $[0, 1]$, we define its <em>Fourier
transform</em> by</p>
\[\hat{f}(t) = \int_0^1 f(x) e^{-2\pi i t x}\ dx.\]
<p>These are the coefficients of the Fourier series</p>
\[\sum_k \hat{f}(k) e^{2\pi i k x}.\]
<p>There is a deep theory about what functions are suitably nice and what
properties the Fourier transform satisfies, but for now we are only interested
in one fact:</p>
<blockquote>
<p>The family $\{e^{2\pi i n x}\}_n = \{e(n)\}_n$ is an orthonormal basis for
the square-integrable functions on $[0, 1]$ equipped with the usual integral
inner product</p>
\[(f, g) = \int_0^1 f \overline{g}.\]
<p>In particular,</p>
\[\|f\|_2^2 = \sum_k |(f, e(k))|^2 = \sum_k |\hat{f}(k)|^2.\]
</blockquote>
<p>In one direction, this tells us that sums whose terms are Fourier coefficients
can be evaluated as an integral. This is exactly what we will do.</p>
<p>First, we need a humanly-proved lemma (though in principle a computer could
likely figure this out).</p>
<p><strong>Lemma.</strong> <em>The Fourier transform of $f_n(x) = x^n$ at integers satisfies
$\hat{f_n}(0) = (n + 1)^{-1}$, and $\hat{f}_n(k)$ is a polynomial in $(2\pi
i k)^{-1}$ of degree $n$ for all nonzero integers $k$.</em></p>
<p><strong>Proof.</strong> Apply integration by parts to prove that the sequence $\hat{f}_n(k)$
satisfies</p>
\[\hat{f}_n(k) = n(2\pi i k)^{-1} \hat{f}_{n - 1}(k) - (2\pi i k)^{-1}.\]
<p>The claim follows immediately by induction once we note that $\hat{f}_0(k)
= 0$. $\blacksquare$</p>
<p>At this point we are in possession of a very powerful <em>ansatz</em>. The Fourier
transform of polynomials at integers gives us reciprocals of powers of
integers! There must be a connection with the zeta function. In particular, we
want to find a polynomial</p>
\[f(x) = \sum_{k = 0}^n a_k x^k\]
<p>such that $\hat{f}(k)$ is $k^{-n}$ times a factor independent
of $k$. Given the lemma above, we know that $\hat{f}(k)$, for nonzero $k$, is
a polynomial in $(2\pi i k)^{-1}$ of degree at most $n$, so we can look at its
coefficients and require those on powers below $(2\pi i k)^{-n}$ to vanish. We can stipulate that
$a_n = -1$ and that $\hat{f}(0) = 0$, which gives us a total of $n + 1$ linear
equations in the $n + 1$ unknowns $a_0, \dots, a_n$, which we can (probably) solve!</p>
<p>So, in light of this, it suffices to write a program that will equate these
coefficients and solve the resulting linear equations. This will then, <em>a
posteriori</em>, provide evaluations of $\zeta(n)$ for positive, even integers $n$.
(Only the evens since, of course, we must take squares.)</p>
<h1 id="the-program">The program</h1>
<p>Here is one such program that will do the job.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">sympy</span> <span class="k">as</span> <span class="n">sp</span>
<span class="kn">from</span> <span class="nn">sympy.abc</span> <span class="kn">import</span> <span class="n">x</span><span class="p">,</span> <span class="n">t</span>
<span class="kn">from</span> <span class="nn">sympy</span> <span class="kn">import</span> <span class="n">I</span><span class="p">,</span> <span class="n">pi</span>
<span class="k">def</span> <span class="nf">zeta_ansatz</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
<span class="n">xs</span> <span class="o">=</span> <span class="n">sp</span><span class="p">.</span><span class="n">symbols</span><span class="p">(</span><span class="s">"a:{}"</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">n</span> <span class="o">+</span> <span class="mi">1</span><span class="p">))</span>
<span class="n">f</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">xs</span><span class="p">[</span><span class="n">k</span><span class="p">]</span> <span class="o">*</span> <span class="n">x</span><span class="o">**</span><span class="n">k</span> <span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n</span> <span class="o">+</span> <span class="mi">1</span><span class="p">))</span>
<span class="n">k</span> <span class="o">=</span> <span class="n">sp</span><span class="p">.</span><span class="n">symbols</span><span class="p">(</span><span class="s">"k"</span><span class="p">,</span> <span class="n">integer</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">zero</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="n">ft</span> <span class="o">=</span> <span class="n">sp</span><span class="p">.</span><span class="n">integrate</span><span class="p">(</span><span class="n">f</span> <span class="o">*</span> <span class="n">sp</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="o">-</span><span class="mi">2</span> <span class="o">*</span> <span class="n">pi</span> <span class="o">*</span> <span class="n">I</span> <span class="o">*</span> <span class="n">k</span> <span class="o">*</span> <span class="n">x</span><span class="p">),</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="n">ft</span> <span class="o">=</span> <span class="n">ft</span><span class="p">.</span><span class="n">simplify</span><span class="p">().</span><span class="n">expand</span><span class="p">()</span>
<span class="c1"># Replace 2 pi I with a dummy variable 1 / t to grab coefficients.
</span> <span class="n">ft</span> <span class="o">=</span> <span class="n">ft</span><span class="p">.</span><span class="n">subs</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="mi">1</span><span class="p">).</span><span class="n">subs</span><span class="p">(</span><span class="n">I</span><span class="p">,</span> <span class="mi">1</span><span class="p">).</span><span class="n">subs</span><span class="p">(</span><span class="n">pi</span><span class="p">,</span> <span class="mi">1</span> <span class="o">/</span> <span class="n">t</span><span class="p">)</span>
<span class="n">coeffs</span> <span class="o">=</span> <span class="n">ft</span><span class="p">.</span><span class="n">collect</span><span class="p">(</span><span class="n">t</span><span class="p">,</span> <span class="n">evaluate</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="n">zero_eqns</span> <span class="o">=</span> <span class="p">[</span><span class="n">coeff</span> <span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">coeff</span> <span class="ow">in</span> <span class="n">coeffs</span><span class="p">.</span><span class="n">items</span><span class="p">()</span> <span class="k">if</span> <span class="n">key</span> <span class="o">!=</span> <span class="n">t</span><span class="o">**</span><span class="n">n</span><span class="p">]</span>
<span class="n">eqns</span> <span class="o">=</span> <span class="n">zero_eqns</span> <span class="o">+</span> <span class="p">[</span><span class="n">xs</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="nb">sum</span><span class="p">(</span><span class="n">x</span> <span class="o">/</span> <span class="p">(</span><span class="n">k</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">x</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">xs</span><span class="p">))]</span>
<span class="n">soln</span> <span class="o">=</span> <span class="n">sp</span><span class="p">.</span><span class="n">solve</span><span class="p">(</span><span class="n">eqns</span><span class="p">,</span> <span class="n">xs</span><span class="p">)</span>
<span class="n">f</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">soln</span><span class="p">[</span><span class="n">xs</span><span class="p">[</span><span class="n">k</span><span class="p">]]</span> <span class="o">*</span> <span class="n">x</span><span class="o">**</span><span class="n">k</span> <span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n</span> <span class="o">+</span> <span class="mi">1</span><span class="p">))</span>
<span class="n">coeff</span> <span class="o">=</span> <span class="n">sp</span><span class="p">.</span><span class="n">factorial</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">pi</span><span class="p">)</span><span class="o">**</span><span class="n">n</span>
<span class="k">return</span> <span class="n">f</span><span class="p">,</span> <span class="n">sp</span><span class="p">.</span><span class="n">integrate</span><span class="p">(</span><span class="nb">abs</span><span class="p">(</span><span class="n">f</span><span class="p">)</span><span class="o">**</span><span class="mi">2</span><span class="p">,</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span> <span class="o">/</span> <span class="n">coeff</span><span class="o">**</span><span class="mi">2</span> <span class="o">/</span> <span class="mi">2</span>
</code></pre></div></div>
<p>An example:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">In</span> <span class="p">[</span><span class="mi">1</span><span class="p">]:</span> <span class="n">time</span> <span class="p">[</span><span class="n">zeta_ansatz</span><span class="p">(</span><span class="n">k</span><span class="p">)</span> <span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">5</span><span class="p">)]</span>
<span class="n">CPU</span> <span class="n">times</span><span class="p">:</span> <span class="n">user</span> <span class="mf">1.85</span> <span class="n">s</span><span class="p">,</span> <span class="n">sys</span><span class="p">:</span> <span class="mi">19</span> <span class="n">µs</span><span class="p">,</span> <span class="n">total</span><span class="p">:</span> <span class="mf">1.85</span> <span class="n">s</span>
<span class="n">Wall</span> <span class="n">time</span><span class="p">:</span> <span class="mf">1.85</span> <span class="n">s</span>
<span class="n">Out</span><span class="p">[</span><span class="mi">1</span><span class="p">]:</span>
<span class="err">⎡⎛</span> <span class="mi">2</span><span class="err">⎞</span> <span class="err">⎛</span> <span class="mi">4</span><span class="err">⎞</span> <span class="err">⎛</span> <span class="mi">2</span> <span class="mi">6</span><span class="err">⎞</span> <span class="err">⎛</span> <span class="mi">8</span> <span class="err">⎞⎤</span>
<span class="err">⎢⎜</span> <span class="n">π</span> <span class="err">⎟</span> <span class="err">⎜</span> <span class="mi">2</span> <span class="mi">1</span> <span class="n">π</span> <span class="err">⎟</span> <span class="err">⎜</span> <span class="mi">3</span> <span class="mi">3</span><span class="err">⋅</span><span class="n">x</span> <span class="n">x</span> <span class="n">π</span> <span class="err">⎟</span> <span class="err">⎜</span> <span class="mi">4</span> <span class="mi">3</span> <span class="mi">2</span> <span class="mi">1</span> <span class="n">π</span> <span class="err">⎟⎥</span>
<span class="err">⎢⎜</span><span class="mi">1</span><span class="o">/</span><span class="mi">2</span> <span class="o">-</span> <span class="n">x</span><span class="p">,</span> <span class="err">──⎟</span><span class="p">,</span> <span class="err">⎜</span><span class="o">-</span> <span class="n">x</span> <span class="o">+</span> <span class="n">x</span> <span class="o">-</span> <span class="err">─</span><span class="p">,</span> <span class="err">──⎟</span><span class="p">,</span> <span class="err">⎜</span><span class="o">-</span> <span class="n">x</span> <span class="o">+</span> <span class="err">────</span> <span class="o">-</span> <span class="err">─</span><span class="p">,</span> <span class="err">───⎟</span><span class="p">,</span> <span class="err">⎜</span><span class="o">-</span> <span class="n">x</span> <span class="o">+</span> <span class="mi">2</span><span class="err">⋅</span><span class="n">x</span> <span class="o">-</span> <span class="n">x</span> <span class="o">+</span> <span class="err">──</span><span class="p">,</span> <span class="err">────⎟⎥</span>
<span class="err">⎣⎝</span> <span class="mi">6</span> <span class="err">⎠</span> <span class="err">⎝</span> <span class="mi">6</span> <span class="mi">90</span><span class="err">⎠</span> <span class="err">⎝</span> <span class="mi">2</span> <span class="mi">2</span> <span class="mi">945</span><span class="err">⎠</span> <span class="err">⎝</span> <span class="mi">30</span> <span class="mi">9450</span><span class="err">⎠⎦</span>
</code></pre></div></div>
<p>So there, we have answered the Basel problem plus evaluated the next three
terms of the sequence $\zeta(2n)$, all in under two seconds and without any
foreknowledge of the answer. Not too shabby, eh, Euler?</p>
<h1 id="connections-to-known-results">Connections to known results</h1>
<p>These results are obviously not new. There are known closed-form evaluations of
$\zeta(2n)$ for all positive integers $n$. A cursory glance suggests that the
polynomials we get are exactly the negatives of the <a href="https://en.wikipedia.org/wiki/Bernoulli_polynomials#Representations">Bernoulli
polynomials</a>
$B_n(x)$, and mentioned in that article is that the Fourier transform of
$B_n(x)$ is</p>
\[\hat{B}_n(x) = -\frac{n!}{(2\pi i)^n} \sum_{k \neq 0} \frac{e^{2\pi i k x}}{k^n},\]
<p>which matches exactly what we have said here.</p>
<p>Does this idea uniquely define the Bernoulli polynomials? The Fourier transform
is invertible, so saying “let $B_n(x)$ be the preimage of such and such
function under the Fourier transform on $[0, 1]$” is a fine definition. It is
surprising that such a thing is a <em>polynomial</em>, but nevertheless true, and
lucky for us that it was. Conversely, I am pretty sure that polynomials will
only ever give you linear combinations of the $\zeta$ function evaluated at
even integers, so we have completely exhausted the usefulness of polynomials
coupled with Fourier transforms.</p>Robert Dougherty-Blissrobert.w.bliss@gmail.comThe evaluation of Riemann’s zeta functionAlgebraic and analytic irrationality proofs2020-03-07T00:00:00-05:002020-03-07T00:00:00-05:00https://rwdb.xyz/three-irrationality-proofs<p>At the behest of mathematicians wiser than myself, I have been thinking about
number theory generally and irrationality proofs in particular. I think that
there are two <em>flavors</em> of irrationality proofs available to us: algebraic and
analytic. Here I will sketch three irrationality proofs which naturally present
clear examples of these two flavors.</p>
<p>Roughly speaking, <em>analytic</em> proofs of irrationality construct infinite
approximations of a number which are “too good,” while <em>algebraic</em> proofs use
only number-theoretic tools, such as primality. This is not such a clear
definition. We shall prove the following claim in three ways to demonstrate
this:</p>
<p><strong>Proposition.</strong> $\sqrt{a^2 + 4}$ is irrational for all nonzero integers $a$.</p>
<p><strong>Algebraic Proof 1.</strong> Since the square root of a nonsquare integer is
irrational, it suffices to show that $a^2 + 4$ is a perfect square only when $a = 0$.
Suppose that $a^2 + 4 = b^2$ for some integer $b$.
Without loss of generality suppose that $a$ and $b$ are nonnegative. Since $4
= (b - a)(b + a)$, we have only three possible cases:</p>
<ol>
<li>
<p>$b - a = 4$ and $b + a = 1$. This implies $2b = 5$, which is impossible.</p>
</li>
<li>
<p>$b - a = 1$ and $b + a = 4$. This also implies $2b = 5$, which is still
impossible.</p>
</li>
<li>
<p>$b - a = b + a = 2$. Then $b = 2$ and $a = 0$.</p>
</li>
</ol>
<p>Therefore $(a, b) = (0, 2)$ is the only integer solution. $\blacksquare$</p>
<p><strong>Algebraic Proof 2.</strong> Again suppose that $a^2 + 4 = b^2$ and, without loss
of generality, that $a$ and $b$ are nonnegative. If $a = 0$ then $b = 2$, so suppose that $a \geq 1$ (and $b \geq
1$ follows). If $a = 1$ then $b^2 = 5$, which is impossible. Thus suppose $a
\geq 2$. We clearly have $b > a$, so $b^2 \geq (a + 1)^2 > a^2 + 4$. This shows
that $(a, b) = (0, 2)$ is the only integer solution. $\blacksquare$</p>
<p>The analytic proof requires a bit of work.</p>
<p><strong>Lemma.</strong> For every nonzero integer $a$, the polynomial $1 - az - z^2$ has
distinct nonzero roots, one inside the unit circle, and one outside.</p>
<p><strong>Proof.</strong> The discriminant is $a^2 + 4 \neq 0$, so there are two distinct
(real) roots. Neither $1$ nor $-1$ is a root (plugging them in gives $\mp a \neq 0$),
and the roots multiply to $1$ in absolute value, so one root lies strictly inside
the unit circle and the other strictly outside. $\blacksquare$</p>
<p><strong>Lemma.</strong> Let $p_n$ and $q_n$ be the numerator and denominator, respectively,
of the $n$th convergent to the infinite continued fraction $[a, a, \dots]$.
Then</p>
\[\frac{p_n}{q_n} = (a + r) (1 + O(\epsilon^n)),\]
<p>where $r$ is the root inside the unit circle of $1 - az - z^2$ and $0
< \epsilon < 1$.</p>
<p>I won’t prove this lemma, but here is how a proof would go: Derive generating
functions for $p_n$ and $q_n$ from their well-known recurrences. These are
rational and give us a closed form. You can factor out a $1 / r$ times some
constants from both closed forms, and dividing them gives the asymptotic
expansion above with $\epsilon < 1$ since $r$ is the smaller root.</p>
<p>(I am too lazy to type this proof, but it really is a routine computation.)</p>
<p><strong>Analytic conclusion.</strong> By the above lemmas, the continued fraction $[a, a,
\dots]$ equals $a + r$ when $a \neq 0$, so this number is irrational since it
has an infinite continued fraction. By the quadratic formula $a + r$ is
a rational linear combination of $1$ and $\sqrt{a^2 + 4}$, so it is irrational
iff $\sqrt{a^2 + 4}$ is irrational.</p>
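<p>To see the analytic machinery in action, the convergents of $[a, a, \dots]$ really do march toward $a + r = (a + \sqrt{a^2 + 4})/2$; here is a numerical sketch for the sample value $a = 3$:</p>

```python
from fractions import Fraction
import math

def convergent(a, n):
    # nth convergent of the continued fraction [a; a, a, ...], built bottom-up
    x = Fraction(a)
    for _ in range(n):
        x = a + Fraction(1) / x
    return x

a = 3
target = (a + math.sqrt(a*a + 4)) / 2  # = a + r
assert abs(float(convergent(a, 40)) - target) < 1e-12
```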
<p>Is either method better? I’m not sure. The algebraic proofs are shorter here,
but the analytic proof is pretty interesting. I really like generating
functions, so I’m biased. It seems good to know both.</p>Robert Dougherty-Blissrobert.w.bliss@gmail.comAt the behest of mathematicians wiser than myself, I have been thinking about number theory generally and irrationality proofs in particular. I think that there are two flavors of irrationality proofs available to us: algebraic and analytic. Here I will sketch three irrationality proofs which naturally present clear examples of these two flavors.Padé approximants again2020-02-15T00:00:00-05:002020-02-15T00:00:00-05:00https://rwdb.xyz/pade-approximants-again<p>An $(n, m)$th <em>Padé approximant</em> for a generating function $f(x)$ is a pair of
polynomials $(P_{n, m}(x), Q_{n, m}(x))$ such that $P_{n, m}(x)$ and $Q_{n,
m}(x)$ have degrees not exceeding $n$ and $m$, respectively, and</p>
\[\begin{equation}
\label{pade}
\tag{Padé}
f(x) Q_{n, m}(x) - P_{n, m}(x) = O(x^{n + m + 1}),
\end{equation}\]
<p>where $O(x^k)$ stands for any generating function which is zero before the
$k$th term (and possibly later).</p>
<p>Given $n$ and $m$, we write the polynomials $P_{n, m}(x)$ and $Q_{n, m}(x)$
explicitly as</p>
\[\begin{align*}
P_{n, m}(x) &= \sum_{k = 0}^n a(n, m, k) x^k \\
Q_{n, m}(x) &= \sum_{k = 0}^m b(n, m, k) x^k.
\end{align*}\]
<p>We assume that $a(n, m, k) = 0$ if $k > n$ or $k < 0$, and that $b(n, m, k)
= 0$ if $k < 0$ or $k > m$. Such coefficients are guaranteed to exist and, for
a fixed $f(x)$, are unique up to a constant multiple<sup id="fnref:pade-ref" role="doc-noteref"><a href="#fn:pade-ref" class="footnote" rel="footnote">1</a></sup>. It is common
to take $b(n, m, 0) = 1$ just so there is a fixed starting point.</p>
<p>In a <a href="/pad%C3%A9-approximations/">previous post</a> I derived a closed form for
Padé approximants of the function $(1 - 4x)^{-1/2}$. Here, I want to show that
we can generalize this by choosing an arbitrary exponent.</p>
<p><strong><em>Highlights:</em></strong></p>
<ul>
<li>The Padé approximants of the function $f(x) = (1 + tx)^c$ are uniquely
determined up to a constant multiple<sup id="fnref:constant" role="doc-noteref"><a href="#fn:constant" class="footnote" rel="footnote">2</a></sup>. Choosing the constant to be
$1$, the coefficients are as follows:</li>
</ul>
\[\begin{align*}
a(n, m, k) &= t^k {n \choose k} {m + c \choose k} {n + m \choose k}^{-1} \\
b(n, m, k) &= t^k {m \choose k} {n - c \choose k} {n + m \choose k}^{-1}.
\end{align*}\]
<ul>
<li>This is proven by establishing (automatically!) the identity</li>
</ul>
\[\sum_{j = 0}^k
{m \choose j} {n - c \choose j} {c \choose k - j} {n + m \choose j}^{-1}
= {n \choose k} {m + c \choose k} {n + m \choose k}^{-1}.\]
<p><strong><em>Differences between previous post and this one:</em></strong></p>
<ul>
<li>
<p>In my last post, I considered only diagonal approximants. Also, I defined
$a(n, k)$ to be the coefficient on $x^{n - k}$ rather than $x^k$. This seemed
useful at the time, but <a href="https://sites.math.rutgers.edu/~zeilberg/">Dr. Z</a> has shown me the error of my ways.</p>
</li>
<li>
<p>My last post tried to simplify the coefficients by taking $a(n, 0) = 1$. The
<em>smart</em> thing is to realize that it doesn’t really matter.</p>
</li>
</ul>
<p>More on these two points later. For now, let’s get to the good stuff.</p>
<h1 id="padé-approximants-in-detail">Padé approximants in detail</h1>
<p>From an analytic perspective, the point of a Padé approximant as defined in
\eqref{pade} is that the power series of the rational function $P_{n, m}(x)
/ Q_{n, m}(x)$ and $f(x)$ ought to agree for the first $n + m + 1$ terms. The form
in \eqref{pade} is slightly more general, and suggests how to actually compute
the coefficients.</p>
<p>Looking at the coefficient on $x^k$ for $0 \leq k \leq n + m$ in \eqref{pade}
shows that necessary and sufficient conditions to be the coefficients of a Padé
approximant are</p>
\[\begin{equation}
\label{convolution}
\sum_j b(n, m, j) c(k - j) = a(n, m, k), \qquad 0 \leq k \leq n + m,
\end{equation}\]
<p>where $c(k) = [x^k] f(x)$, together with the boundary conditions on $a(n, m,
k)$ and $b(n, m, k)$. (The boundary conditions are implicitly assumed in the
sum since it ranges over <em>all</em> integers $j$.)</p>
<p>The boundary conditions ensure that the coefficients are unique and that we
don’t need to go beyond the degrees $n$ and $m$ for our polynomials. For
example, lots of sequences satisfy \eqref{convolution}. In fact, for any fixed
$b(n, m, k)$ and $c(k)$, we could just <em>define</em> $a(n, m, k)$ by
\eqref{convolution}. (Equivalently, just pick a generating function $Q(x)$ and
define another one called $P(x) = f(x) Q(x)$.) This gets you the formal
identity, but with possibly infinite generating functions rather than proper
polynomials.</p>
<p>As an example, take $f(x) = (1 + x)^{-1/2}$ and $b(n, m, k) = (n + m) / 2^k$.
Computing the “$(4, 4)$th Padé” approximant by defining $a(n, m, k)$ by
\eqref{convolution} yields the rational function</p>
\[\frac{27x^4 - 16x^3 + 48x^2 + 128}{8x^4 + 16x^3 + 32x^2 + 64x + 128}.\]
<p>However, the series expansion of this rational function disagrees with that of
$(1 + x)^{-1/2}$ at the fifth term, when it ought to agree until at least the
eighth!</p>
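<p>This failure is easy to reproduce. A short Python check (mine, not from the post) expands the rational function above by power-series long division and finds the first coefficient that deviates from that of $(1 + x)^{-1/2}$:</p>

```python
# Expand the "fake" (4, 4) approximant and find where its series first
# deviates from that of (1 + x)^(-1/2).
from fractions import Fraction
from math import prod, factorial

def binom(x, k):
    """Generalized binomial coefficient C(x, k) for integer k >= 0."""
    return prod(Fraction(x) - i for i in range(k)) / factorial(k)

def rational_series(num, den, terms):
    """Taylor coefficients of num(x)/den(x) by long division (ascending powers)."""
    s = []
    for k in range(terms):
        top = num[k] if k < len(num) else Fraction(0)
        top -= sum(den[j] * s[k - j] for j in range(1, min(k, len(den) - 1) + 1))
        s.append(top / den[0])
    return s

num = [Fraction(v) for v in [128, 0, 48, -16, 27]]   # 27x^4 - 16x^3 + 48x^2 + 128
den = [Fraction(v) for v in [128, 64, 32, 16, 8]]
s = rational_series(num, den, 9)
target = [binom(Fraction(-1, 2), k) for k in range(9)]
first_bad = next(k for k in range(9) if s[k] != target[k])
```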
<p>To learn more about Padé approximants than you would ever care to know, see
Chapter 20 onward of Wall’s <em>Analytic Theory of Continued Fractions</em>.</p>
<h1 id="recurrences">Recurrences</h1>
<p>Suppose that $f(x) = (1 + x)^c$ for some real $c$. Using the Maple packages
<a href="https://sites.math.rutgers.edu/~zeilberg/tokhniot/PADE">PADE</a> and <a href="https://sites.math.rutgers.edu/~zeilberg/tokhniot/FindRec.txt">FindRec</a>, I have conjectured the following recurrences<sup id="fnref:rec-note" role="doc-noteref"><a href="#fn:rec-note" class="footnote" rel="footnote">3</a></sup>
for the coefficients $a(n, m, k)$ and $b(n, m, k)$ of the $(n, m)$th Padé
approximant for $f$:</p>
\[\begin{align}
\begin{split}
a(n, m, k + 1) &= \frac{(n - k)(m + c - k)}{(k + 1) (n + m - k)} a(n, m, k) \\
b(n, m, k + 1) &= \frac{(m - k)(n - c - k)}{(k + 1)(n + m - k)} b(n, m, k)
\end{split}
\label{rec}
\tag{Recurrence}
\end{align}\]
<p>Unrolling this gives the following solutions:</p>
\[\begin{align*}
a(n, m, k) &= a(n, m, 0) \frac{n^{\underline{k}} (m + c)^{\underline{k}}}{k! (n + m)^{\underline{k}}} \\
b(n, m, k) &= b(n, m, 0) \frac{m^{\underline{k}} (n - c)^{\underline{k}}}{k! (n + m)^{\underline{k}}}.
\end{align*}\]
<p>But if we just <em>look</em> at these long enough, we see how we <em>should</em> write them:</p>
\[\begin{align}
\begin{split}
a(n, m, k) &= a(n, m, 0) {n \choose k} {m + c \choose k} {n + m \choose k}^{-1} \\
b(n, m, k) &= b(n, m, 0) {m \choose k} {n - c \choose k} {n + m \choose k}^{-1}.
\end{split}
\label{solution}
\tag{Solution}
\end{align}\]
<p>All that’s left are the two constant terms. Setting $x = 0$ in \eqref{pade}
tells us that $a(n, m, 0) = f(0) b(n, m, 0)$, and $f(x) = (1 + x)^c$ gives
$f(0) = 1$, so the constant terms are actually equal. Padé approximants are
only equal up to a constant multiple anyway, so from now on let’s just assume
that $a(n, m, 0) = b(n, m, 0) = 1$. We’ll come back to this later.</p>
<h1 id="verification">Verification</h1>
<p>If we define $a(n, m, k)$ and $b(n, m, k)$ to vanish when they should, then all
we need to check to prove our conjecture is that \eqref{convolution} holds with
$c(k) = [x^k] (1 + x)^c = {c \choose k}$. Writing this out, it is our challenge
to prove the following:</p>
\[\sum_{j = 0}^k
{m \choose j} {n - c \choose j} {c \choose k - j} {n + m \choose j}^{-1}
= {n \choose k} {m + c \choose k} {n + m \choose k}^{-1}.\]
<p>This looks quite burly, but it is <em>completely routine</em> to prove these days with
<a href="https://www.math.upenn.edu/~wilf/AeqB.html">WZ</a> theory. The proof goes like this in Maple:</p>
<pre><code class="language-maple">with(SumTools[Hypergeometric]):
# Conjectured Pade coefficients and the Taylor coefficients of (1 + x)^c:
a := (n, m, k, c) -> binomial(n, k) * binomial(m + c, k) / binomial(n + m, k):
b := (n, m, k, c) -> binomial(m, k) * binomial(n - c, k) / binomial(n + m, k):
d := (k, c) -> binomial(c, k):
# Summand of the left side, divided by the (k-dependent) right side:
T := b(n, m, j, c) * d(k - j, c) / a(n, m, k, c):
ZeilbergerRecurrence(T, k, j, f, 0..k);
</code></pre>
<p>This code outputs <code class="language-plaintext highlighter-rouge">-f(k) + f(k + 1) = 0</code>, where $f(k)$ is our sum in $j$. This
means, essentially, that our identity is true so long as it is true for $k
= 0$. Plugging in $k = 0$ gives the trivially true statement $1 = 1$, so we’re
good!</p>
<h1 id="the-constant-factor">The constant factor</h1>
<p>I may have been slightly misleading when I said that Padé approximants are
unique up to a constant multiple. This is <em>true</em>, but in practice packages will
choose <em>different constants</em> for each $(n, m)$th approximant, to clear
denominators or something like that. Put another way, the Padé approximants are
some function of $n$ and $m$ times the sequences in \eqref{solution}.</p>
<p>The <a href="https://sites.math.rutgers.edu/~zeilberg/tokhniot/PADE">PADE</a> package assumes that $b(n, m, 0) = 1$, but then normalizes the
approximant by clearing denominators and making everything look nice. I believe
that, for PADE, the correct “constant” is $(n + m)! / \min(n, m)!$, so</p>
\[a(n, m, 0) = b(n, m, 0) = \frac{(n + m)!}{\min(n, m)!}.\]
<p>Multiplying through by $(n + m)!$ gives integer coefficients, and I guess the
$\min(n, m)!$ term is there to get rid of any common integer factors after
that.</p>
<p>This has practical ramifications for the experimental side of this problem.
There are, I think, three ways to guess the coefficients:</p>
<ol>
<li>
<p>Guess a recurrence for $a(n, m, k)$ in $k$ (read fixed polynomials from left to right);</p>
</li>
<li>
<p>Guess a recurrence for $a(n, m, k)$ in $n$ (read fixed degrees from top to bottom); and</p>
</li>
<li>
<p>Guess a recurrence for $a(n, m, n - k)$ in $n$ (read fixed <em>relative</em>
degrees from top to bottom).</p>
</li>
</ol>
<p>If every $(n, m)$ pair has a different constant, then the last two options
could be more difficult because the “normalizing constants” could be very
complicated. In other words, reading across different values of $(n, m)$ forces
you to guess the “base” sequences as well as the “normalizing” sequences. The
first option, in comparison, only requires you to guess the “base” sequence! You
know <em>a priori</em> that there <em>is</em> a normalizing sequence, and you can file that
problem away for another day.</p>
<h1 id="small-generalizations-and-special-cases">Small generalizations and special cases</h1>
<p>Given the Padé approximants of $f(x) = (1 + x)^c$, we can get the approximants
of $(1 + tx)^c$ by scaling the old coefficients by $t^k$. Thus, to be
completely general, we should write</p>
\[\begin{align*}
a(n, m, k) &= t^k {n \choose k} {m + c \choose k} {n + m \choose k}^{-1} \\
b(n, m, k) &= t^k {m \choose k} {n - c \choose k} {n + m \choose k}^{-1}.
\end{align*}\]
<p>(If you don’t believe me, just modify the above Maple code to prove it
yourself.)</p>
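<p>Alternatively, here is a Python transcription (mine, not from the post) of this general closed form; it reproduces the $(2, 2)$ approximant of $(1 - 4x)^{-1/2}$ that appears below.</p>

```python
# The general closed form for the Pade coefficients of (1 + t x)^c.
from fractions import Fraction
from math import prod, factorial

def binom(x, k):
    """Generalized binomial coefficient C(x, k) for integer k >= 0."""
    return prod(Fraction(x) - i for i in range(k)) / factorial(k)

def a_coef(n, m, k, t, c):
    return Fraction(t) ** k * binom(n, k) * binom(m + c, k) / binom(n + m, k)

def b_coef(n, m, k, t, c):
    return Fraction(t) ** k * binom(m, k) * binom(n - c, k) / binom(n + m, k)

# (1 - 4x)^(-1/2) corresponds to t = -4, c = -1/2.
num = [a_coef(2, 2, k, -4, Fraction(-1, 2)) for k in range(3)]
den = [b_coef(2, 2, k, -4, Fraction(-1, 2)) for k in range(3)]
```

<p>The lists come out as $[1, -3, 1]$ and $[1, -5, 5]$, i.e., $(x^2 - 3x + 1) / (5x^2 - 5x + 1)$.</p>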
<p>The function $f(x) = (1 - 4x)^{-1/2}$ is of special interest since it generates
the central binomial coefficients. Let’s find the diagonal approximants (i.e.,
$n = m$) for this $f$. Our work here tells us that</p>
\[\begin{align*}
a(n, n, k) &= (-4)^k {n \choose k} {n - \frac{1}{2} \choose k} {2n \choose k}^{-1} \\
b(n, n, k) &= (-4)^k {n \choose k} {n + \frac{1}{2} \choose k} {2n \choose k}^{-1}.
\end{align*}\]
<p>By some elementary binomial coefficient identities,</p>
\[\begin{align*}
{n - \frac{1}{2} \choose k} &= \frac{ {2n \choose 2k} {2k \choose k}}{4^k {n \choose k}} \\
{n + \frac{1}{2} \choose k} &= \frac{ {2(n + 1) \choose 2k} {2k \choose k}}{4^k {n + 1 \choose k}}.
\end{align*}\]
<p>Therefore, after some simplifying,</p>
\[\begin{align*}
a(n, n, k) &= (-1)^k {2n - k \choose k} \\
b(n, n, k) &= (-1)^k \frac{2n + 1}{2(n - k) + 1} {2n - k \choose k}.
\end{align*}\]
<p>This tells us that, say, the leading coefficient in the denominator is $b(n, n,
n) = (-1)^n (2n + 1)$. This checks out:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>> Pade1((1 - 4 * x)^(-1/2), x, 2, 2);
2
x - 3 x + 1
--------------
2
5 x - 5 x + 1
> Pade1((1 - 4 * x)^(-1/2), x, 3, 3);
3 2
-x + 6 x - 5 x + 1
-----------------------
3 2
-7 x + 14 x - 7 x + 1
> Pade1((1 - 4 * x)^(-1/2), x, 4, 4);
4 3 2
x - 10 x + 15 x - 7 x + 1
------------------------------
4 3 2
9 x - 30 x + 27 x - 9 x + 1
</code></pre></div></div>
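<p>For good measure, here is a Python sketch (my own) verifying that the simplified diagonal coefficients satisfy the defining convolution with the central binomial coefficients through order $x^{2n}$, and that the leading denominator coefficient is $(-1)^n (2n + 1)$:</p>

```python
# Verify the simplified diagonal Pade coefficients for (1 - 4x)^(-1/2).
from fractions import Fraction
from math import comb

def a_diag(n, k):
    return (-1) ** k * comb(2 * n - k, k)

def b_diag(n, k):
    return Fraction((-1) ** k * (2 * n + 1), 2 * (n - k) + 1) * comb(2 * n - k, k)

def convolution_ok(n):
    """Check sum_j b(n,n,j) * C(2(k-j), k-j) = a(n,n,k) for 0 <= k <= 2n,
    where a(n,n,k) vanishes for k > n."""
    c = [comb(2 * k, k) for k in range(2 * n + 1)]
    for k in range(2 * n + 1):
        conv = sum(b_diag(n, j) * c[k - j] for j in range(min(k, n) + 1))
        if conv != (a_diag(n, k) if k <= n else 0):
            return False
    return True
```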
<h1 id="conclusion">Conclusion</h1>
<p>This seems to conclude the story of Padé approximants for $(1 + tx)^c$. We have
exhausted all meaningful generalizations. We could consider $(a + tx)^c$, but
that’s really just the same function. No, I think that we’re done here.</p>
<p>Maybe we shouldn’t be surprised that these approximants turned out nice. The
coefficients of $(1 + x)^c$ are just binomial coefficients. Of course they
would spit out exactly the right kind of convolution identity.</p>
<p>Perhaps a more fruitful approach would be to look at known convolution
identities and work backwards to discover the functions involved. That way
you at least know that you’ll have <em>some</em> nice Padé approximants, whatever the
function is. A problem to be taken up soon!</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:pade-ref" role="doc-endnote">
<p>See Chapter 20 of Wall’s <em>Analytic Theory of Continued Fractions</em>. <a href="#fnref:pade-ref" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:constant" role="doc-endnote">
<p>If you rush off to check these with <a href="https://sites.math.rutgers.edu/~zeilberg/tokhniot/PADE">PADE</a> or something similar,
be aware that normalizing the computed Padé coefficients (e.g.,
cancelling denominators or something like that) will produce
different constants for each $(n, m)$ pair. For <a href="https://sites.math.rutgers.edu/~zeilberg/tokhniot/PADE">PADE</a> in
particular, I believe that the constant is $(n + m)! / \min(n,
m)!$. <a href="#fnref:constant" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:rec-note" role="doc-endnote">
<p>Strictly speaking, these recurrences don’t make sense for $k = n$
or $k = m$. I <em>should</em> have cleared denominators, and doing that
would show that $a(n, m, k)$ vanishes if $k > n$, and $b(n, m, k)$
vanishes if $k > m$, as we would expect. <a href="#fnref:rec-note" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>