### Purpose

This post aims at recapturing the main ideas of the formal proofs that I’ve
read. It *never* tries to replace them. You may consult the
references if you need any of them.

### Some notations

Unless otherwise specified, *all* cardinalities here are *infinite*. Denote
$\mathfrak{a} = \card{A}$, $\mathfrak{b} = \card{B}$ and
$\mathfrak{i} = \card{I}$.

- Sum
- $\mathfrak{a} + \mathfrak{b} = \card{A \cup B}$ provided that $A \cap B =\varnothing$.
- Product
- $\mathfrak{a} \, \mathfrak{b} = \card{A \times B}$
- Power
- $\mathfrak{a}^\mathfrak{i} = \card{A^I}$, where $A^I = \lbrace f \mid f: I \to A \rbrace$ denotes the set of functions from $I$ to $A$.

I've chosen $I$ instead of $B$ to express the index set because this reminds me of an array of $(a_i)_i$ indexed by $I$.

Using these notations, it’s easy to recover usual power laws, distributive laws and inequalities. I’ll take them for granted.

### An alternate definition of infinite set

A set which admits a

propersubsetbijectiveto itself.

To see this from the usual definition (cardinality $\ge \aleph_0$), “take $\aleph_0$ elements from the set” and build a bijection that verifies the above statement (The easiest one is “to discard one element”.) Extend this bijection the easiest way: “leave outsiders untouched”.

To see the converse, instead of reasoning directly from the above statement, we
prove the contrapositive form. The reason is evident: finite sets are *much*
easier to deal with.

### Sum, product and power of infinite sets

- Sum $$ \mathfrak{a} + \mathfrak{b} = \max(\mathfrak{a}, \mathfrak{b}) \tag{$\Sigma$} $$
- Product $$ \mathfrak{a} \, \mathfrak{b} = \max(\mathfrak{a}, \mathfrak{b}) \tag{$\Pi$} $$
- Power: Given $2 \le \mathfrak{a} \le 2^\mathfrak{b}$. $$ \mathfrak{a}^\mathfrak{b} = 2^\mathfrak{b} \tag{$\land$} $$

In fact, WLOG, it suffices to show $$ \mathfrak{a} + \mathfrak{a} = \mathfrak{a} \, \mathfrak{a} = \mathfrak{a}. $$

To show $(\Sigma)$, we will factorize $\mathfrak{a} = \mathfrak{i} \, \aleph_0$, so that the second factor $\aleph_0$ can be split into two identical halves (odd and even).

The morals of $(\Sigma)$ and $(\Pi)$ is that taking union or direct product of
infinite sets of the same size *won’t* give a bigger set. Exponentiating an
infinite set is the way to get a bigger set.

#### Digression on induction

Since we know *almost nothing* about $A$ (not even countability), we have to use
Zorn’s lemma (to do “uncountable induction”).

Zorn’s lemma | Mathematical induction | |
---|---|---|

“universal set” | $\mathcal{X}$ poset | $\lbrace n \in \N \mid P(n) \rbrace$ |

examples | collection of sets, function class | (subsets of) $\N$ |

order | partial: set inclusion, extension of function | total: usual $\le$ |

base case | $\mathcal{X} \ne \varnothing$ | usually $P(1)$ |

“assumed chain” | any totally ordered subset $\mathcal{C}: j \le \cdots \le m$ | “finite chain”: $P(1) \Rightarrow \cdots \Rightarrow P(m)$ |

“inductive step” | $\mathcal{C} \ni m \le \ell \in \mathcal{X}$ | $P(m) \Rightarrow P(m+1)$ |

explanation of “inductive step” | every chain is bounded above | domino effect |

result | $\mathcal{X}$ has maximal element. | The “universal set” is (an infinite subset of) $\N$. |

This table is just a *memory aid*. It can *never* replace the statements
themselves. There’s *no* implication between the two results. A proof by
Zorn’s lemma should be routine, but I often forget the details.

Brézis’ Functional Analysis and PDE book, which starts with defining *inductive*
sets for proving Hahn–Banach Theorem, has inspired me to think of terms like
“assumed chain” and “inductive step” in order to capture the adjective
*inductive* in chains $\mathcal{C}$.

The domino effect *doesn’t* resemble an upper bound of $\mathcal{C}$. I’ve
scratched my head to think of this analogy: for *any* chain $\mathcal{C}$, “jump
outside” of it to find a “new leader” $\ell$ linked to each member $m \in
\mathcal{C}$ by either $m \le \ell$ or $P(m) \Rightarrow P(\ell)$. $\ell$ is
often found by $\cup \mathcal{C}$. (Or take union on the chain of domains in
case of function extensions.)

In usual situations, finding a maximal element $x_{\max}$ doesn't end the proof. One may need to argue that $x_{\max}$ coïncides with our desired element $x$. A proof by contradiction is often used in the case of set inclusions. For example, to prove the existence of a Hamel basis $\beta$ on any vector space, apply Zorn's lemma to get $\beta_{\max}$, and argue that any $v_0 \notin \span{\beta_{\max}}$ would contradict the maximality of $\beta_{\max}$.

Maximal *isn’t* synonymous to maximum: think of prime ideals in $\Z$.

#### Product vs power

I was stuck at this malformed question for a hour: how much is $(a_i)_i$?

$\color{red}{A} \backslash \color{blue}{I}$ | $\color{blue}{1}$ | $\color{blue}{2}$ |
---|---|---|

$\color{red}{1}$ | $\color{red}{1}$ | $\color{red}{1}$ |

$\color{red}{2}$ | $\color{red}{2}$ | $\color{red}{2}$ |

$\color{red}{3}$ | $\color{red}{3}$ | $\color{red}{3}$ |

If we see $(\color{red}{a}_{\color{blue}{i}})_\color{blue}{i}$ as a function, then there's $|\color{red}{A}|^{|\color{blue}{I}|}$ many $(\color{red}{a}_{\color{blue}{i}})_\color{blue}{i}$'s.

What if we enlarge the index so that it becomes $(\color{red}{a}, \color{blue}{i})$?

To make sense of $|\color{red}{A} \times \color{blue}{I}|$, you may identify (identical copies of) $\color{red}{A}_{\color{blue}{i}}$ with $\color{blue}{i}$. (Say, $\color{red}{A}_{\color{blue}{1}} = \color{red}{A}_{\color{blue}{2}} = \lbrace \color{red}{1},\color{red}{2},\color{red}{3} \rbrace$, but we treat $\color{red}{2} \in \color{red}{A}_{\color{blue}{1}}$ and $\color{red}{2} \in \color{red}{A}_{\color{blue}{2}}$ differently by adjoining an index $\color{blue}{i}$, so that they become $(\color{red}{2},\color{blue}{1})$ and $(\color{red}{2},\color{blue}{2})$.)

In the first case, we write $\color{red}{A}^{\color{blue}{I}} = \prod_{\color{blue}{i} \in \color{blue}{I}} \color{red}{A}_{\color{blue}{i}}$; in the second case, we write $\color{red}{A} \times \color{blue}{I} = \sqcup_{\color{blue}{i} \in \color{blue}{I}} (\color{red}{A}_{\color{blue}{i}}, \color{blue}{i})$.

#### Return to the proofs

$A_i$ | $\cdots$ | $A_j$ | $\lbrace a_1, \dots, a_n \rbrace$ |
---|---|---|---|

Partitions of | cardinality $\aleph_0$ | indexed by $i$ | finitely many elements left |

To factorize $\mathfrak{a} = \mathfrak{i} \, \aleph_0$, we need partitions
$A_i$’s of “size” $\aleph_0$. After taking “as much $A_i$’s as possible”, only
finitely many elements $a_1, \dots, a_n$ are left. Put them into any one
partition $A_i$—this *won’t* affect its cardinality. Zorn’s lemma’s role in
proving $(\Sigma)$ is to formalize the action of “taking as much $A_i$’s as
possible” by creating a *maximal* collection of such sets. The “universal set”
is the *collection* of partitions of cardinality $\aleph_0$, and the partial
order is the usual set inclusion between the *collections* rather than the
partitions.

Identity $(\Pi)$ is much more difficult to prove. My attempts to decompose $\mathfrak{a}$ into powers involving $\aleph_0$ have failed. Zorn’s lemma is used in this way:

- $\mathcal{X} = \lbrace f: C \to C^2 \mid f \mathinner{\text{bijective}}, C \subseteq A \rbrace$
- $\le$: function extension in $\mathcal{X}$
- base case: $C = \N$, $f$ is a counting funciton for $\Q$
- chain: $\lbrace (D_i, f_i) \rbrace_i$
- upper bound: $(D, f)$, where $D = \cup_i D_i$ and $f: D \to D^2$ defined by $f(d_i) = f_i(d_i)$. Easy to verify that it’s bijective.
- maximal element: by abuse of notation, we keep using $(D, f)$.
- Claim $A = D$: the most difficult part

Let $E = A \setminus D$. Write $\mathfrak{a} = \mathfrak{d} + \mathfrak{e}$.

$D^2$ | $D \times E$ |
---|---|

$E \times D$ | $E^2$ |

From $(\Sigma)$, assume $\mathfrak{a} = \mathfrak{e} > \mathfrak{d}$. This allows us to extract $F$ of cardinality $\mathfrak{d}$ from $E$. We’re going to extend $(D, f)$ to get a contradiction.

$D^2$ | $D \times F$ |
---|---|

$F \times D$ | $F^2$ |

The top-left grid represents $(D, f)$ and $\mathfrak{d}^2 = \mathfrak{d}$. In
words, the square can be “flattened into a line segment”. Note that “*all* of
these four grids are of the *same* size. From $(\Sigma)$, the remaining three
grids together are as big as one grid, which can be “flattened into $F$”. This
extends $(D, f)$, contradicting its maximality in $\mathcal{X}$.

As a result, the assumption $\mathfrak{a} > \mathfrak{d}$ is *wrong*. The
equality $\mathfrak{a} = \mathfrak{d} + \mathfrak{e}$ gives
$\mathfrak{a} = \mathfrak{d}$. This proves $(\Pi)$.

With the usual power laws ported onto cardinal numbers, it’s easy to show

### Cardinals under total order

This has motivated me to find out more about cardinal numbers. The next result
shows that the *sizes* of two sets can always be compared.

To see this, we have to apply Zorn’s lemma again.

- $\mathcal{X} = \lbrace f: X \to Y \mathinner{\text{bijective}} \mid X \subseteq A \land Y \subseteq B \rbrace$
- $\prec$: usual function extension (extend domain and codomain, keep original values unchanged).
- Start from any chain $\mathcal{C}$. Take $\cup \mathcal{C}$ and show that it’s in $\mathcal{X}$.
- By Zorn’s lemma, there exists a maximal element in $\mathcal{X}$. Denote this again by $f: C \to D$.
Define $E$ and $F$ to be

$A = C \sqcup E$ $B = D \sqcup F$ max $C \xleftarrow[\text{bijective with}]{}$ $\xrightarrow[\text{each other}]{}D$ rest $E := A \setminus C$ $F:= B \setminus D$ If $E = \varnothing$ or $F = \varnothing$, we’re done. Otherwise, maximality of $f: C \to D$ would be contradicted.

### Bijection from $\mathcal{P}(\N^*)$ to real line

Denote $\mathfrak{c} = \card{[0,1]}$. Let’s compare $\aleph_0$ and $\mathfrak{c}$ before proving a cardinality equality.

#### Digression: Cantor’s diagonal extraction

The aim of this subsection is to show that $\mathfrak{c} > \aleph_0$.

To show that $[0,1]$ is *uncountable*, a classic counter-argument proceeds as
follows:

Suppose $[0,1]$ is countable, and denote it by $(r_n)_n$. Extract a number
*not* in the list $(r_n)_n$ by a function

The number
$$
d = \sum_{n \in \N^*} \frac{\color{blue}\phi(r_n)}{10^n}
$$
converges, so it’s well-defined. However, $d$ *doesn’t* belong to list $(r_n)$
since the $n$-th decimal place of $d$ ($\phi(r_n)$)
*never matches*
$\colorbox{yellow}{\color{red}{\text{the $n$-th decimal place of $r_n$}}}$.

#### Diagram illustrating the bijections

The first bijection *isn’t* evident as $\frac12 = \sum_{n\ge2} \frac{1}{2^n}$.
Therefore, a *direct proof won’t work*. We have to calculate cardinalities and
establish injections between some infinite subset and use CSB theorem. We have
to separate two cases

- $T_1 = \lbrace (a_n)_n \mid a_n = 1 \mathinner{\text{i.o.}} \rbrace$ (“i.o.” means “infinitely often”. This is used in Durrett’s Probability Theory book.)
- $T \setminus T_1 = \lbrace (a_n)_n \mid a_n = 0 \mathinner{\text{eventually}} \rbrace \subseteq \Q$

We have $\card{T \setminus T_1} \le \aleph_0$. To see that's indeed an equality, take $\lbrace (\delta_{ij}) \rbrace_j$. $(\Sigma)$ and Cantor's Theorem forces $\card{T_1} = 2^{\aleph_0}$. To show a cardinality inequality, we need an injection. Since $\mathfrak{c} > \aleph_0$, we can't inject $[0,1]$ into $T \setminus T_1$.

#### Surjectivity of $\phi$

To show $\phi$ is surjective, construct $(a_n)_n$ from any $r \in [0,1]$ recursively as follows:

- $i = 1$: If $r \ge \frac12$, $a_1 = 1$. Otherwise, $a_1 = 0$. The error for the first iteration is $\epsilon_1 \leftarrow r - \frac{a_1}{2}$.
$i > 1$: If $\epsilon_{i-1} \ge \frac{1}{2^i}$, $a_i = 1$. Otherwise, $a_i = 0$. The error for the $i$-th iteration is $\epsilon_i \leftarrow r_{i-1} - \frac{a_i}{2^i}$ It’s clear that

$$ \begin{aligned} \mathinner{\text{error}} \epsilon_n &= \mathinner{\text{real value}} r - \mathinner{\text{lower approximation}} \sum_{i = 1}^n \frac{a_i}{2^i}. \end{aligned} $$

Show that $(a_n)_n$ constructed from $r$ can be mapped back to $r$ by $\phi$. (i.e. $\phi((a_n)_n) = r$) This can be proved directly and inductive from the construction of $(a_n)_n$. For each iteration, the increasing partial sum is near enough to $r$.

- base case: direct calculations
- induction hypothesis: assume $\mathrm{P}(m)$
inductive step $i = m + 1$:

$$ \begin{aligned} \epsilon_{m+1} &= \epsilon_m -\frac{a_{m+1}}{2^{m+1}} \\ &\le \begin{cases} \frac{1}{2^{m+1}} - 0 &\text{if } \epsilon_m < \frac{1}{2^{m+1}} \\ \frac{1}{2^m} - \frac{1}{2^{m+1}} &\text{if } \epsilon_m \ge \frac{1}{2^{m+1}} \end{cases} \\ &= \frac{1}{2^{m+1}} \end{aligned} \tag{$\star$} $$For the second case in $(\star)$, $\mathrm{P}(m)$ is applied.

As a result, error $\epsilon_n \to 0$ as $n \to \infty$.

Translate the surjectivity of $\phi$ into the cardinality inequality

$(\text{AC})$ means the Axiom of Choice.

#### Injectivity of $\phi|_{T_1}$

Suppose $(a_n)_n, (b_n)_n \in T_1$ such that $\phi((a_n)_n) = \phi((b_n)_n) = r$. If $a_n = b_n \forall n \in \N^*$, then $(a_n)_n = (b_n)_n$ and we’re done. Otherwise, they differ starting from the $n_0$-th term.

WLOG, assume $a_{n_0} = 1$ and $b_{n_0} = 0$. We have

Two seq in $T_1$ | head: $n < n_0$ | they differ from here | tail: $n > n_0$ |
---|---|---|---|

$(a_n)_n$ | same | $a_{n_0} = 1$ | infinitely many $1$’s |

$(b_n)_n$ | terms | $b_{n_0} = 0$ | dominated by $a_{n_0} = 1$ |

The key step in $\text{(I)}$ is the use of $a_n = 1$ i.o. to get a *strict*
inequality.

The injectivity of $\phi|_{T_1}$ gives

Combine $(\subseteq)$ and $(\supseteq)$ and apply CSB theorem to get $\mathfrak{c} = 2^{\aleph_0}$.

### Reference

Gabriel Nagy’s *Real Analysis* (version 2.0) Appendix B

I’ve *modified* his proof for $\mathfrak{c} = 2^{\aleph_0}$ by changing $T_0$
into $T_1$. This leads automatically to $\card{T \subseteq T_1} = \aleph_0$.
To avoid going too deep, I simply take $r = 2$.