Purpose
This post aims at recapturing the main ideas of the formal proofs that I’ve read. It never tries to replace them. You may consult the references if you need any of them.
Some notations
Unless otherwise specified, all cardinalities here are infinite. Denote $\mathfrak{a} = \card{A}$, $\mathfrak{b} = \card{B}$ and $\mathfrak{i} = \card{I}$.
 Sum
 $\mathfrak{a} + \mathfrak{b} = \card{A \cup B}$ provided that $A \cap B =\varnothing$.
 Product
 $\mathfrak{a} , \mathfrak{b} = \card{A \times B}$
 Power
 $\mathfrak{a}^\mathfrak{i} = \card{A^I}$, where $A^I = \lbrace f \mid f: I \to A \rbrace$ denotes the set of functions from $I$ to $A$.
Using these notations, it’s easy to recover usual power laws, distributive laws and inequalities. I’ll take them for granted.
An alternate definition of infinite set
A set which admits a proper subset bijective to itself.
To see this from the usual definition (cardinality $\ge \aleph_0$), “take $\aleph_0$ elements from the set” and build a bijection that verifies the above statement (The easiest one is “to discard one element”.) Extend this bijection the easiest way: “leave outsiders untouched”.
To see the converse, instead of reasoning directly from the above statement, we prove the contrapositive form. The reason is evident: finite sets are much easier to deal with.
Sum, product and power of infinite sets
 Sum $$ \mathfrak{a} + \mathfrak{b} = \max(\mathfrak{a}, \mathfrak{b}) \tag{$\Sigma$} $$
 Product $$ \mathfrak{a} , \mathfrak{b} = \max(\mathfrak{a}, \mathfrak{b}) \tag{$\Pi$} $$
 Power: Given $2 \le \mathfrak{a} \le 2^\mathfrak{b}$. $$ \mathfrak{a}^\mathfrak{b} = 2^\mathfrak{b} \tag{$\land$} $$
In fact, WLOG, it suffices to show $$ \mathfrak{a} + \mathfrak{a} = \mathfrak{a} , \mathfrak{a} = \mathfrak{a}. $$
To show $(\Sigma)$, we will factorize $\mathfrak{a} = \mathfrak{i} , \aleph_0$, so that the second factor $\aleph_0$ can be split into two identical halves (odd and even).
The morals of $(\Sigma)$ and $(\Pi)$ is that taking union or direct product of infinite sets of the same size won’t give a bigger set. Exponentiating an infinite set is the way to get a bigger set.
Digression on induction
Since we know almost nothing about $A$ (not even countability), we have to use Zorn’s lemma (to do “uncountable induction”).
Zorn’s lemma  Mathematical induction  

“universal set”  $\mathcal{X}$ poset  $\lbrace n \in \N \mid P(n) \rbrace$ 
examples  collection of sets, function class  (subsets of) $\N$ 
order  partial: set inclusion, extension of function  total: usual $\le$ 
base case  $\mathcal{X} \ne \varnothing$  usually $P(1)$ 
“assumed chain”  any totally ordered subset $\mathcal{C}: j \le \cdots \le m$  “finite chain”: $P(1) \Rightarrow \cdots \Rightarrow P(m)$ 
“inductive step”  $\mathcal{C} \ni m \le \ell \in \mathcal{X}$  $P(m) \Rightarrow P(m+1)$ 
explanation of “inductive step”  every chain is bounded above  domino effect 
result  $\mathcal{X}$ has maximal element.  The “universal set” is (an infinite subset of) $\N$. 
This table is just a memory aid. It can never replace the statements themselves. There’s no implication between the two results. A proof by Zorn’s lemma should be routine, but I often forget the details.
Brézis’ Functional Analysis and PDE book, which starts with defining inductive sets for proving Hahn–Banach Theorem, has inspired me to think of terms like “assumed chain” and “inductive step” in order to capture the adjective inductive in chains $\mathcal{C}$.
The domino effect doesn’t resemble an upper bound of $\mathcal{C}$. I’ve scratched my head to think of this analogy: for any chain $\mathcal{C}$, “jump outside” of it to find a “new leader” $\ell$ linked to each member $m \in \mathcal{C}$ by either $m \le \ell$ or $P(m) \Rightarrow P(\ell)$. $\ell$ is often found by $\cup \mathcal{C}$. (Or take union on the chain of domains in case of function extensions.)
Maximal isn’t synonymous to maximum: think of prime ideals in $\Z$.
Product vs power
I was stuck at this malformed question for a hour: how much is $(a_i)_i$?
$\color{red}{A} \backslash \color{blue}{I}$  $\color{blue}{1}$  $\color{blue}{2}$ 

$\color{red}{1}$  $\color{red}{1}$  $\color{red}{1}$ 
$\color{red}{2}$  $\color{red}{2}$  $\color{red}{2}$ 
$\color{red}{3}$  $\color{red}{3}$  $\color{red}{3}$ 
Return to the proofs
$A_i$  $\cdots$  $A_j$  $\lbrace a_1, \dots, a_n \rbrace$ 

Partitions of  cardinality $\aleph_0$  indexed by $i$  finitely many elements left 
To factorize $\mathfrak{a} = \mathfrak{i} , \aleph_0$, we need partitions $A_i$'s of “size” $\aleph_0$. After taking “as much $A_i$'s as possible”, only finitely many elements $a_1, \dots, a_n$ are left. Put them into any one partition $A_i$—this won’t affect its cardinality. Zorn’s lemma’s role in proving $(\Sigma)$ is to formalize the action of “taking as much $A_i$'s as possible” by creating a maximal collection of such sets. The “universal set” is the collection of partitions of cardinality $\aleph_0$, and the partial order is the usual set inclusion between the collections rather than the partitions.
Identity $(\Pi)$ is much more difficult to prove. My attempts to decompose $\mathfrak{a}$ into powers involving $\aleph_0$ have failed. Zorn’s lemma is used in this way:
 $\mathcal{X} = \lbrace f: C \to C^2 \mid f \mathinner{\text{bijective}}, C \subseteq A \rbrace$
 $\le$: function extension in $\mathcal{X}$
 base case: $C = \N$, $f$ is a counting funciton for $\Q$
 chain: $\lbrace (D_i, f_i) \rbrace_i$
 upper bound: $(D, f)$, where $D = \cup_i D_i$ and $f: D \to D^2$ defined by $f(d_i) = f_i(d_i)$. Easy to verify that it’s bijective.
 maximal element: by abuse of notation, we keep using $(D, f)$.
 Claim $A = D$: the most difficult part
Let $E = A \setminus D$. Write $\mathfrak{a} = \mathfrak{d} + \mathfrak{e}$.
$D^2$  $D \times E$ 

$E \times D$  $E^2$ 
From $(\Sigma)$, assume $\mathfrak{a} = \mathfrak{e} > \mathfrak{d}$. This allows us to extract $F$ of cardinality $\mathfrak{d}$ from $E$. We’re going to extend $(D, f)$ to get a contradiction.
$D^2$  $D \times F$ 

$F \times D$  $F^2$ 
The topleft grid represents $(D, f)$ and $\mathfrak{d}^2 = \mathfrak{d}$. In words, the square can be “flattened into a line segment”. Note that “all of these four grids are of the same size. From $(\Sigma)$, the remaining three grids together are as big as one grid, which can be “flattened into $F$”. This extends $(D, f)$, contradicting its maximality in $\mathcal{X}$.
As a result, the assumption $\mathfrak{a} > \mathfrak{d}$ is wrong. The equality $\mathfrak{a} = \mathfrak{d} + \mathfrak{e}$ gives $\mathfrak{a} = \mathfrak{d}$. This proves $(\Pi)$.
With the usual power laws ported onto cardinal numbers, it’s easy to show
Cardinals under total order
This has motivated me to find out more about cardinal numbers. The next result shows that the sizes of two sets can always be compared.
To see this, we have to apply Zorn’s lemma again.

$\mathcal{X} = \lbrace f: X \to Y \mathinner{\text{bijective}} \mid X \subseteq A \land Y \subseteq B \rbrace$

$\prec$: usual function extension (extend domain and codomain, keep original values unchanged).

Start from any chain $\mathcal{C}$. Take $\cup \mathcal{C}$ and show that it’s in $\mathcal{X}$.

By Zorn’s lemma, there exists a maximal element in $\mathcal{X}$. Denote this again by $f: C \to D$.

Define $E$ and $F$ to be
$A = C \sqcup E$ $B = D \sqcup F$ max $C \xleftarrow[\text{bijective with}]{}$ $\xrightarrow[\text{each other}]{}D$ rest $E := A \setminus C$ $F:= B \setminus D$ 
If $E = \varnothing$ or $F = \varnothing$, we’re done. Otherwise, maximality of $f: C \to D$ would be contradicted.
Bijection from $\mathcal{P}(\N^*)$ to real line
Denote $\mathfrak{c} = \card{[0,1]}$. Let’s compare $\aleph_0$ and $\mathfrak{c}$ before proving a cardinality equality.
Digression: Cantor’s diagonal extraction
The aim of this subsection is to show that $\mathfrak{c} > \aleph_0$.
To show that $[0,1]$ is uncountable, a classic counterargument proceeds as follows:
Suppose $[0,1]$ is countable, and denote it by $(r_n)_n$. Extract a number not in the list $(r_n)_n$ by a function
The number $$ d = \sum_{n \in \N^*} \frac{\color{blue}\phi(r_n)}{10^n} $$ converges, so it’s welldefined. However, $d$ _doesn’t_ belong to list $(r_n)$ since the $n$th decimal place of $d$ ($\phi(r_n)$) _never matches_ $\colorbox{yellow}{\color{red}{\text{the $n$th decimal place of $r_n$}}}$.
Diagram illustrating the bijections
The first bijection isn’t evident as $\frac12 = \sum_{n\ge2} \frac{1}{2^n}$. Therefore, a _direct proof won’t work_. We have to calculate cardinalities and establish injections between some infinite subset and use CSB theorem. We have to separate two cases
 $T_1 = \lbrace (a_n)_n \mid a_n = 1 \mathinner{\text{i.o.}} \rbrace$ (“i.o.” means “infinitely often”. This is used in Durrett’s Probability Theory book.)
 $T \setminus T_1 = \lbrace (a_n)_n \mid a_n = 0 \mathinner{\text{eventually}} \rbrace \subseteq \Q$
Surjectivity of $\phi$
To show $\phi$ is surjective, construct $(a_n)_n$ from any $r \in [0,1]$ recursively as follows:

$i = 1$: If $r \ge \frac12$, $a_1 = 1$. Otherwise, $a_1 = 0$. The error for the first iteration is $\epsilon_1 \leftarrow r  \frac{a_1}{2}$.

$i > 1$: If $\epsilon_{i1} \ge \frac{1}{2^i}$, $a_i = 1$. Otherwise, $a_i = 0$. The error for the $i$th iteration is $\epsilon_i \leftarrow r_{i1}  \frac{a_i}{2^i}$ It’s clear that
Show that $(a_n)_n$ constructed from $r$ can be mapped back to $r$ by $\phi$. (i.e. $\phi((a_n)_n) = r$) This can be proved directly and inductive from the construction of $(a_n)_n$. For each iteration, the increasing partial sum is near enough to $r$.

base case: direct calculations

induction hypothesis: assume $\mathrm{P}(m)$

inductive step $i = m + 1$:
For the second case in $(\star)$, $\mathrm{P}(m)$ is applied.
As a result, error $\epsilon_n \to 0$ as $n \to \infty$.
Translate the surjectivity of $\phi$ into the cardinality inequality
$(\text{AC})$ means the Axiom of Choice.
Injectivity of $\phi_{T_1}$
Suppose $(a_n)_n, (b_n)_n \in T_1$ such that $\phi((a_n)_n) = \phi((b_n)_n) = r$. If $a_n = b_n \forall n \in \N^*$, then $(a_n)_n = (b_n)_n$ and we’re done. Otherwise, they differ starting from the $n_0$th term.
Two seq in $T_1$  head: $n < n_0$  they differ from here  tail: $n > n_0$ 

$(a_n)_n$  same  $a_{n_0} = 1$  infinitely many $1$'s 
$(b_n)_n$  terms  $b_{n_0} = 0$  dominated by $a_{n_0} = 1$ 
The key step in $\text{(I)}$ is the use of $a_n = 1$ i.o. to get a strict inequality.
The injectivity of $\phi_{T_1}$ gives
Combine $(\subseteq)$ and $(\supseteq)$ and apply CSB theorem to get $\mathfrak{c} = 2^{\aleph_0}$.
Reference
Gabriel Nagy’s Real Analysis (version 2.0) Appendix B
I’ve modified his proof for $\mathfrak{c} = 2^{\aleph_0}$ by changing $T_0$ into $T_1$. This leads automatically to $\card{T \subseteq T_1} = \aleph_0$. To avoid going too deep, I simply take $r = 2$.