# 强撸MIT之18.06

（下面的前五章是对着第四版（前五章跟第五版差别不大）写的，后七章是对着第五版写的）

# 1 Introduction to Vectors

## 1.1 Vectors and Linear combinations

• the basic operations on vectors are multiplication cv and vector addition v + w.
• The sum of cv and dw is a linear combination of v and w.
• there is a perfect match between the column vector and the arrow from the origin and the point where the arrow ends.

## 1.2 Lengths and Dot Products

• The dot product $$v \cdot w$$ multiplies each component $$v_i$$ by $$w_i$$ and adds all $$v_iw_i$$.
• The length $$\|v\|$$ of a vector is the square root of $$v \cdot v$$.
• Unit vectors u and U at angle $$\theta$$ have $$u \cdot U = cos \theta$$.
• Schwarz inequality: $$|v \cdot w| \le \|v\| \|w\|$$.
• triangle inequality: $$\|v + w\| \le \|v\| + \|w\|$$.

## 1.3 Matrices

• Ax = b:
The matrix A acts on the vector x. The output Ax is a combination b of the columns of A. (in column’s view)
Ax is also dot products with rows. (in row’s view)
• The relationship between Difference Matrix and Derivative.
The Sum Matrix is the inverse matrix of the Difference Matrix.
the Integration is the inverse of the Differentiation.
• Independent columns: Ax = 0 has one solution. A is an invertible matrix.
Dependent columns: Cx = 0 has many solutions. C is a singular matrix.

# 2 Solving Linear Equations

## 2.1 Vectors and Linear Equations

• Column picture: Ax = b asks for a combination of columns to produce b.
• Row picture: Each equation in Ax = 0 gives a line (n = 2) or a plane (n = 3) or a “hyperplane” (n > 3).
They intersect at the solution or solutions, if any.
• but Ax should be understood as a combination of the columns of A.

## 2.2 The idea of Elimination

• A linear system (Ax = b) becomes upper triangular (Ux = c) after elimination.
• The upper triangular system is solved by back substitution (starting at the bottom).
• A zero in the pivot position can be repaired if there is a nonzero below it.
• When breakdown is permanent, the system has no solution or infinitely many.

## 2.3 Elimination Using Matrices

• The Identity Matrix has 1’s on the diagonal and otherwise 0’s.
Then $$Ib = b$$ for all b.
• The Elementary Matrix or Elimination Matrix $$E_{ij}$$ that substracts a multiple $$l$$ of row j from row i has the extra nonzero entry $$-l$$ in the i,j position (sitll diagonal 1’s)
• The Row Exchange Matrix $$P_{ij}$$ is the identity matrix with rows i and j reversed.
When this Permutation Matrix $$P_{ij}$$ multiplies a matrix, it exchanges rows i and j.
• Elimination does the same row operations to A and to b. We can include b as an extra column and follow it through elimination. The matrix A is enlarged or augmented by the extra column b.

## 2.4 Rules for Matrix Operations

• The (i, j) entry of AB is (row i of A) $$\cdot$$ (column j of B).
AB is also the sum of these matrices: (column j of A) times (row j of B).
• A row times a column is an inner product.
A column times a row is an outer product.
These are extreme cases of matrix multiplication.
• Each column of AB is a combination of the columns of A. (matrix B acts from right)
Each row of AB is a combination of the rows of B. (matrix A acts from left)
• Block multiplication is allowed when the block shapes match correctly.
Block elimination produces the Schur complement $$D – CA^{-1}B$$.
• The associative law for ABC is satisfied.
The commutative law is usually broken.

## 2.5 Inverse Matrices

• The inverse matrix gives $$AA^{-1} = I$$ and $$A^{-1}A = I$$
• The algorithm to test invertibility is elimination : A is invertible if and only if it has n pivots (row exchanges allowed).
The algebra test for invertibility is the determinant of A: det A must not be zero.
The equation that tests for invertibility is Ax = 0: x = 0 must be the only solution.
• The inverse of $$AB$$ is the reverse product $$B^{-1}A^{-1}$$.
• The Gauss-Jordan method solves $$AA^{-1} = I$$ to find the n columns of $$A^{-1}$$.
The augmented matrix $$[A I]$$ is row-reduced to $$[I A^{-1}]$$.
• If A is symmetric or triangular, so is $$A^{-1}$$.
• Diagonally dominant matrices are invertible. Each $$|a_{ii}|$$ dominates its row.

## 2.6 Elimination = Factorization: A = LU

• Gaussian elimination (with no row exchanges) factors A into L times U.
• The lower triangular L contains the multiplier $$l_{ij}$$ that multipy pivot rows, while going from A to U.
The product LU add those rows back to recover A.
• The original system Ax = b is factored into two triangular systems:
solve Lc = b (forward) and Ux = c (backward)

## 2.7 Transposes and Permutations

• The transpose puts the rows of A into the columns of $$A^T$$. Then $$(A^T)_{ij} = A_{ji}$$.
• The transpose of $$AB$$ is $$B^TA^T$$. The transpose of $$A^{-1}$$ is the inverse of $$A^T$$.
• The dot product is $$x \cdot y = x^Ty$$. Then $$(Ax)^Ty$$ equals the dot product $$x^T(A^Ty)$$.
• When A is symmetric $$(A^T = A)$$, its LDU factorization is symmetric: $$A = LDL^T$$.
• A Permutation Matrix P has the rows of the identity $$I$$ in any order, and $$P^T = P^{-1}$$.
• If A is invertible then a Permutation P will reorder its rows for $$PA = LU$$.

# 3 Vector Spaces and Subspaces

## 3.1 Spaces of Vectors

• $$R^n$$ contains all column vectors with n real components.
• A vector space must closed under linear combination.
A subspace containing v and w must contain all their combinations $$cv + dw$$.
• The combinations of the columns of A form the column space $$C(A)$$. Then the column space is spanned by the columns.
• $$Ax = b$$ has a solution exactly when b is in the column space of A.

## 3.2 The Nullspace of A: solving $$Ax = 0$$

• The nullspace $$N(A)$$ is a subspace of $$R^n$$. It contains all solutions to $$Ax = 0$$.
• Elimination produces an echelon matrix U, and then a row reduced R, with pivot columns and free columns.
Every free column of U or R leads to a special solution.
The complete solution to $$Ax = 0$$ is a combination of the special solutions.
• Suppose $$Ax = 0$$ has more unknowns than equations (n > m, more columns than rows).
Then there are nonzero solutions. There must be free columns.

## 3.3 The Rank and the Row Reduced Form

• The rank r of A is the number of pivots.
The true size of A is given by its rank.
• The r pivot columns are not combinations of earlier columns.
The n – r free columns are combinations of earlier columns (pivot columns).
• Those combinations (using -F taken from R) give the n – r special solutions to $$Ax = 0$$ and $$Rx = 0$$.
They are the n – r columns of the nullspace matrix N.

## 3.4 The Complete Solution to $$Ax = b$$

• $$Ax = b$$ is solvable if and only if the last m – r equations reduce to 0 = 0.
• One particular solution $$x_p$$ has all free variables equal to zero.
The pivot variables are determined after the free variables are chosen.
• $$x = x_p + x_n$$
The particular solution $$x_p$$ will be one point on the “line”.
Adding the nullspace vector $$x_n$$ will move us along the “line”.
• Full column rank r = n means no free variables: one solution is m = n or none if m > n (overdetermined).
• Full row rank r = m means one solution if m = n or infinitely many if m < n (underdetermined).

## 3.5 Independence, Basis and Dimension

• The columns of A are independent if x = 0 if the only solution to $$Ax = 0$$.
• The vectors $$v_1, \dots ,v_r$$ span a space if their combinations fill that space.
• A basis consists of linearly independent vectors that span the space.
Every vector in the space is a unique combination of the basis vectors.
• All bases for a space have the same number of vectors.
This number of vectors in a basis is the dimension of the space.
• The pivot columns are one basis for the column space. The dimension is r.
• The vectors $$v_1, \dots ,v_n$$ are a basis for $$R^n$$ exactly when they are the columns of an n by n invertible matrix.
Thus $$R^n$$ has infinitely many different bases.
• Space Z contains only the zero vector.
The dimension of this space is zero.
The empty set is a basis for Z (We never allow the zero vector into a basis).

## 3.6 Dimension of the Four Subspaces

• Fundamental Theorem of Linear Algebra, Part 1:
The column space $$C(A)$$ and the row space $$C(A^T)$$ both have dimension r (the rank of A).
The nullspace $$N(A)$$ has dimension n − r. The left nullspace $$N(A^T)$$ has dimension m − r.
• Elimination produces bases for the row space and nullspace of A: They are the same as for R.
• Elimination often changes the column space and left nullspace (but dimensions don’t change).
• Rank one matrices: $$A = uv^T = \text{column times row}$$: $$C(A)$$ has basis u, $$C(A^T)$$ has basis v.

# 4 Orthogonality

## 4.1 Orthogonality of the Four Subspaces

• Fundamental Theorem of Linear Algebra, Part 2:
$$N(A)$$ is the orthogonal complement of the row space $$C(A^T)$$ (in $$R^n$$).
$$N(A^T)$$ is the orthogonal complement of the column space $$C(A)$$ (in $$R^m$$).
• Subspace V and W are orthogonal if every v in V is orthogonal to every w in W.
• V and W are orthogonal complements if W contains all vectors perpendicular to V (and vice versa).
Inside $$R^n$$, the dimensions of complements V and W add to n.
Orthogonality is impossible when dim V + dim W > dimension of whole space.
• When a vector is in two orthogonal subspaces, it must be zero which is perpendicular to itself.
• Any n independent vectors in $$R^n$$ will span $$R^n$$.
• Every x in $$R^n$$ has a nullspace component $$x_n$$ and a row space component $$x_r$$.
• Every vector b in column space comes from one and only one vector in the row space.
From the row space to the column space, A is invertible (pseudoinverse).

## 4.2 Projections

• The rank one projection matrix $$P = aa^T / a^Ta$$ multiplies b to produce the projection p onto the line through a.
• Projecting b onto a subspace leaves $$e = b – p$$ perpendicular to subspace.
When $$P$$ projects onto subspace, $$I – P$$ projects onto the perpendicular subspace.
• To project any b onto the column space of any matrix A
The projection p is the combination $$A\hat{x}$$ which is closest to b
The error vector $$b – A\hat{x}$$ is perpendicular to the column space of A
Therefore $$b – A\hat{x}$$ is in the nullspace of $$A^T$$. This means $$A^T(b – A\hat{x}) = 0$$.

When A has full rank n, the equation $$A^TA\hat{x} = A^Tb$$ leads to $$\hat{x}$$ and $$p = A\hat{x}$$.
• The projection matrix $$P = A(A^TA)^{-1}A^T$$ has $$P^T = P$$ and $$P^2 = P$$.
The matrix A is rectangular, which means that A may be singular.
When A is singular, we cannot split $$(A^TA)^{-1}$$ into $$A^{-1}$$ times $$(A^T)^{-1}$$ because there is no $$A^{-1}$$ in the first place.
When A has independent columns, $$A^TA$$ is square, symmetric, and invertible.

## 4.3 Least Squares Approximations

• When $$Ax = b = p + e$$ is impossible, $$A\hat{x} = p$$ is solvable.
b is outside the column space, and the nearest point in the column space is its projection p when the error e is orthogonal to the column space.
• The best $$\hat{x}$$ comes from the normal equations $$A^TA\hat{x} = A^Tb$$.
The best $$\hat{x}$$ minimizes $$E = \|Ax – b\|^2$$ which is the sum of squares of the errors in the m equations (m > n).
• If we try to fit m points by a combination of n < m functions, the m equations $$Ax = b$$ are generally unsolvable.
The n equations $$A^TA\hat{x} = A^Tb$$ give the least squares solution—— the combination with smallest MSE (mean square error).

## 4.4 Orthogonal Bases and Gram-Schmidt

• A matrix Q with orthonormal columns satisfies $$Q^TQ = I$$
When Q is square (an orthogonal matrix), $$Q^TQ = I$$ means that $$Q^T = Q^{-1}$$: transpose = inverse.
• Q preserves length: $$\|Qx\| = \|x\|$$
Q preserves dot products: $$(Qx)^T(Qy) = x^Ty$$
• The projection onto the column space is $$P = QQ^T$$
if Q is square then $$P = I$$ and every $$b = q_1(q_1^Tb) + \dots + q_n(q_n^Tb)$$.
• Gram-Schmidt process: subtract from every new vector its projections in the directions already set
In matrix form this is the factorization $$A = QR$$.
R is upper triangular because of the non-involvement of later vectors (the key point of Gram-Schmidt)
• The least squares solution of $$Qx = b$$ is $$\hat{x} = Q^Tb$$
The least squares solution of $$Ax = b$$ is $$\hat{x} = R^{-1}Q^Tb$$

# 5 Determinants

## 5.1 The Properties of Determinants

• The determinant is defined by det I = 1, sign reversal, and linearity in each row.
• After elimination det A is $$\pm$$ (product of the pivots).
• The determinant is zero exactly when A is not invertible.
• Two remarkable properties are det AB = (det A)(det B) and det $$a^T$$ = det A.

## 5.2 Permutations and Cofactors

• The first k pivots come from the k by k matrix $$A_k$$ in the top left conrner of A.
The determinant of that corner submatrix $$A_k$$ is $$d_1d_2 \dots d_k$$.
• det A = sum over all n! column permutations $$P = (\alpha, \beta, \dots, \omega)$$ = $$\Sigma(\text{det P})a_{1\alpha}a_{2\beta}\dots a_{n\omega}$$ = BIG FORMULA
• The cofactor $$C_{ij}$$ is $$(-1)^{i+j}$$ times the smaller determinant that omits row i and column j (because $$a_{ij}$$ uses that row and column).
• The determinant is the dot product of any row of A with its row of cofactors.
When a row of A has lot of zeros, we only need a few cofactors.
The idea behind cofactors is to reduce the order one step at a time.

## 5.3 Cramer’s Rule, Inverses, and Volumes

• Cramer’s Rule solves $$Ax = b$$ by ratios like $$x_1 = |B_1|/|A| = |ba_2 \dots a_n|/|A|$$.
• When C is the cofactor matrix for A, the inverse is $$A^{-1} = C^T / \text{det A}$$.
• The volume of a box is |det A|, when the box edges going out from the origin are the rows of A.
• The cross product is a vector with length $$\|u\|\|v\||sin\theta|$$. Its direction is perpendicular to u and v. Its points “up” or “down” by the right hand rule.

# 6 Eigenvalues and Eigenvectors

## 6.1 Introduction to Eigenvalues

• An eigenvector x lies along the same line as Ax: $$Ax = \lambda x$$. The eigenvalue is $$\lambda$$.
If $$Ax = \lambda x$$ then $$A^2 x = \lambda^2 x$$ and $$A^{-1}x = \lambda^{-1} x$$: the same x.
• If $$Ax = \lambda x$$ then $$(A-\lambda I)x = 0$$ and $$A-\lambda I$$ is singular and $$det(A-\lambda I) = 0$$: n eigenvalues (may be repeated or complex).
• The product of n eigenvalues equals determinant.
The sum of n eigenvalues equals trace.
• Projections have $$\lambda$$ = 1 and 0. Reflections have 1 and -1. Rotations have $$e^{i\theta}$$ and $$e^{-i\theta}$$.
• A and B shared the same n independent eigenvalues if and only if AB = BA.

## 6.2 Diagonalizing a Matrix

• n independent eigenvectors in X diagonalize A to the eigenvalue matrix $$\Lambda$$: $$A = X\Lambda X^{-1}$$ and $$\Lambda = X^{-1}AX$$.
The columns of $$AX = X\Lambda$$ are $$Ax_k = \lambda_k x_k$$.
Moreover, if those n eigenvectors are orthogonal then $$A = \Sigma \lambda_k x_k x_k^T$$.
• A is diagonalizable if every eigenvalue is different (which means that every eigenvector is independent).
A is diagonalizable if every eigenvalue has enough eigenvectors.
• Every matrix $$C = B^{-1}AB$$ has the same eigenvalues as A.
These C’s are similar to A.
• Solve $$u_{k+1} = Au_k$$ by $$u_k = A^k u_0 = X\Lambda^k X^{-1}u_0 = \Sigma c_i \lambda_i^k x_i$$ (transforming to an eigenvector basis).

## 6.4 Symmetric Matrices

• A real symmetric matrix S has n real eigenvalues and n orthonormal eigenvectors.
• For symmetric matrices the pivots and the eigenvalues have the same signs.
• N has n orthonormal eigenvectors if and only if N is normal ($$\overline{N}^T N = N \overline{N}^T$$).

## 6.5 Positive Definite Matrices

• Symmetric S: all eigenvalues > 0 $$\Leftrightarrow$$ all pivots > 0 $$\Leftrightarrow$$ all upper left determinants > 0 $$\Leftrightarrow$$ $$x^T Sx$$ if positive except at x = 0 $$\Leftrightarrow$$ $$S = A^T A$$ for A with independent columns.
The matrix S is then positive definite.
• S is positive simidefinite if 0 is allowed in the above inequalities (or dependent columns is allowed in A ($$S = A^T A$$)).

# 7 The Singular Value Decomposition (SVD)

## 7.1 Image Processing by Linear Algebra

• An image is a large matrix of grayscale values, one for each pixel and color.
When nearby pixels are correlated (not random) the image can be compressed.
• The SVD separates any matrix A into rank one pieces $$\lambda uv^T$$.
The SVD chooses rank one pieces in order of importance.
• Images mostly have full rank, but they do have low effective rank which means that many singular values are small and can be set to zero: we transmit a low rank approximation.

## 7.2 Bases and Matrices in the SVD

• What we want from the SVD: the right basis for the four subspaces.
$$A \left[\begin{matrix} v_1 & \cdots & v_r & \cdots & v_n \end{matrix}\right] = \left[\begin{matrix} u_1 & \cdots & u_r & \cdots & u_n \end{matrix}\right] \left[\begin{matrix} \sigma_1 & & && \\ & \ddots & && \\ & & \sigma_r && \\ & & && \end{matrix}\right]$$
SVD of any matrix: $$A = U \Sigma V^T = \Sigma \sigma_i u_i v_i^T$$
$$A^T A = (U\Sigma V^T)^T (U\Sigma V^T) = V\Sigma^TU^TU\Sigma V^T = V\Sigma^T\Sigma V^T$$
• A has two sets of singular vectors, u’s and v’s:
$$u_1 \dots u_r$$ and $$v_1 \dots v_r$$ are orthonormal basis for the column space and the row space.
$$u_{r+1} \dots u_m$$ and $$v_{r+1} \dots v_n$$ are orthonormal basis for the left nullspace and the nullspace.
u’s are eigenvectors of $$AA^T$$, and v’s are eigenvectors of $$A^T A$$.
• A has one set of singular values, $$\sigma$$‘s:
Each $$\sigma^2$$ is $$\lambda(A^T A)$$
$$\sigma_i \dots \sigma_r$$ will be positive numbers
Because columns of A may be dependent, $$A^T A$$ is positive semidefinite, the extra $$\sigma$$‘s are zero
• There are serious instability of eigenvalues when $$AA^T$$ is far from $$A^T A$$.
The singular values of any matrix are stable.
• In $$S = \Sigma \lambda_i q_iq_i^T$$, $$\lambda_1$$ is the max ratio $$\frac{x^TSx}{x^Tx}$$ and the winning vector x is $$q_1$$.
In $$A = \Sigma \lambda_i u_iv_i^T$$, $$\sigma_1$$ is the max ratio $$\frac{\Vert Ax\Vert}{\Vert x\Vert}$$ and the winning vector x is $$v_1$$.

## 7.4 The Geometry of the SVD

• A typical square matrix $$A = U\Sigma V^T$$ factors into (rotation)(stretching)(rotation). A transform vectors x on a circle to vectors Ax on an ellipse.
This picture not applies to every 2by2 matrix because U and V didn’t allow for a reflection (A must be invertible).
• The norm of A is $$\Vert A \Vert = \sigma_1$$.
• Polar decomposition factors A into QS: rotation $$Q = UV^T$$ (orthogonal) times stretching $$S = V\Sigma V^T$$ (positive semidefinite).
• $$Q = UV^T$$ is the nearest orthogonal matrix to A. (This Q makes norm $$\Vert Q-A \Vert$$ as small as possible.)
The nearest singular matrix A0 to A comes by changing the smallest $$\sigma_{min}$$ to zero. • The pseudoinverse $$A^+ = V\Sigma^+ U^T$$ brings Ax in the column space bask to x in the row space (send $$N(A^T)$$ to Z).
$$A^+ u_i = \frac{1}{\sigma_i}v_i$$ for $$i \le r$$ and $$A^+ u_i = 0$$ for i > r.
$$AA^+$$ is the projection onto the column space of A.
$$A^+A$$ is the projection onto the row space of A.
• In least squares, when $$A^TA$$ is singular, the equation $$A^TAx = A^Tb$$ has infinitely many solutions.
The pseudoinverse gives the shortest solution: $$x^+ = A^+b$$.

# 8 Linear Transformations

## 8.1 The Idea of a Linear Transformation

• A linear transformation T takes vectors v to vectors T(v).
Linearity requires T(cv+dw) = cT(v)+dT(w).
• The input vectors v and the outputs T(v) can be in $$R^n$$ or matrix space or function space…
If A is m by n, T(x) = Ax is linear from the input space $$R^n$$ to the output space $$R^m$$.
The derivative $$T(f) = \frac{df}{dx}$$ is linear.
The integral $$T^+(f) = \int_0^x f(t)dt$$ is it pseudoinverse.
• The range of T is the set of all outputs T(v) (corresponds to column space).
The kernel of T is the set of all inputs for which T(v) = 0 (corresponds to nullspace).
• The product ST of two linear transformation is still linear: (ST)(v) = S(T(v)).

## 8.2 The Matrix of a Linear Transformation

• We know all T(v) if we know all $$T(v_1) \dots T(v_n)$$ for all input basis $$v_1 \dots v_n$$ (linearity).
• When the input basis is in the columns of V, and the output basis is in the columns of W, the change of basis matrix ($$T(v)=v$$) is $$B=W^{-1}V$$
$$\left[\begin{matrix} v_1 & \cdots & v_n \end{matrix}\right] \left[\begin{matrix} c_1 \\ \vdots \\ c_n \end{matrix}\right] = \left[\begin{matrix} w_1 & \cdots & w_n \end{matrix}\right] \left[\begin{matrix} d_1 \\ \vdots \\ d_n \end{matrix}\right]$$
$$u = Vc = Wd\quad d = W^{-1}Vc$$
c and d are two presentation of u in two set of basis, the change of basis matrix change u from c to d.
• Column j of the “matrix of T” comes from applying T to the input basis vector $$v_j$$.
$$T(v_j) = \Sigma a_{ij} w_i$$ is the combination of output basis.
• If A and B represent T and S, and output basis for S is the input basis for T, then the matrix AB represents the transformation T(S(u)).
• The same T represented by different matrices when we choose different bases.
The best input-output bases choose are eigenvectors / singular vectors of A:
• eigenvectors:
If the input space equals the output space, and if there are n independent eigenvectors, choose those as the input and output basis, the matrix for T is $$B^{-1}AB = \Lambda$$ = eigenvalues (because $$T = ITI$$).
$$A_{\text{b’s to b’s}} = B_{\text{standard to b’s}}^{-1}A_{\text{standard}}B_{\text{b’s to standard}}$$ is similar to $$A_{\text{standard}}$$.
• singular vectors:
If the input space may be different from the output sapce, the SVD says that $$A = U\Sigma V^{-1}$$, use the v’s and u’s as the input and output basis, the matrix for T is $$B_{out}^{-1}AB_{in} = U^{-1}AV = \Sigma$$ = singular values (because $$T = ITI$$).
$$C = Q_1^{-1}AQ_2$$ is isometric to A if $$Q_1$$ and $$Q_2$$ are orthogonal.