Before diving into quantum computing, I decided to review linear algebra first, since I haven't touched it for a year and a half (runs away)
The plan is to go through GSLA quickly, and to use it to train my reading speed along the way
(The first five chapters below were written against the 4th edition (the first five chapters differ little from the 5th edition); the last seven chapters were written against the 5th edition)
Contents
- 1 Introduction to Vectors
- 2 Solving Linear Equations
- 3 Vector Spaces and Subspaces
- 4 Orthogonality
- 5 Determinants
- 6 Eigenvalues and Eigenvectors
- 7 The Singular Value Decomposition (SVD)
- 8 Linear Transformations
- 9 Complex Vectors and Matrices
- 10 Applications
- 11 Numerical Linear Algebra
- 12 Linear Algebra in Probability & Statistics
1 Introduction to Vectors
1.1 Vectors and Linear combinations
- the basic operations on vectors are multiplication cv and vector addition v + w.
- The sum of cv and dw is a linear combination of v and w.
- There is a perfect match between the column vector, the arrow from the origin, and the point where the arrow ends.
1.2 Lengths and Dot Products
- The dot product \(v \cdot w\) multiplies each component \(v_i\) by \(w_i\) and adds all \(v_iw_i\).
- The length \(\|v\|\) of a vector is the square root of \(v \cdot v\).
- Unit vectors u and U at angle \(\theta\) have \(u \cdot U = \cos \theta\).
- Schwarz inequality: \(|v \cdot w| \le \|v\| \|w\|\).
- triangle inequality: \( \|v + w\| \le \|v\| + \|w\| \).
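A quick numerical check of the formulas above (my own sketch, not from the book; the vectors are arbitrary examples and numpy is assumed):

```python
import numpy as np

v = np.array([1.0, 2.0, 2.0])
w = np.array([3.0, 0.0, 4.0])

dot = v @ w                        # sum of all v_i * w_i
len_v = np.sqrt(v @ v)             # ||v|| = sqrt(v . v) = 3
len_w = np.linalg.norm(w)          # same idea via norm() = 5

cos_theta = dot / (len_v * len_w)  # cosine of the angle between v and w
print(cos_theta)

print(abs(dot) <= len_v * len_w)                 # Schwarz inequality
print(np.linalg.norm(v + w) <= len_v + len_w)    # triangle inequality
```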
1.3 Matrices
- Ax = b:
The matrix A acts on the vector x. The output Ax is a combination b of the columns of A. (in column’s view)
Ax is also dot products with rows. (in row’s view)
- The relationship between the Difference Matrix and the Derivative:
The Sum Matrix is the inverse matrix of the Difference Matrix.
Integration is the inverse of differentiation. (A numerical sketch follows this list.)
- Independent columns: Ax = 0 has one solution. A is an invertible matrix.
Dependent columns: Cx = 0 has many solutions. C is a singular matrix.
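A small numpy sketch of the two views of Ax and of the difference/sum matrices (my own example, not from the book):

```python
import numpy as np

# Difference matrix: (Ax)_i = x_i - x_{i-1}, with x_0 = 0
A = np.array([[ 1,  0, 0],
              [-1,  1, 0],
              [ 0, -1, 1]])
x = np.array([1, 4, 9])

# Column view: Ax is a combination of the columns of A
col_view = x[0] * A[:, 0] + x[1] * A[:, 1] + x[2] * A[:, 2]
# Row view: each component of Ax is (row i of A) . x
row_view = np.array([A[i, :] @ x for i in range(3)])
print(col_view, row_view, A @ x)   # all three agree

# The Sum Matrix is the inverse of the Difference Matrix
S = np.linalg.inv(A)
print(S)                           # lower triangular matrix of 1's
print(S @ (A @ x))                 # summing the differences recovers x
```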
2 Solving Linear Equations
2.1 Vectors and Linear Equations
- Column picture: Ax = b asks for a combination of columns to produce b.
- Row picture: Each equation in Ax = b gives a line (n = 2) or a plane (n = 3) or a “hyperplane” (n > 3).
They intersect at the solution or solutions, if any.
- But Ax should be understood as a combination of the columns of A.
2.2 The idea of Elimination
- A linear system (Ax = b) becomes upper triangular (Ux = c) after elimination.
- The upper triangular system is solved by back substitution (starting at the bottom).
- A zero in the pivot position can be repaired if there is a nonzero below it.
- When breakdown is permanent, the system has no solution or infinitely many.
2.3 Elimination Using Matrices
- The Identity Matrix has 1’s on the diagonal and otherwise 0’s.
Then \(Ib = b\) for all b.
- The Elementary Matrix or Elimination Matrix \(E_{ij}\) that subtracts a multiple \(l\) of row j from row i has the extra nonzero entry \(-l\) in the i, j position (still with 1’s on the diagonal).
- The Row Exchange Matrix \(P_{ij}\) is the identity matrix with rows i and j reversed.
When this Permutation Matrix \(P_{ij}\) multiplies a matrix, it exchanges rows i and j.
- Elimination does the same row operations to A and to b. We can include b as an extra column and follow it through elimination. The matrix A is enlarged or augmented by the extra column b.
2.4 Rules for Matrix Operations
- The (i, j) entry of AB is (row i of A) \(\cdot\) (column j of B).
AB is also the sum of these matrices: (column j of A) times (row j of B).
- A row times a column is an inner product.
A column times a row is an outer product.
These are extreme cases of matrix multiplication.
- Each column of AB is a combination of the columns of A. (matrix B acts from the right)
Each row of AB is a combination of the rows of B. (matrix A acts from the left)
- Block multiplication is allowed when the block shapes match correctly.
Block elimination produces the Schur complement \(D - CA^{-1}B\).
- The associative law for ABC is satisfied.
The commutative law is usually broken.
2.5 Inverse Matrices
- The inverse matrix gives \(AA^{-1} = I\) and \(A^{-1}A = I\)
- The algorithm to test invertibility is elimination : A is invertible if and only if it has n pivots (row exchanges allowed).
The algebra test for invertibility is the determinant of A: det A must not be zero.
The equation that tests for invertibility is Ax = 0: x = 0 must be the only solution.
- The inverse of \(AB\) is the reverse product \(B^{-1}A^{-1}\).
- The Gauss-Jordan method solves \(AA^{-1} = I\) to find the n columns of \(A^{-1}\).
The augmented matrix \([A\ I]\) is row-reduced to \([I\ A^{-1}]\).
- If A is symmetric or triangular, so is \(A^{-1}\).
- Diagonally dominant matrices are invertible. Each \(|a_{ii}|\) dominates its row.
2.6 Elimination = Factorization: A = LU
- Gaussian elimination (with no row exchanges) factors A into L times U.
- The lower triangular L contains the multipliers \(l_{ij}\) that multiply the pivot rows while going from A to U.
The product LU adds those rows back to recover A.
- The original system Ax = b is factored into two triangular systems:
solve Lc = b (forward) and Ux = c (backward)
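A minimal scipy sketch of the two triangular solves (not from the book; note that scipy’s `lu` uses partial pivoting, so it actually returns A = PLU):

```python
import numpy as np
from scipy.linalg import lu, solve_triangular

A = np.array([[2., 1., 1.],
              [4., 3., 3.],
              [8., 7., 9.]])
b = np.array([1., 2., 3.])

P, L, U = lu(A)                               # scipy: A = P @ L @ U
c = solve_triangular(L, P.T @ b, lower=True)  # forward:  L c = P^T b
x = solve_triangular(U, c, lower=False)       # backward: U x = c
print(np.allclose(A @ x, b))                  # True
print(L)                                      # multipliers l_ij below the unit diagonal
```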
2.7 Transposes and Permutations
- The transpose puts the rows of A into the columns of \(A^T\). Then \((A^T)_{ij} = A_{ji}\).
- The transpose of \(AB\) is \(B^TA^T\). The transpose of \(A^{-1}\) is the inverse of \(A^T\).
- The dot product is \(x \cdot y = x^Ty\). Then \((Ax)^Ty\) equals the dot product \(x^T(A^Ty)\).
- When A is symmetric \((A^T = A)\), its LDU factorization is symmetric: \(A = LDL^T\).
- A Permutation Matrix P has the rows of the identity \(I\) in any order, and \(P^T = P^{-1}\).
- If A is invertible then a Permutation P will reorder its rows for \(PA = LU\).
3 Vector Spaces and Subspaces
3.1 Spaces of Vectors
- \(R^n\) contains all column vectors with n real components.
- A vector space must be closed under linear combinations.
A subspace containing v and w must contain all their combinations \(cv + dw\).
- The combinations of the columns of A form the column space \(C(A)\). Then the column space is spanned by the columns.
- \(Ax = b\) has a solution exactly when b is in the column space of A.
3.2 The Nullspace of A: solving \(Ax = 0\)
- The nullspace \(N(A)\) is a subspace of \(R^n\). It contains all solutions to \(Ax = 0\).
- Elimination produces an echelon matrix U, and then a row reduced R, with pivot columns and free columns.
Every free column of U or R leads to a special solution.
The complete solution to \(Ax = 0\) is a combination of the special solutions.
- Suppose \(Ax = 0\) has more unknowns than equations (n > m, more columns than rows).
Then there are nonzero solutions. There must be free columns.
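A sympy sketch of the special solutions (my own example with n > m, so free columns must exist; `rref` and `nullspace` mirror elimination):

```python
from sympy import Matrix

A = Matrix([[1, 2, 2, 4],
            [2, 4, 6, 8]])        # 2 equations, 4 unknowns

R, pivot_cols = A.rref()          # row reduced form R and its pivot columns
print(R)
print(pivot_cols)                 # (0, 2): the other two columns are free

# One special solution per free column; together they span N(A)
for s in A.nullspace():
    print(s.T, (A * s).T)         # A*s = 0 for every special solution
```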
3.3 The Rank and the Row Reduced Form
- The rank r of A is the number of pivots.
The true size of A is given by its rank.
- The r pivot columns are not combinations of earlier columns.
The n - r free columns are combinations of earlier columns (pivot columns).
- Those combinations (using \(-F\) taken from R) give the n - r special solutions to \(Ax = 0\) and \(Rx = 0\).
They are the n – r columns of the nullspace matrix N.
3.4 The Complete Solution to \(Ax = b\)
- \(Ax = b\) is solvable if and only if the last m - r equations reduce to 0 = 0.
- One particular solution \(x_p\) has all free variables equal to zero.
The pivot variables are determined after the free variables are chosen.
- \(x = x_p + x_n\)
The particular solution \(x_p\) will be one point on the “line”.
Adding the nullspace vector \(x_n\) will move us along the “line”.
- Full column rank r = n means no free variables: one solution when m = n, and one solution or none when m > n (overdetermined).
- Full row rank r = m means one solution if m = n or infinitely many if m < n (underdetermined).
3.5 Independence, Basis and Dimension
- The columns of A are independent if x = 0 is the only solution to \(Ax = 0\).
- The vectors \(v_1, \dots ,v_r\) span a space if their combinations fill that space.
- A basis consists of linearly independent vectors that span the space.
Every vector in the space is a unique combination of the basis vectors.
- All bases for a space have the same number of vectors.
This number of vectors in a basis is the dimension of the space.
- The pivot columns are one basis for the column space. The dimension is r.
- The vectors \(v_1, \dots ,v_n\) are a basis for \(R^n\) exactly when they are the columns of an n by n invertible matrix.
Thus \(R^n\) has infinitely many different bases.
- The space Z contains only the zero vector.
The dimension of this space is zero.
The empty set is a basis for Z (We never allow the zero vector into a basis).
3.6 Dimension of the Four Subspaces
- Fundamental Theorem of Linear Algebra, Part 1:
The column space \(C(A)\) and the row space \(C(A^T)\) both have dimension r (the rank of A).
The nullspace \(N(A)\) has dimension n - r. The left nullspace \(N(A^T)\) has dimension m - r.
- Elimination produces bases for the row space and nullspace of A: they are the same as for R.
- Elimination often changes the column space and left nullspace (but dimensions don’t change).
- Rank one matrices: \(A = uv^T = \text{column times row}\): \(C(A)\) has basis u, \(C(A^T)\) has basis v.
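A quick numerical check of the dimension count (my own example; scipy’s `null_space` builds its basis from the SVD rather than from elimination, but the dimensions are the point here):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])           # m = 2, n = 3, rank r = 1
m, n = A.shape
r = np.linalg.matrix_rank(A)

print(r)                               # dim C(A) = dim C(A^T) = r
print(null_space(A).shape[1])          # dim N(A)   = n - r = 2
print(null_space(A.T).shape[1])        # dim N(A^T) = m - r = 1
```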
4 Orthogonality
4.1 Orthogonality of the Four Subspaces
- Fundamental Theorem of Linear Algebra, Part 2:
\(N(A)\) is the orthogonal complement of the row space \(C(A^T)\) (in \(R^n\)).
\(N(A^T)\) is the orthogonal complement of the column space \(C(A)\) (in \(R^m\)).
- Subspaces V and W are orthogonal if every v in V is orthogonal to every w in W.
- V and W are orthogonal complements if W contains all vectors perpendicular to V (and vice versa).
Inside \(R^n\), the dimensions of complements V and W add to n.
Orthogonality is impossible when dim V + dim W > dimension of the whole space.
- When a vector is in two orthogonal subspaces, it must be zero: the only vector perpendicular to itself.
- Any n independent vectors in \(R^n\) will span \(R^n\).
- Every x in \(R^n\) has a nullspace component \(x_n\) and a row space component \(x_r\).
- Every vector b in the column space comes from one and only one vector in the row space.
From the row space to the column space, A is invertible (this inverse is the pseudoinverse).
4.2 Projections
- The rank one projection matrix \(P = aa^T / a^Ta\) multiplies b to produce the projection p onto the line through a.
- Projecting b onto a subspace leaves \(e = b - p\) perpendicular to the subspace.
When \(P\) projects onto a subspace, \(I - P\) projects onto the perpendicular subspace.
- To project any b onto the column space of any matrix A:
The projection p is the combination \(A\hat{x}\) which is closest to b
The error vector \(b - A\hat{x}\) is perpendicular to the column space of A.
Therefore \(b - A\hat{x}\) is in the nullspace of \(A^T\). This means \(A^T(b - A\hat{x}) = 0\).
When A has full column rank n, the equation \(A^TA\hat{x} = A^Tb\) leads to \(\hat{x}\) and \(p = A\hat{x}\).
- The projection matrix \(P = A(A^TA)^{-1}A^T\) has \(P^T = P\) and \(P^2 = P\).
The matrix A is rectangular, which means that A may not be invertible.
When A is not invertible, we cannot split \((A^TA)^{-1}\) into \(A^{-1}\) times \((A^T)^{-1}\), because there is no \(A^{-1}\) in the first place.
When A has independent columns, \(A^TA\) is square, symmetric, and invertible.
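A short numpy check of the projection formulas (my own example with independent columns):

```python
import numpy as np

A = np.array([[1., 0.],
              [1., 1.],
              [1., 2.]])               # m = 3 > n = 2, independent columns
b = np.array([6., 0., 0.])

P = A @ np.linalg.inv(A.T @ A) @ A.T   # projection onto the column space of A
p = P @ b                              # projection of b
e = b - p                              # error vector

print(np.allclose(P, P.T))             # P^T = P
print(np.allclose(P @ P, P))           # P^2 = P
print(np.allclose(A.T @ e, 0))         # e is perpendicular to C(A), i.e. in N(A^T)
```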
4.3 Least Squares Approximations
- When \(Ax = b = p + e\) is impossible, \(A\hat{x} = p\) is solvable.
b is outside the column space; the nearest point in the column space is its projection p, and the error e = b - p is orthogonal to the column space.
- The best \(\hat{x}\) comes from the normal equations \(A^TA\hat{x} = A^Tb\).
The best \(\hat{x}\) minimizes \(E = \|Ax - b\|^2\), which is the sum of squares of the errors in the m equations (m > n).
- If we try to fit m points by a combination of n < m functions, the m equations \(Ax = b\) are generally unsolvable.
The n equations \(A^TA\hat{x} = A^Tb\) give the least squares solution: the combination with the smallest MSE (mean square error).
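A small least squares sketch: fitting a line C + Dt to four points through the normal equations, compared against numpy’s built-in `lstsq` (the data is an arbitrary example, not from the book):

```python
import numpy as np

t = np.array([0., 1., 3., 4.])
b = np.array([0., 8., 8., 20.])
A = np.column_stack([np.ones_like(t), t])      # columns: 1 and t, so Ax = C + Dt

x_hat = np.linalg.solve(A.T @ A, A.T @ b)      # normal equations A^T A x = A^T b
x_np, *_ = np.linalg.lstsq(A, b, rcond=None)   # numpy's least squares

print(x_hat, x_np)                             # the same best C and D
print(np.sum((A @ x_hat - b) ** 2))            # minimized sum of squared errors
```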
4.4 Orthogonal Bases and Gram-Schmidt
- A matrix Q with orthonormal columns satisfies \(Q^TQ = I\)
When Q is square (an orthogonal matrix), \(Q^TQ = I\) means that \(Q^T = Q^{-1}\): transpose = inverse.
- Q preserves length: \(\|Qx\| = \|x\|\)
Q preserves dot products: \((Qx)^T(Qy) = x^Ty\)
- The projection onto the column space is \(P = QQ^T\)
If Q is square then \(P = I\) and every \(b = q_1(q_1^Tb) + \dots + q_n(q_n^Tb)\).
- Gram-Schmidt process: subtract from every new vector its projections onto the directions already set.
In matrix form this is the factorization \(A = QR\).
R is upper triangular because later vectors are not involved in earlier directions (the key point of Gram-Schmidt).
- The least squares solution of \(Qx = b\) is \(\hat{x} = Q^Tb\)
The least squares solution of \(Ax = b\) is \(\hat{x} = R^{-1}Q^Tb\)
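A numpy sketch of A = QR and the least squares formula above (numpy’s `qr` uses Householder reflections rather than classical Gram-Schmidt, but the factorization is the same up to signs):

```python
import numpy as np

A = np.array([[1., 1.],
              [1., 2.],
              [1., 3.]])
b = np.array([1., 2., 2.])

Q, R = np.linalg.qr(A)                       # A = QR with orthonormal columns in Q
print(np.allclose(Q.T @ Q, np.eye(2)))       # Q^T Q = I
print(np.allclose(np.tril(R, -1), 0))        # R is upper triangular

x_hat = np.linalg.solve(R, Q.T @ b)          # least squares: x = R^{-1} Q^T b
print(np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0]))
```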
5 Determinants
5.1 The Properties of Determinants
- The determinant is defined by det I = 1, sign reversal, and linearity in each row.
- After elimination det A is \(\pm\) (product of the pivots).
- The determinant is zero exactly when A is not invertible.
- Two remarkable properties are det AB = (det A)(det B) and det \(A^T\) = det A.
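A quick numerical check of these properties on random matrices (my own sketch; the pivots come from scipy’s LU with row exchanges, so only |det| is compared):

```python
import numpy as np
from scipy.linalg import lu

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))
print(np.isclose(np.linalg.det(A.T), np.linalg.det(A)))

P, L, U = lu(A)                    # det A = +/- (product of the pivots on U's diagonal)
print(np.isclose(abs(np.linalg.det(A)), abs(np.prod(np.diag(U)))))
```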
5.2 Permutations and Cofactors
- The first k pivots come from the k by k matrix \(A_k\) in the top left corner of A.
The determinant of that corner submatrix \(A_k\) is \(d_1d_2 \dots d_k\).
- det A = sum over all n! column permutations \(P = (\alpha, \beta, \dots, \omega)\) = \(\sum(\det P)\,a_{1\alpha}a_{2\beta}\dots a_{n\omega}\) = BIG FORMULA
- The cofactor \(C_{ij}\) is \((-1)^{i+j}\) times the smaller determinant that omits row i and column j (because \(a_{ij}\) uses that row and column).
- The determinant is the dot product of any row of A with its row of cofactors.
When a row of A has a lot of zeros, we only need a few cofactors.
The idea behind cofactors is to reduce the order one step at a time.
5.3 Cramer’s Rule, Inverses, and Volumes
- Cramer’s Rule solves \(Ax = b\) by ratios like \(x_1 = |B_1|/|A| = |b\ a_2 \dots a_n|/|A|\).
- When C is the cofactor matrix for A, the inverse is \(A^{-1} = C^T / \text{det A}\).
- The volume of a box is |det A|, when the box edges going out from the origin are the rows of A.
- The cross product \(u \times v\) is a vector with length \(\|u\|\|v\||\sin\theta|\). Its direction is perpendicular to u and v. It points “up” or “down” by the right hand rule.
6 Eigenvalues and Eigenvectors
6.1 Introduction to Eigenvalues
- An eigenvector x lies along the same line as Ax: \(Ax = \lambda x\). The eigenvalue is \(\lambda\).
If \(Ax = \lambda x\) then \(A^2 x = \lambda^2 x\) and \(A^{-1}x = \lambda^{-1} x\): the same x.
- If \(Ax = \lambda x\) then \((A-\lambda I)x = 0\), so \(A-\lambda I\) is singular and \(\det(A-\lambda I) = 0\): n eigenvalues (may be repeated or complex).
- The product of the n eigenvalues equals the determinant.
The sum of the n eigenvalues equals the trace.
- Projections have \(\lambda\) = 1 and 0. Reflections have 1 and -1. Rotations have \(e^{i\theta}\) and \(e^{-i\theta}\).
- A and B share the same n independent eigenvectors if and only if AB = BA.
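A quick eigenvalue check (my own example, a small Markov matrix with eigenvalues 1 and 0.5):

```python
import numpy as np

A = np.array([[0.8, 0.3],
              [0.2, 0.7]])

lam, X = np.linalg.eig(A)
print(lam)                                         # 1.0 and 0.5 (in some order)
print(np.isclose(np.prod(lam), np.linalg.det(A)))  # product of eigenvalues = determinant
print(np.isclose(np.sum(lam), np.trace(A)))        # sum of eigenvalues = trace

x = X[:, 0]                                        # an eigenvector of A
print(np.allclose(A @ x, lam[0] * x))              # A x = lambda x
print(np.allclose(A @ A @ x, lam[0] ** 2 * x))     # A^2 x = lambda^2 x, same x
```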
6.2 Diagonalizing a Matrix
- n independent eigenvectors in X diagonalize A to the eigenvalue matrix \(\Lambda\): \(A = X\Lambda X^{-1}\) and \(\Lambda = X^{-1}AX\).
The columns of \(AX = X\Lambda\) are \(Ax_k = \lambda_k x_k\).
Moreover, if those n eigenvectors are orthonormal then \(A = \sum \lambda_k x_k x_k^T\).
- A is diagonalizable if every eigenvalue is different (then the n eigenvectors are automatically independent).
A is diagonalizable if every eigenvalue has enough eigenvectors.
- Every matrix \(C = B^{-1}AB\) has the same eigenvalues as A.
These C’s are similar to A.
- Solve \(u_{k+1} = Au_k\) by \(u_k = A^k u_0 = X\Lambda^k X^{-1}u_0 = \sum c_i \lambda_i^k x_i\) (transforming to an eigenvector basis).
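A diagonalization sketch, using powers of the Fibonacci matrix for \(u_k = A^k u_0\) (my own example):

```python
import numpy as np

A = np.array([[1., 1.],
              [1., 0.]])                 # Fibonacci matrix

lam, X = np.linalg.eig(A)
Lam = np.diag(lam)
print(np.allclose(A, X @ Lam @ np.linalg.inv(X)))          # A = X Lambda X^{-1}

u0 = np.array([1., 0.])
k = 10
u_k = X @ np.diag(lam ** k) @ np.linalg.inv(X) @ u0        # A^k through the eigenvector basis
print(np.allclose(u_k, np.linalg.matrix_power(A, k) @ u0)) # same as multiplying by A^k directly
print(np.round(u_k))                                       # Fibonacci numbers [89, 55]
```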
6.3 Systems of Differential Equations
6.4 Symmetric Matrices
- A real symmetric matrix S has n real eigenvalues and n orthonormal eigenvectors.
- For symmetric matrices the pivots and the eigenvalues have the same signs.
- N has n orthonormal eigenvectors if and only if N is normal (\(\overline{N}^T N = N \overline{N}^T\)).
6.5 Positive Definite Matrices
- Symmetric S: all eigenvalues > 0 \(\Leftrightarrow\) all pivots > 0 \(\Leftrightarrow\) all upper left determinants > 0 \(\Leftrightarrow\) \(x^T Sx\) is positive except at x = 0 \(\Leftrightarrow\) \(S = A^T A\) for some A with independent columns.
The matrix S is then positive definite.
- S is positive semidefinite if 0 is allowed in the above inequalities (or if dependent columns are allowed in A, with \(S = A^T A\)).
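A short check of the equivalent tests on a classic positive definite matrix (my own sketch):

```python
import numpy as np

S = np.array([[ 2., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  2.]])

print(np.all(np.linalg.eigvalsh(S) > 0))                         # all eigenvalues > 0
print(all(np.linalg.det(S[:k, :k]) > 0 for k in (1, 2, 3)))      # all upper left determinants > 0
L = np.linalg.cholesky(S)          # succeeds (S = L L^T) only when S is positive definite

x = np.array([1., 2., 3.])
print(x @ S @ x > 0)               # x^T S x > 0 for this nonzero x
```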
7 The Singular Value Decomposition (SVD)
7.1 Image Processing by Linear Algebra
- An image is a large matrix of grayscale values, one for each pixel and color.
When nearby pixels are correlated (not random) the image can be compressed.
- The SVD separates any matrix A into rank one pieces \(\sigma uv^T\).
The SVD chooses the rank one pieces in order of importance.
- Images mostly have full rank, but they have low effective rank, which means that many singular values are small and can be set to zero: we transmit a low rank approximation.
7.2 Bases and Matrices in the SVD
- What we want from the SVD: the right basis for the four subspaces.
\(
A
\left[\begin{matrix}
v_1 & \cdots & v_r & \cdots & v_n
\end{matrix}\right]
=
\left[\begin{matrix}
u_1 & \cdots & u_r & \cdots & u_m
\end{matrix}\right]
\left[\begin{matrix}
\sigma_1 & & & \\
& \ddots & & \\
& & \sigma_r & \\
& & &
\end{matrix}\right]
\)
SVD of any matrix: \(A = U \Sigma V^T = \sum \sigma_i u_i v_i^T\)
\(A^T A = (U\Sigma V^T)^T (U\Sigma V^T) = V\Sigma^TU^TU\Sigma V^T = V\Sigma^T\Sigma V^T\)
- A has two sets of singular vectors, u’s and v’s:
\(u_1 \dots u_r\) and \(v_1 \dots v_r\) are orthonormal bases for the column space and the row space.
\(u_{r+1} \dots u_m\) and \(v_{r+1} \dots v_n\) are orthonormal bases for the left nullspace and the nullspace.
u’s are eigenvectors of \(AA^T\), and v’s are eigenvectors of \(A^T A\).
- A has one set of singular values, \(\sigma\)’s:
Each \(\sigma_i^2\) is an eigenvalue of \(A^T A\).
\(\sigma_1 \dots \sigma_r\) will be positive numbers.
Because the columns of A may be dependent, \(A^T A\) is only positive semidefinite, and the extra \(\sigma\)’s are zero.
- There is serious instability of eigenvalues when \(AA^T\) is far from \(A^T A\).
The singular values of any matrix are stable.
- In \(S = \sum \lambda_i q_iq_i^T\), \(\lambda_1\) is the max ratio \(\frac{x^TSx}{x^Tx}\) and the winning vector x is \(q_1\).
In \(A = \sum \sigma_i u_iv_i^T\), \(\sigma_1\) is the max ratio \(\frac{\Vert Ax\Vert}{\Vert x\Vert}\) and the winning vector x is \(v_1\).
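A numpy check of the SVD facts above (my own example; `svd` returns the singular values in decreasing order):

```python
import numpy as np

A = np.array([[3., 0.],
              [4., 5.]])

U, s, Vt = np.linalg.svd(A)                      # A = U diag(s) V^T
print(np.allclose(A, U @ np.diag(s) @ Vt))

# sigma_i^2 are the eigenvalues of A^T A
print(np.allclose(np.sort(s ** 2), np.sort(np.linalg.eigvalsh(A.T @ A))))

# sigma_1 is the largest ratio ||Ax|| / ||x||, attained at x = v_1
v1 = Vt[0]
print(np.isclose(np.linalg.norm(A @ v1), s[0]))

# The leading rank one piece sigma_1 u_1 v_1^T is the most important one
A1 = s[0] * np.outer(U[:, 0], Vt[0])
print(np.isclose(np.linalg.norm(A - A1, 2), s[1]))   # the error has norm sigma_2
```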
7.3 Principal Component Analysis
7.4 The Geometry of the SVD
- A typical square matrix \(A = U\Sigma V^T\) factors into (rotation)(stretching)(rotation).
A transforms vectors x on a circle to vectors Ax on an ellipse.
This picture does not apply to every 2 by 2 matrix, because U and V are not allowed to include a reflection (and A must be invertible).
- The norm of A is \(\Vert A \Vert = \sigma_1\).
- Polar decomposition factors A into QS: rotation \(Q = UV^T\) (orthogonal) times stretching \(S = V\Sigma V^T\) (positive semidefinite).
- \(Q = UV^T\) is the nearest orthogonal matrix to A. (This Q makes norm \(\Vert Q-A \Vert\) as small as possible.)
The nearest singular matrix \(A_0\) to A comes by changing the smallest \(\sigma_{min}\) to zero.
- The pseudoinverse \(A^+ = V\Sigma^+ U^T\) brings Ax in the column space back to x in the row space (it sends \(N(A^T)\) to Z).
\(A^+ u_i = \frac{1}{\sigma_i}v_i\) for \(i \le r\) and \(A^+ u_i = 0\) for i > r.
\(AA^+\) is the projection onto the column space of A.
\(A^+A\) is the projection onto the row space of A.
- In least squares, when \(A^TA\) is singular, the equation \(A^TAx = A^Tb\) has infinitely many solutions.
The pseudoinverse gives the shortest solution: \(x^+ = A^+b\).
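A pseudoinverse sketch with dependent columns, so \(A^TA\) is singular (my own example):

```python
import numpy as np

A = np.array([[1., 2.],
              [2., 4.],
              [3., 6.]])              # column 2 = 2 * column 1
b = np.array([1., 2., 2.])

A_plus = np.linalg.pinv(A)            # A^+ = V Sigma^+ U^T
x_plus = A_plus @ b                   # shortest least squares solution

print(np.allclose(A.T @ A @ x_plus, A.T @ b))   # x^+ satisfies the normal equations

P_col = A @ A_plus                    # projection onto the column space of A
P_row = A_plus @ A                    # projection onto the row space of A
print(np.allclose(P_col @ P_col, P_col), np.allclose(P_row @ P_row, P_row))

x_other = x_plus + np.array([2., -1.])                     # add a nullspace vector of A
print(np.linalg.norm(x_plus) < np.linalg.norm(x_other))    # x^+ is the shortest solution
```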
8 Linear Transformations
8.1 The Idea of a Linear Transformation
- A linear transformation T takes vectors v to vectors T(v).
Linearity requires T(cv+dw) = cT(v)+dT(w).
- The input vectors v and the outputs T(v) can be in \(R^n\) or matrix space or function space…
If A is m by n, T(x) = Ax is linear from the input space \(R^n\) to the output space \(R^m\).
The derivative \(T(f) = \frac{df}{dx}\) is linear.
The integral \(T^+(f) = \int_0^x f(t)dt\) is its pseudoinverse.
- The range of T is the set of all outputs T(v) (it corresponds to the column space).
The kernel of T is the set of all inputs for which T(v) = 0 (it corresponds to the nullspace).
- The product ST of two linear transformations is still linear: (ST)(v) = S(T(v)).
8.2 The Matrix of a Linear Transformation
- We know every T(v) once we know \(T(v_1) \dots T(v_n)\) for an input basis \(v_1 \dots v_n\) (linearity).
- When the input basis is in the columns of V, and the output basis is in the columns of W, the change of basis matrix (for \(T(v)=v\)) is \(B=W^{-1}V\):
\(
\left[\begin{matrix}
v_1 & \cdots & v_n
\end{matrix}\right]
\left[\begin{matrix}
c_1 \\
\vdots \\
c_n
\end{matrix}\right]
=
\left[\begin{matrix}
w_1 & \cdots & w_n
\end{matrix}\right]
\left[\begin{matrix}
d_1 \\
\vdots \\
d_n
\end{matrix}\right]
\)
\(u = Vc = Wd\quad d = W^{-1}Vc\)
c and d are the two representations of u in the two bases; the change of basis matrix changes the coordinates of u from c to d (a small numerical sketch follows at the end of this section).
- Column j of the “matrix of T” comes from applying T to the input basis vector \(v_j\).
\(T(v_j) = \sum_i a_{ij} w_i\) is a combination of the output basis vectors.
- If A and B represent T and S, and the output basis for S is the input basis for T, then the matrix AB represents the transformation T(S(u)).
- The same T is represented by different matrices when we choose different bases.
The best choices of input-output bases are the eigenvectors / singular vectors of A:
- Eigenvectors:
If the input space equals the output space, and if there are n independent eigenvectors, choose those as the input and output basis; then the matrix for T is \(B^{-1}AB = \Lambda\) = eigenvalues (because \(T = ITI\)).
\(A_{\text{b’s to b’s}} = B_{\text{standard to b’s}}^{-1}A_{\text{standard}}B_{\text{b’s to standard}}\) is similar to \(A_{\text{standard}}\).
- Singular vectors:
If the input space may be different from the output space, the SVD says that \(A = U\Sigma V^{-1}\): use the v’s and u’s as the input and output bases, and the matrix for T is \(B_{out}^{-1}AB_{in} = U^{-1}AV = \Sigma\) = singular values (because \(T = ITI\)).
\(C = Q_1^{-1}AQ_2\) is isometric to A if \(Q_1\) and \(Q_2\) are orthogonal.
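A closing sketch for the change of basis matrix \(B = W^{-1}V\) from earlier in this section, plus the singular-vector choice that diagonalizes any A (my own example):

```python
import numpy as np

# Input basis in the columns of V, output basis in the columns of W (both bases of R^2)
V = np.array([[1., 1.],
              [0., 2.]])
W = np.array([[1., 1.],
              [1., -1.]])
B = np.linalg.inv(W) @ V          # matrix of T(v) = v from the v-basis to the w-basis

c = np.array([3., 5.])            # coordinates of u in the v-basis
d = B @ c                         # coordinates of the same u in the w-basis
print(np.allclose(V @ c, W @ d))  # both give the same vector u

# Singular-vector bases diagonalize any A: U^{-1} A V = Sigma
A = np.array([[3., 0.],
              [4., 5.]])
U, s, Vt = np.linalg.svd(A)
print(np.round(U.T @ A @ Vt.T, 10))   # the diagonal matrix of singular values
```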