- 01 Vectors and Vector Spaces: The Language of Data 02 Matrices and Matrix Operations: Organizing Linear Computation 03 Systems of Linear Equations: From Geometry to Algorithms 04 Determinants: The Volume Factor of Linear Maps 05 Linear Transformations: Matrices as Functions 06 Inner Products, Norms, and Orthogonality: Measuring Geometry 07 Eigenvalues and Eigenvectors: The DNA of a Matrix 08 Matrix Decompositions: Breaking Matrices into Simpler Pieces 09 Linear Algebra in Machine Learning: Putting It All Together 10 Matrix Calculus: Derivatives for Machine Learning 11 Tensor Operations: Beyond Matrices 12 Sparse Matrices and Efficient Computation 13 Randomized Linear Algebra: Speed Through Randomness
Introduction
Most vectors change direction when multiplied by a matrix. But some special vectors only get scaled — they point in the same direction (or flip) but never rotate. These are eigenvectors, and the scaling factors are eigenvalues. Together, they reveal the intrinsic structure of a linear transformation.
Eigenvalues and eigenvectors are not just an abstract curiosity. They are the mathematical engine behind PCA, Google’s PageRank, spectral clustering, stability analysis in dynamical systems, and the spectral theory of graphs. This article builds on inner products and orthogonality and leads directly into matrix decompositions.
Definition
Let be an matrix. A nonzero vector is an eigenvector of if:
for some scalar . The scalar is the corresponding eigenvalue.
In words: applying to produces the same vector scaled by . The eigenvector defines a direction that the transformation preserves.
| Effect on eigenvector | |
|---|---|
| Stretched along the eigenvector direction | |
| Compressed | |
| Unchanged | |
| Collapsed to zero (eigenvector is in the null space) | |
| Flipped and scaled |
Geometric interpretation: Imagine the matrix as a transformation that warps space. The eigenvectors are the directions along which the warping is pure scaling — no rotation, no shearing. The eigenvalues tell you how much scaling happens along each direction.
Finding Eigenvalues: The Characteristic Equation
Rearranging :
For a nonzero solution to exist, the matrix must be singular:
This is the characteristic equation. The left side is a polynomial of degree in , called the characteristic polynomial.
For a matrix :
The eigenvalues satisfy:
These relationships generalize: for any matrix, the sum of eigenvalues equals the trace and the product equals the determinant.
Finding Eigenvectors
Once you have an eigenvalue , find the corresponding eigenvector(s) by solving:
The set of all solutions (including ) is the eigenspace for — it is the null space of .
Worked Example
Find the eigenvalues and eigenvectors of:
Step 1: Characteristic equation
Eigenvalues: , .
Check: and .
Step 2: Eigenvectors
For :
This gives , so (or any scalar multiple).
For :
This gives , so .
import numpy as np
A = np.array([[4, 1],
[2, 3]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print(f"Eigenvalues: {eigenvalues}") # [5. 2.]
print(f"Eigenvectors:\n{eigenvectors}")
# Each column is an eigenvector
Diagonalization
If an matrix has linearly independent eigenvectors, it can be diagonalized:
where is the matrix whose columns are eigenvectors and is the diagonal matrix of eigenvalues:
Why Diagonalization Matters
Diagonalization transforms a complex transformation into independent scaling along each eigenvector axis. It makes many computations trivial:
Matrix powers:
Matrix exponential (used in differential equations and continuous-time models):
Key insight: Diagonalization is a change of basis to the “natural” coordinate system of the transformation — the one where it acts as pure scaling.
The Spectral Theorem
The spectral theorem states that every real symmetric matrix can be diagonalized by an orthogonal matrix:
where:
- is orthogonal (), with eigenvectors as columns
- is diagonal with real eigenvalues
- All eigenvalues are real (even though eigenvalues of general matrices can be complex)
- Eigenvectors corresponding to distinct eigenvalues are orthogonal
This is the most important theorem in applied linear algebra. It guarantees that symmetric matrices — which include covariance matrices, Gram matrices, and Hessians — always have a clean, orthogonal decomposition.
Key insight: PCA is a direct application of the spectral theorem. The covariance matrix is symmetric, so it decomposes as . The eigenvectors (columns of ) are the principal components, and the eigenvalues (diagonal of ) are the variances along those components.
Eigenvalues of Special Matrices
| Matrix Type | Eigenvalue Property |
|---|---|
| Symmetric () | All eigenvalues are real |
| Positive definite | All eigenvalues are positive |
| Positive semi-definite | All eigenvalues are |
| Orthogonal () | All eigenvalues have $ |
| Idempotent () | Eigenvalues are 0 or 1 |
| Nilpotent () | All eigenvalues are 0 |
| Triangular | Eigenvalues are the diagonal entries |
Algebraic vs. Geometric Multiplicity
The algebraic multiplicity of an eigenvalue is its multiplicity as a root of the characteristic polynomial.
The geometric multiplicity is the dimension of the eigenspace .
The geometric multiplicity is always the algebraic multiplicity. When they are equal for all eigenvalues, the matrix is diagonalizable. When they differ, the matrix has a more complex structure (requiring Jordan normal form).
The Power Method
For large matrices, computing eigenvalues via the characteristic polynomial is impractical. The power method finds the largest eigenvalue iteratively:
- Start with a random vector
- Iterate:
- The vectors converge to the eigenvector of the largest eigenvalue
The rationale: if , then:
The term with the largest dominates as .
Historical note: Google’s PageRank algorithm is essentially the power method applied to the web’s link matrix. The dominant eigenvector gives the steady-state distribution of a random web surfer — pages with higher eigenvector components rank higher.
import numpy as np
A = np.array([[4, 1],
[2, 3]])
# Power method
x = np.random.randn(2)
for _ in range(20):
x = A @ x
x = x / np.linalg.norm(x)
eigenvalue = x @ A @ x # Rayleigh quotient
print(f"Dominant eigenvalue: {eigenvalue:.4f}") # ≈ 5.0
print(f"Eigenvector: {x}")
The Rayleigh Quotient
The Rayleigh quotient of a symmetric matrix for a nonzero vector is:
The Rayleigh quotient satisfies . It equals an eigenvalue exactly when is the corresponding eigenvector.
The minimum and maximum of over all nonzero give the smallest and largest eigenvalues:
Key insight: PCA can be stated as: find the direction that maximizes the Rayleigh quotient of the covariance matrix. That direction is the first principal component — the eigenvector with the largest eigenvalue.
Condition Number
The condition number of a matrix is the ratio of its largest to smallest singular value (or, for symmetric positive definite matrices, the ratio of largest to smallest eigenvalue):
| Condition number | Interpretation |
|---|---|
| Well-conditioned; small input changes cause small output changes | |
| Ill-conditioned; the system is sensitive to perturbations | |
| Singular matrix |
In ML, ill-conditioned matrices cause:
- Slow convergence of gradient descent (the loss landscape is elongated)
- Numerical instability in solving linear systems
- Poor generalization when the condition number of is large (multicollinearity)
Why This Matters for ML
- PCA: The principal components are eigenvectors of the covariance matrix. Eigenvalues give the variance explained by each component.
- PageRank: Page importance is the dominant eigenvector of the web link matrix.
- Spectral clustering: Cluster structure is revealed by the eigenvectors of the graph Laplacian.
- Stability of dynamical systems: A linear system is stable if all eigenvalues satisfy .
- Gradient descent convergence: The condition number of the Hessian determines convergence speed.
- Regularization: Adding to a matrix (Ridge regularization) shifts all eigenvalues by , improving conditioning.
Summary
- An eigenvector of satisfies — it is only scaled, never rotated.
- Eigenvalues are found from the characteristic equation .
- Diagonalization reveals the transformation’s action as independent scaling along eigenvector axes.
- The spectral theorem guarantees that symmetric matrices have real eigenvalues and orthogonal eigenvectors.
- The power method finds the dominant eigenvalue iteratively — this is how PageRank works.
- The condition number controls numerical stability and convergence speed.
- PCA, spectral clustering, and stability analysis are all eigenvalue problems in disguise.
- Next, we generalize eigendecomposition to rectangular matrices in matrix decompositions, including the powerful SVD.
References
- Strang, G. (2016). Introduction to Linear Algebra (5th ed.). Wellesley-Cambridge Press. math.mit.edu/~gs/linearalgebra
- Axler, S. (2024). Linear Algebra Done Right (4th ed.). Springer. linear.axler.net
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning, Chapter 2. MIT Press. deeplearningbook.org
- Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank Citation Ranking: Bringing Order to the Web. Stanford InfoLab.