Eigenvalues and Eigenvectors: The DNA of a Matrix

Discover eigenvalues and eigenvectors — the special directions that reveal a matrix's intrinsic behavior, powering PCA, PageRank, and spectral methods.

Linear Algebra · March 6, 2026 · 9 min read

Introduction

Most vectors change direction when multiplied by a matrix. But some special vectors only get scaled — they point in the same direction (or flip) but never rotate. These are eigenvectors, and the scaling factors are eigenvalues. Together, they reveal the intrinsic structure of a linear transformation.

Eigenvalues and eigenvectors are not just an abstract curiosity. They are the mathematical engine behind PCA, Google’s PageRank, spectral clustering, stability analysis in dynamical systems, and the spectral theory of graphs. This article builds on inner products and orthogonality and leads directly into matrix decompositions.

Definition

Let $\mathbf{A}$ be an $n \times n$ matrix. A nonzero vector $\mathbf{v}$ is an eigenvector of $\mathbf{A}$ if:

$$\mathbf{A}\mathbf{v} = \lambda \mathbf{v}$$

for some scalar $\lambda$. The scalar $\lambda$ is the corresponding eigenvalue.

In words: applying $\mathbf{A}$ to $\mathbf{v}$ produces the same vector scaled by $\lambda$. The eigenvector defines a direction that the transformation preserves.

| $\lambda$ | Effect on eigenvector |
| --- | --- |
| $\lambda > 1$ | Stretched along the eigenvector direction |
| $0 < \lambda < 1$ | Compressed |
| $\lambda = 1$ | Unchanged |
| $\lambda = 0$ | Collapsed to zero (eigenvector is in the null space) |
| $\lambda < 0$ | Flipped and scaled |

Geometric interpretation: Imagine the matrix $\mathbf{A}$ as a transformation that warps space. The eigenvectors are the directions along which the warping is pure scaling, with no rotation and no shearing. The eigenvalues tell you how much scaling happens along each direction.

Finding Eigenvalues: The Characteristic Equation

Rearranging $\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$:

$$(\mathbf{A} - \lambda\mathbf{I})\mathbf{v} = \mathbf{0}$$

For a nonzero solution $\mathbf{v}$ to exist, the matrix $(\mathbf{A} - \lambda\mathbf{I})$ must be singular:

$$\det(\mathbf{A} - \lambda\mathbf{I}) = 0$$

This is the characteristic equation. The left side is a polynomial of degree $n$ in $\lambda$, called the characteristic polynomial.

For a $2 \times 2$ matrix $\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$:

$$\det(\mathbf{A} - \lambda\mathbf{I}) = \lambda^2 - (a+d)\lambda + (ad - bc) = \lambda^2 - \text{tr}(\mathbf{A})\lambda + \det(\mathbf{A})$$

The eigenvalues satisfy:

$$\lambda_1 + \lambda_2 = \text{tr}(\mathbf{A}), \qquad \lambda_1 \cdot \lambda_2 = \det(\mathbf{A})$$

These relationships generalize: for any $n \times n$ matrix, the sum of the eigenvalues equals the trace and their product equals the determinant, provided each eigenvalue is counted with its algebraic multiplicity and complex eigenvalues are included.
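The trace and determinant identities can be checked numerically. The following is a minimal sketch, assuming a hypothetical random $4 \times 4$ matrix `M` (the seed and size are arbitrary choices, not from the article):

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed, chosen arbitrarily
M = rng.standard_normal((4, 4))       # hypothetical 4x4 test matrix

# Eigenvalues of a non-symmetric real matrix may be complex (in conjugate pairs)
eigvals = np.linalg.eigvals(M)

eig_sum = eigvals.sum()
eig_prod = eigvals.prod()

print(np.isclose(eig_sum, np.trace(M)))        # sum of eigenvalues vs. trace
print(np.isclose(eig_prod, np.linalg.det(M)))  # product of eigenvalues vs. determinant
```

Any imaginary parts cancel across conjugate pairs, which is why the sum and product agree with the real-valued trace and determinant up to rounding.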

Finding Eigenvectors

Once you have an eigenvalue $\lambda$, find the corresponding eigenvector(s) by solving:

$$(\mathbf{A} - \lambda\mathbf{I})\mathbf{v} = \mathbf{0}$$

The set of all solutions (including $\mathbf{0}$) is the eigenspace for $\lambda$: it is the null space of $(\mathbf{A} - \lambda\mathbf{I})$.
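One way to compute an eigenspace numerically is to extract the null space of $\mathbf{A} - \lambda\mathbf{I}$ from its singular value decomposition. This is a sketch under assumptions: the helper `eigenspace_basis`, its tolerance `tol`, and the diagonal example matrix are all hypothetical and chosen only for illustration.

```python
import numpy as np

def eigenspace_basis(A, lam, tol=1e-10):
    """Orthonormal basis for the null space of (A - lam*I), computed via the SVD."""
    n = A.shape[0]
    _, s, vh = np.linalg.svd(A - lam * np.eye(n))
    # Right singular vectors whose singular value is numerically zero span the null space
    return vh[s <= tol].T

# Hypothetical example with a repeated eigenvalue (2 appears twice on the diagonal)
A = np.array([[2.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])

basis = eigenspace_basis(A, 2.0)
print(basis.shape)                          # (3, 2): a two-dimensional eigenspace
print(np.allclose(A @ basis, 2.0 * basis))  # every basis vector satisfies A v = 2 v
```

The tolerance decides which singular values count as zero, so it may need adjusting for badly scaled matrices.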

Worked Example

Find the eigenvalues and eigenvectors of:

$$\mathbf{A} = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix}$$

Step 1: Characteristic equation

$$\begin{aligned} \det(\mathbf{A} - \lambda\mathbf{I}) &= \det\begin{bmatrix} 4-\lambda & 1 \\ 2 & 3-\lambda \end{bmatrix} \\[6pt] &= (4-\lambda)(3-\lambda) - 2 \\[6pt] &= \lambda^2 - 7\lambda + 10 \\[6pt] &= (\lambda - 5)(\lambda - 2) = 0 \end{aligned}$$

Eigenvalues: $\lambda_1 = 5$, $\lambda_2 = 2$.

Check: $\lambda_1 + \lambda_2 = 7 = \text{tr}(\mathbf{A})$ and $\lambda_1 \cdot \lambda_2 = 10 = \det(\mathbf{A})$.

Step 2: Eigenvectors

For $\lambda_1 = 5$:

$$(\mathbf{A} - 5\mathbf{I})\mathbf{v} = \begin{bmatrix} -1 & 1 \\ 2 & -2 \end{bmatrix}\mathbf{v} = \mathbf{0}$$

This gives $-v_1 + v_2 = 0$, so $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ (or any scalar multiple).

For $\lambda_2 = 2$:

$$(\mathbf{A} - 2\mathbf{I})\mathbf{v} = \begin{bmatrix} 2 & 1 \\ 2 & 1 \end{bmatrix}\mathbf{v} = \mathbf{0}$$

This gives $2v_1 + v_2 = 0$, so $\mathbf{v}_2 = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$.

import numpy as np

A = np.array([[4, 1],
              [2, 3]])

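eigenvalues, eigenvectors = np.linalg.eig(A)
print(f"Eigenvalues: {eigenvalues}")   # 5 and 2 (NumPy does not guarantee the order)
print(f"Eigenvectors:\n{eigenvectors}")
# Each column is a unit-length eigenvector, i.e. a scalar multiple of the
# hand-computed vectors above; column i pairs with eigenvalues[i]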
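Because `np.linalg.eig` normalizes its eigenvectors, its output does not look like the hand-computed vectors at first glance. The sketch below, which assumes nothing beyond the worked example's matrix, checks the hand-computed pairs against the definition and then compares directions with NumPy's columns (the variable names are illustrative):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# Hand-computed eigenpairs from the worked example
pairs = [(5.0, np.array([1.0, 1.0])),
         (2.0, np.array([1.0, -2.0]))]

# Check the definition A v = lambda v directly
definition_ok = [np.allclose(A @ v, lam * v) for lam, v in pairs]

# Compare each hand-computed vector with NumPy's column for the same eigenvalue.
# The vectors may differ by sign and scale, so compare |cos(angle)| with 1.
eigenvalues, eigenvectors = np.linalg.eig(A)
direction_ok = []
for lam, v in pairs:
    idx = np.argmin(np.abs(eigenvalues - lam))   # column belonging to this eigenvalue
    col = eigenvectors[:, idx]
    cos = abs(col @ v) / (np.linalg.norm(col) * np.linalg.norm(v))
    direction_ok.append(np.isclose(cos, 1.0))

print(definition_ok, direction_ok)
```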

Diagonalization

If an $n \times n$ matrix $\mathbf{A}$ has $n$ linearly independent eigenvectors, it can be diagonalized:

$$\mathbf{A} = \mathbf{P}\mathbf{D}\mathbf{P}^{-1}$$

where $\mathbf{P}$ is the matrix whose columns are eigenvectors and $\mathbf{D}$ is the diagonal matrix of eigenvalues:

$$\mathbf{P} = \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \end{bmatrix}, \qquad \mathbf{D} = \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix}$$

Why Diagonalization Matters

Diagonalization transforms a complex transformation into independent scaling along each eigenvector axis. It makes many computations trivial:

Matrix powers:

$$\mathbf{A}^k = \mathbf{P}\mathbf{D}^k\mathbf{P}^{-1} = \mathbf{P}\begin{bmatrix} \lambda_1^k & & \\ & \lambda_2^k & \\ & & \ddots \end{bmatrix}\mathbf{P}^{-1}$$

Matrix exponential (used in differential equations and continuous-time models):

$$e^{\mathbf{A}t} = \mathbf{P}\begin{bmatrix} e^{\lambda_1 t} & & \\ & e^{\lambda_2 t} & \\ & & \ddots \end{bmatrix}\mathbf{P}^{-1}$$

Key insight: Diagonalization is a change of basis to the “natural” coordinate system of the transformation — the one where it acts as pure scaling.
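The power and exponential formulas can be tried on the matrix from the worked example. This is a minimal sketch: the exponent `k`, the time `t`, and the truncated Taylor series used as an independent check are illustrative assumptions, not part of the article.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigenvalues, P = np.linalg.eig(A)
P_inv = np.linalg.inv(P)

# Reconstruct A from its eigendecomposition
A_rebuilt = P @ np.diag(eigenvalues) @ P_inv

# A^k: only the diagonal entries are raised to the k-th power
k = 5
A_pow = P @ np.diag(eigenvalues ** k) @ P_inv

# e^{At}: exponentiate the eigenvalues
t = 0.1
A_exp = P @ np.diag(np.exp(eigenvalues * t)) @ P_inv

# Independent check of the exponential: truncated Taylor series sum_m (At)^m / m!
taylor = np.zeros_like(A)
term = np.eye(2)                       # the m = 0 term
for m in range(1, 30):
    taylor += term
    term = term @ (A * t) / m          # build the m-th term from the previous one

print(np.allclose(A_rebuilt, A))
print(np.allclose(A_pow, np.linalg.matrix_power(A, k)))
print(np.allclose(A_exp, taylor))
```

Inverting `P` explicitly is fine for a small, well-conditioned example like this one; it becomes unreliable when the eigenvectors are nearly linearly dependent.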

The Spectral Theorem

The spectral theorem states that every real symmetric matrix $\mathbf{A} = \mathbf{A}^T$ can be diagonalized by an orthogonal matrix:

$$\mathbf{A} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T$$

where:

  • $\mathbf{Q}$ is orthogonal ($\mathbf{Q}^T\mathbf{Q} = \mathbf{I}$), with eigenvectors as columns
  • $\boldsymbol{\Lambda}$ is diagonal with real eigenvalues
  • All eigenvalues are real (even though eigenvalues of general matrices can be complex)
  • Eigenvectors corresponding to distinct eigenvalues are orthogonal

This is arguably the most important theorem in applied linear algebra. It guarantees that symmetric matrices, which include covariance matrices, Gram matrices, and Hessians, always have a clean, orthogonal decomposition.
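The guarantees of the spectral theorem can be observed with `np.linalg.eigh`, NumPy's routine for symmetric (Hermitian) matrices. The sketch below assumes a hypothetical random matrix that is symmetrized by hand; the seed and size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
S = (B + B.T) / 2                      # symmetrize a hypothetical random matrix

# eigh returns real eigenvalues in ascending order and orthonormal eigenvectors
eigvals, Q = np.linalg.eigh(S)

print(np.isrealobj(eigvals))                       # eigenvalues are real
print(np.allclose(Q.T @ Q, np.eye(4)))             # Q is orthogonal
print(np.allclose(Q @ np.diag(eigvals) @ Q.T, S))  # S = Q Lambda Q^T
```

For symmetric input, `eigh` is generally preferable to `eig`, since it exploits the symmetry and never returns complex values caused by rounding.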

Key insight: PCA is a direct application of the spectral theorem. The covariance matrix $\boldsymbol{\Sigma}$ is symmetric, so it decomposes as $\boldsymbol{\Sigma} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T$. The eigenvectors (columns of $\mathbf{Q}$) are the principal components, and the eigenvalues (diagonal of $\boldsymbol{\Lambda}$) are the variances along those components.
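A small PCA sketch makes the link between eigenvalues and variances concrete. Everything about the data here is an assumption made for illustration: 500 synthetic 2-D points, stretched along one axis and rotated by 30 degrees.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: standard deviation 3 along one axis, 0.5 along the other,
# then rotated by 30 degrees so the principal axes are not the coordinate axes
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = rng.standard_normal((500, 2)) * np.array([3.0, 0.5]) @ R.T
direction = R[:, 0]                            # true direction of largest spread

# Covariance matrix of the centered data
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (len(Xc) - 1)

# Spectral decomposition, reordered so the largest eigenvalue comes first
eigvals, Q = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, Q = eigvals[order], Q[:, order]

# Project onto the principal components; the variance of each score column
# equals the corresponding eigenvalue
scores = Xc @ Q
print(np.allclose(scores.var(axis=0, ddof=1), eigvals))
print(abs(Q[:, 0] @ direction))                # close to 1: first component matches the true axis
```

The sign of each eigenvector is arbitrary, which is why the alignment check uses an absolute value.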

Eigenvalues of Special Matrices

| Matrix type | Eigenvalue property |
| --- | --- |
| Symmetric ($\mathbf{A} = \mathbf{A}^T$) | All eigenvalues are real |
| Positive definite | All eigenvalues are positive |
| Positive semi-definite | All eigenvalues are $\geq 0$ |
| Orthogonal ($\mathbf{Q}^T\mathbf{Q} = \mathbf{I}$) | All eigenvalues have $\lvert\lambda\rvert = 1$ |
| Idempotent ($\mathbf{A}^2 = \mathbf{A}$) | Eigenvalues are 0 or 1 |
| Nilpotent ($\mathbf{A}^k = \mathbf{0}$ for some $k$) | All eigenvalues are 0 |
| Triangular | Eigenvalues are the diagonal entries |
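Several rows of this table are easy to confirm on small examples. The matrices below are hypothetical instances picked for illustration: a 2-D rotation, a projection onto a line, a strictly upper triangular matrix, and an upper triangular matrix.

```python
import numpy as np

# Orthogonal: a 2-D rotation has complex eigenvalues on the unit circle
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
orth_moduli = np.abs(np.linalg.eigvals(Q))

# Idempotent: orthogonal projection onto the line spanned by u
u = np.array([[1.0], [2.0], [2.0]])
Pj = u @ u.T / (u.T @ u)
idem_eigs = np.sort(np.linalg.eigvalsh(Pj))

# Nilpotent: strictly upper triangular, so N^3 = 0
N = np.array([[0.0, 1.0, 2.0],
              [0.0, 0.0, 3.0],
              [0.0, 0.0, 0.0]])
nil_eigs = np.linalg.eigvals(N)

# Triangular: the eigenvalues are the diagonal entries
T = np.array([[2.0,  5.0, 1.0],
              [0.0, -1.0, 4.0],
              [0.0,  0.0, 7.0]])
tri_eigs = np.sort(np.linalg.eigvals(T).real)

print(orth_moduli)   # moduli of the rotation's eigenvalues
print(idem_eigs)     # eigenvalues of the projection
print(nil_eigs)      # eigenvalues of the nilpotent matrix
print(tri_eigs)      # sorted diagonal of T
```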

Algebraic vs. Geometric Multiplicity

The algebraic multiplicity of an eigenvalue $\lambda$ is its multiplicity as a root of the characteristic polynomial.

The geometric multiplicity is the dimension of the eigenspace, $\dim(\ker(\mathbf{A} - \lambda\mathbf{I}))$.

The geometric multiplicity is always $\leq$ the algebraic multiplicity. When they are equal for all eigenvalues, the matrix is diagonalizable. When they differ, the matrix has a more complex structure (requiring Jordan normal form).
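The gap between the two multiplicities shows up in the smallest possible example. In this sketch, the helper `geometric_multiplicity` and both matrices are illustrative assumptions; each matrix has characteristic polynomial $(\lambda - 1)^2$, so the algebraic multiplicity of $\lambda = 1$ is 2 in both cases.

```python
import numpy as np

def geometric_multiplicity(A, lam):
    """dim ker(A - lam*I), computed as n - rank(A - lam*I)."""
    n = A.shape[0]
    return n - np.linalg.matrix_rank(A - lam * np.eye(n))

shear = np.array([[1.0, 1.0],
                  [0.0, 1.0]])     # defective: only one independent eigenvector
identity = np.eye(2)               # diagonalizable: every vector is an eigenvector

print(geometric_multiplicity(shear, 1.0))     # 1
print(geometric_multiplicity(identity, 1.0))  # 2
```

The shear matrix is the standard example of a matrix that cannot be diagonalized, because its single eigenvalue supplies only one eigenvector direction.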

The Power Method

For large matrices, computing eigenvalues via the characteristic polynomial is impractical. The power method finds the dominant eigenvalue (the one largest in absolute value) iteratively:

  1. Start with a random vector $\mathbf{x}_0$
  2. Iterate: $\mathbf{x}_{k+1} = \frac{\mathbf{A}\mathbf{x}_k}{\|\mathbf{A}\mathbf{x}_k\|}$
  3. The vectors converge to the eigenvector of the dominant eigenvalue

The rationale: if $\mathbf{x}_0 = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots$ is expanded in eigenvectors, then:

$$\mathbf{A}^k\mathbf{x}_0 = c_1\lambda_1^k\mathbf{v}_1 + c_2\lambda_2^k\mathbf{v}_2 + \cdots$$

The term with the largest $|\lambda|$ dominates as $k \to \infty$, provided its coefficient is nonzero and its $|\lambda|$ is strictly larger than that of every other eigenvalue.
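The expansion also predicts how fast the iteration converges: the unwanted component shrinks by a factor of roughly $|\lambda_2 / \lambda_1|$ per step. The sketch below measures this on the worked example's matrix, where the ratio is $2/5$. The fixed starting vector is an assumption chosen so that the iterates keep positive entries and no sign ambiguity arises.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
v1 = np.array([1.0, 1.0]) / np.sqrt(2.0)   # unit dominant eigenvector (lambda = 5)

x = np.array([1.0, 0.0])                   # fixed start with a nonzero v1 component
errors = []
for _ in range(10):
    x = A @ x
    x = x / np.linalg.norm(x)
    errors.append(np.linalg.norm(x - v1))  # distance to the dominant eigenvector

# Ratio of successive errors; it approaches |lambda_2 / lambda_1| = 0.4
ratios = np.array(errors[1:]) / np.array(errors[:-1])
print(ratios)
```

When the two largest eigenvalues are close in magnitude, this ratio is near 1 and the method converges slowly.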

Historical note: Google’s PageRank algorithm is essentially the power method applied to the web’s link matrix. The dominant eigenvector gives the steady-state distribution of a random web surfer — pages with higher eigenvector components rank higher.

import numpy as np

A = np.array([[4, 1],
              [2, 3]])

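# Power method: repeatedly apply A and renormalize
rng = np.random.default_rng(0)   # seeded so the run is reproducible
x = rng.standard_normal(2)
for _ in range(20):
    x = A @ x
    x = x / np.linalg.norm(x)

eigenvalue = x @ A @ x  # Rayleigh quotient (x has unit length, so no denominator is needed)
print(f"Dominant eigenvalue: {eigenvalue:.4f}")  # ≈ 5.0
print(f"Eigenvector: {x}")  # a unit vector along ±[1, 1]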
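The PageRank connection noted above can be sketched on a toy graph. This is an illustration under stated assumptions, not Google's implementation: the four-page link structure is hypothetical, and the damping factor `d = 0.85` is the commonly quoted value rather than something taken from this article.

```python
import numpy as np

# Hypothetical 4-page web: links[i] lists the pages that page i links to
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
n = len(links)

# Column-stochastic link matrix: M[j, i] = probability of moving from page i to page j
M = np.zeros((n, n))
for src, targets in links.items():
    for dst in targets:
        M[dst, src] = 1.0 / len(targets)

d = 0.85                                        # damping factor (assumed value)
G = d * M + (1.0 - d) / n * np.ones((n, n))     # with probability 1 - d, jump to a random page

# Power method on G; normalizing by the sum keeps rank a probability vector
rank = np.full(n, 1.0 / n)
for _ in range(100):
    rank = G @ rank
    rank = rank / rank.sum()

print(rank)                 # steady-state visit probabilities
print(np.argsort(-rank))    # pages ordered from highest to lowest rank
```

Page 3 has no incoming links, so its score comes only from random jumps and ends up lowest. The damping term also makes every entry of `G` positive, which guarantees a unique dominant eigenvector with eigenvalue 1.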

The Rayleigh Quotient

The Rayleigh quotient of a symmetric matrix $\mathbf{A}$ for a nonzero vector $\mathbf{x}$ is:

$$R(\mathbf{x}) = \frac{\mathbf{x}^T\mathbf{A}\mathbf{x}}{\mathbf{x}^T\mathbf{x}}$$

The Rayleigh quotient satisfies $\lambda_{\min} \leq R(\mathbf{x}) \leq \lambda_{\max}$. If $\mathbf{x}$ is an eigenvector, $R(\mathbf{x})$ equals the corresponding eigenvalue, and the two bounds are attained only at eigenvectors for $\lambda_{\min}$ and $\lambda_{\max}$.

The minimum and maximum of $R(\mathbf{x})$ over all nonzero $\mathbf{x}$ give the smallest and largest eigenvalues:

$$\lambda_{\max} = \max_{\mathbf{x} \neq \mathbf{0}} \frac{\mathbf{x}^T\mathbf{A}\mathbf{x}}{\mathbf{x}^T\mathbf{x}}, \qquad \lambda_{\min} = \min_{\mathbf{x} \neq \mathbf{0}} \frac{\mathbf{x}^T\mathbf{A}\mathbf{x}}{\mathbf{x}^T\mathbf{x}}$$

Key insight: PCA can be stated as: find the direction $\mathbf{x}$ that maximizes the Rayleigh quotient of the covariance matrix. That direction is the first principal component, i.e. the eigenvector with the largest eigenvalue.
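The bounds on the Rayleigh quotient can be probed by random sampling. In this sketch the helper `rayleigh`, the random symmetric matrix, and the sample count are illustrative assumptions.

```python
import numpy as np

def rayleigh(S, x):
    """Rayleigh quotient x^T S x / x^T x for a nonzero vector x."""
    return (x @ S @ x) / (x @ x)

rng = np.random.default_rng(3)
B = rng.standard_normal((5, 5))
S = (B + B.T) / 2                      # hypothetical symmetric test matrix

eigvals, Q = np.linalg.eigh(S)         # ascending order: eigvals[0] is the smallest

# Evaluate the quotient at 1000 random directions
samples = rng.standard_normal((1000, 5))
values = np.array([rayleigh(S, x) for x in samples])

print(eigvals[0] <= values.min(), values.max() <= eigvals[-1])  # all samples lie within the bounds
print(np.isclose(rayleigh(S, Q[:, -1]), eigvals[-1]))           # maximum attained at the top eigenvector
print(np.isclose(rayleigh(S, Q[:, 0]), eigvals[0]))             # minimum attained at the bottom eigenvector
```

Random sampling only illustrates the bounds; it does not find the extremes, which is what an eigenvalue solver is for.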

Condition Number

The condition number of a matrix is the ratio of its largest to smallest singular value. For a symmetric positive definite matrix, where the singular values coincide with the eigenvalues, this is the ratio of the largest to the smallest eigenvalue:

$$\kappa(\mathbf{A}) = \frac{\lambda_{\max}}{\lambda_{\min}}$$

| Condition number | Interpretation |
| --- | --- |
| $\kappa \approx 1$ | Well-conditioned; small input changes cause small output changes |
| $\kappa \gg 1$ | Ill-conditioned; the system is sensitive to perturbations |
| $\kappa = \infty$ | Singular matrix |
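The two definitions of the condition number can be compared in NumPy. This sketch reuses the worked example's matrix and builds a symmetric positive definite matrix from it as $\mathbf{A}^T\mathbf{A}$; that construction is an assumption made for illustration.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# General definition: ratio of largest to smallest singular value
sigma = np.linalg.svd(A, compute_uv=False)     # singular values in descending order
kappa_svd = sigma[0] / sigma[-1]
kappa_np = np.linalg.cond(A)                   # 2-norm condition number by default

# For a symmetric positive definite matrix, the eigenvalue ratio gives the same number
S = A.T @ A
lam = np.linalg.eigvalsh(S)                    # ascending, all positive
kappa_spd = lam[-1] / lam[0]

print(np.isclose(kappa_svd, kappa_np))
print(np.isclose(kappa_spd, np.linalg.cond(S)))
print(np.isclose(kappa_spd, kappa_svd ** 2))   # forming A^T A squares the condition number
```

Note that $\mathbf{A}$ itself is not symmetric, so its eigenvalue ratio $5/2$ is not its condition number; the singular-value ratio is about 2.62.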

In ML, ill-conditioned matrices cause:

  • Slow convergence of gradient descent (the loss landscape is elongated)
  • Numerical instability in solving linear systems
  • Poor generalization when the condition number of $\mathbf{X}^T\mathbf{X}$ is large (multicollinearity)
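The effect on gradient descent is easy to see on a quadratic loss. The sketch below is a toy setup under assumptions: the function `gd_iterations`, the loss $f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^T\mathbf{H}\mathbf{x}$, the step size $1/\lambda_{\max}$, and the two diagonal Hessians are all chosen for illustration.

```python
import numpy as np

def gd_iterations(H, tol=1e-6, max_iter=100_000):
    """Gradient descent on f(x) = 0.5 * x^T H x with step 1/lambda_max.

    Returns the number of iterations until ||x|| < tol (the minimizer is x = 0).
    """
    step = 1.0 / np.linalg.eigvalsh(H)[-1]
    x = np.ones(H.shape[0])
    for k in range(max_iter):
        if np.linalg.norm(x) < tol:
            return k
        x = x - step * (H @ x)        # the gradient of f is H x
    return max_iter

well = np.diag([1.0, 2.0])            # condition number 2
ill = np.diag([1.0, 200.0])           # condition number 200

n_well = gd_iterations(well)
n_ill = gd_iterations(ill)
print(n_well, n_ill)                  # the ill-conditioned problem needs far more iterations
```

The step size is limited by the steepest direction, so progress along the flattest direction shrinks by only a factor of $1 - 1/\kappa$ per step.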

Why This Matters for ML

  • PCA: The principal components are eigenvectors of the covariance matrix. Eigenvalues give the variance explained by each component.
  • PageRank: Page importance is the dominant eigenvector of the web link matrix.
  • Spectral clustering: Cluster structure is revealed by the eigenvectors of the graph Laplacian.
  • Stability of dynamical systems: A linear system $\mathbf{x}_{t+1} = \mathbf{A}\mathbf{x}_t$ is stable if all eigenvalues satisfy $|\lambda| < 1$.
  • Gradient descent convergence: The condition number $\lambda_{\max}/\lambda_{\min}$ of the Hessian determines convergence speed.
  • Regularization: Adding $\alpha\mathbf{I}$ with $\alpha > 0$ to a matrix (Ridge regularization) shifts every eigenvalue $\lambda_i$ to $\lambda_i + \alpha$, improving conditioning.
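The regularization point can be demonstrated on a nearly collinear design matrix. In this sketch the data, the column construction, and the ridge strength `alpha = 1.0` are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical design matrix whose first two columns are nearly collinear
x1 = rng.standard_normal(100)
X = np.column_stack([x1,
                     x1 + 1e-3 * rng.standard_normal(100),
                     rng.standard_normal(100)])

G = X.T @ X
alpha = 1.0                                   # ridge strength (arbitrary illustrative value)
G_ridge = G + alpha * np.eye(G.shape[0])

eig_G = np.linalg.eigvalsh(G)
eig_ridge = np.linalg.eigvalsh(G_ridge)

print(np.allclose(eig_ridge, eig_G + alpha))        # every eigenvalue is shifted by alpha
print(np.linalg.cond(G), np.linalg.cond(G_ridge))   # conditioning before and after the shift
```

The shift matters most for the smallest eigenvalue: lifting it away from zero is what lowers the ratio $\lambda_{\max}/\lambda_{\min}$.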

Summary

  • An eigenvector $\mathbf{v}$ of $\mathbf{A}$ satisfies $\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$: it is only scaled, never rotated.
  • Eigenvalues are found from the characteristic equation $\det(\mathbf{A} - \lambda\mathbf{I}) = 0$.
  • Diagonalization $\mathbf{A} = \mathbf{P}\mathbf{D}\mathbf{P}^{-1}$ reveals the transformation’s action as independent scaling along eigenvector axes.
  • The spectral theorem guarantees that symmetric matrices have real eigenvalues and orthogonal eigenvectors.
  • The power method finds the dominant eigenvalue iteratively — this is how PageRank works.
  • The condition number $\kappa = \lambda_{\max}/\lambda_{\min}$ (for symmetric positive definite matrices) controls numerical stability and convergence speed.
  • PCA, spectral clustering, and stability analysis are all eigenvalue problems in disguise.
  • Next, we generalize eigendecomposition to rectangular matrices in matrix decompositions, including the powerful SVD.

References

  • Strang, G. (2016). Introduction to Linear Algebra (5th ed.). Wellesley-Cambridge Press. math.mit.edu/~gs/linearalgebra
  • Axler, S. (2024). Linear Algebra Done Right (4th ed.). Springer. linear.axler.net
  • Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning, Chapter 2. MIT Press. deeplearningbook.org
  • Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank Citation Ranking: Bringing Order to the Web. Stanford InfoLab.
