Eigenvalues and Eigenvectors: The DNA of a Matrix

Linear Algebra Series 7 / 13

Introduction

Most vectors change direction when multiplied by a matrix. But some special vectors only get scaled — they point in the same direction (or flip) but never rotate. These are eigenvectors, and the scaling factors are eigenvalues. Together, they reveal the intrinsic structure of a linear transformation.

Eigenvalues and eigenvectors are not just an abstract curiosity. They are the mathematical engine behind PCA, Google’s PageRank, spectral clustering, stability analysis in dynamical systems, and the spectral theory of graphs. This article builds on inner products and orthogonality and leads directly into matrix decompositions.

Definition

Let $\mathbf{A}$ be an $n \times n$ matrix. A nonzero vector $\mathbf{v}$ is an eigenvector of $\mathbf{A}$ if:

\mathbf{A}\mathbf{v} = \lambda \mathbf{v}

for some scalar $\lambda$ . The scalar $\lambda$ is the corresponding eigenvalue.

In words: applying $\mathbf{A}$ to $\mathbf{v}$ produces the same vector scaled by $\lambda$ . The eigenvector defines a direction that the transformation preserves.

$\lambda$	Effect on eigenvector
$\lambda > 1$	Stretched along the eigenvector direction
$0 < \lambda < 1$	Compressed
$\lambda = 1$	Unchanged
$\lambda = 0$	Collapsed to zero (eigenvector is in the null space)
$\lambda < 0$	Flipped and scaled

Geometric interpretation: Imagine the matrix $\mathbf{A}$ as a transformation that warps space. The eigenvectors are the directions along which the warping is pure scaling — no rotation, no shearing. The eigenvalues tell you how much scaling happens along each direction.
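As a quick numerical check of the definition, the sketch below tests whether a vector stays on its own line after multiplication. The matrix and the helper name `is_eigenvector` are illustrative assumptions, not part of the article's worked example.

```python
import numpy as np

# Illustrative matrix (an assumption for this sketch).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

def is_eigenvector(A, v, tol=1e-12):
    """Return True if A @ v is parallel to the nonzero 2D vector v."""
    Av = A @ v
    # In 2D, two vectors are parallel exactly when this determinant is zero.
    return abs(Av[0] * v[1] - Av[1] * v[0]) < tol

v = np.array([1.0, 1.0])   # A @ v = [3, 3] = 3 * v, so v is an eigenvector
w = np.array([1.0, 0.0])   # A @ w = [2, 1], which is not a multiple of w

print(is_eigenvector(A, v))  # True
print(is_eigenvector(A, w))  # False
```

Here `v` is only scaled (by 3), while `w` is pushed off its original line, which is what happens to most vectors.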

Finding Eigenvalues: The Characteristic Equation

Rearranging $\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$ :

(\mathbf{A} - \lambda\mathbf{I})\mathbf{v} = \mathbf{0}

For a nonzero solution $\mathbf{v}$ to exist, the matrix $(\mathbf{A} - \lambda\mathbf{I})$ must be singular:

\det(\mathbf{A} - \lambda\mathbf{I}) = 0

This is the characteristic equation. The left side is a polynomial of degree $n$ in $\lambda$ , called the characteristic polynomial.

For a $2 \times 2$ matrix $\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ :

\det(\mathbf{A} - \lambda\mathbf{I}) = \lambda^2 - (a+d)\lambda + (ad - bc) = \lambda^2 - \text{tr}(\mathbf{A})\lambda + \det(\mathbf{A})

The eigenvalues satisfy:

\lambda_1 + \lambda_2 = \text{tr}(\mathbf{A}), \qquad \lambda_1 \cdot \lambda_2 = \det(\mathbf{A})

These relationships generalize: for any $n \times n$ matrix, the sum of the eigenvalues (counted with algebraic multiplicity, and allowing complex values) equals the trace, and their product equals the determinant.
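The trace and determinant identities are easy to check numerically. The sketch below uses an arbitrary $3 \times 3$ matrix chosen for illustration; `np.poly` is used here to obtain the coefficients of the characteristic polynomial $\det(\lambda\mathbf{I} - \mathbf{A})$.

```python
import numpy as np

# Illustrative 3x3 matrix (an assumption for this sketch).
A = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 0.0],
              [0.0, 1.0, 4.0]])

# Eigenvalues of a general real matrix may be complex.
eigenvalues = np.linalg.eigvals(A)

# Coefficients of det(lambda*I - A), highest degree first:
# [1, -trace, ..., (-1)^n * det]
coeffs = np.poly(A)

print(np.isclose(eigenvalues.sum(), np.trace(A)))        # sum equals trace
print(np.isclose(eigenvalues.prod(), np.linalg.det(A)))  # product equals determinant
print(coeffs)
```

For $n = 3$ the second coefficient is $-\text{tr}(\mathbf{A})$ and the last is $-\det(\mathbf{A})$, matching the $2 \times 2$ pattern above up to the sign convention.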

Finding Eigenvectors

Once you have an eigenvalue $\lambda$ , find the corresponding eigenvector(s) by solving:

(\mathbf{A} - \lambda\mathbf{I})\mathbf{v} = \mathbf{0}

The set of all solutions (including $\mathbf{0}$ ) is the eigenspace for $\lambda$ — it is the null space of $(\mathbf{A} - \lambda\mathbf{I})$ .
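One possible way to compute an eigenspace numerically is to take the null space of $\mathbf{A} - \lambda\mathbf{I}$ from its singular value decomposition. This is a minimal sketch: the matrix, the known eigenvalue `lam`, and the tolerance `1e-10` are all assumptions made for illustration.

```python
import numpy as np

# Illustrative matrix with a known eigenvalue lam = 2 (assumption).
A = np.array([[3.0, 1.0],
              [0.0, 2.0]])
lam = 2.0

M = A - lam * np.eye(2)

# Right singular vectors with (numerically) zero singular value
# span the null space of M, i.e. the eigenspace for lam.
_, singular_values, Vt = np.linalg.svd(M)
null_mask = singular_values < 1e-10
eigenspace_basis = Vt[null_mask].T   # columns form an orthonormal basis

print(eigenspace_basis)
print(np.allclose(A @ eigenspace_basis, lam * eigenspace_basis))
```

The returned basis vector is parallel to $(1, -1)$; its sign is arbitrary, since any scalar multiple of an eigenvector is again an eigenvector.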

Worked Example

Find the eigenvalues and eigenvectors of:

\mathbf{A} = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix}

Step 1: Characteristic equation

\begin{aligned} \det(\mathbf{A} - \lambda\mathbf{I}) &= \det\begin{bmatrix} 4-\lambda & 1 \\ 2 & 3-\lambda \end{bmatrix} \\[6pt] &= (4-\lambda)(3-\lambda) - 2 \\[6pt] &= \lambda^2 - 7\lambda + 10 \\[6pt] &= (\lambda - 5)(\lambda - 2) = 0 \end{aligned}

Eigenvalues: $\lambda_1 = 5$ , $\lambda_2 = 2$ .

Check: $\lambda_1 + \lambda_2 = 7 = \text{tr}(\mathbf{A})$ and $\lambda_1 \cdot \lambda_2 = 10 = \det(\mathbf{A})$ .

Step 2: Eigenvectors

For $\lambda_1 = 5$ :

(\mathbf{A} - 5\mathbf{I})\mathbf{v} = \begin{bmatrix} -1 & 1 \\ 2 & -2 \end{bmatrix}\mathbf{v} = \mathbf{0}

This gives $-v_1 + v_2 = 0$ , so $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ (or any scalar multiple).

For $\lambda_2 = 2$ :

(\mathbf{A} - 2\mathbf{I})\mathbf{v} = \begin{bmatrix} 2 & 1 \\ 2 & 1 \end{bmatrix}\mathbf{v} = \mathbf{0}

This gives $2v_1 + v_2 = 0$ , so $\mathbf{v}_2 = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$ .

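import numpy as np

A = np.array([[4, 1],
              [2, 3]])

# np.linalg.eig returns the eigenvalues and a matrix whose columns
# are the corresponding unit-length eigenvectors.
eigenvalues, eigenvectors = np.linalg.eig(A)
print(f"Eigenvalues: {eigenvalues}")   # 5 and 2 (the ordering is not guaranteed)
print(f"Eigenvectors:\n{eigenvectors}")
# Column i of `eigenvectors` pairs with eigenvalues[i]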
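NumPy normalizes eigenvectors to unit length and may choose either sign, so its output will not literally match $(1, 1)$ and $(1, -2)$. The following sketch checks that the computed columns are parallel to the hand-derived vectors; the dictionary `hand_derived` and the list `alignments` are names introduced only for this illustration.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)

# Eigenvectors found by hand in the worked example, keyed by eigenvalue.
hand_derived = {5.0: np.array([1.0, 1.0]),
                2.0: np.array([1.0, -2.0])}

alignments = []
for lam, v in zip(eigenvalues, eigenvectors.T):
    # Match the computed eigenvalue to the nearest hand-derived one.
    key = min(hand_derived, key=lambda k: abs(k - lam))
    u = hand_derived[key] / np.linalg.norm(hand_derived[key])
    # For unit vectors, |u . v| = 1 exactly when they are parallel.
    alignments.append(abs(u @ v))
    print(f"lambda = {lam:.1f}, |cos(angle)| = {alignments[-1]:.6f}")
```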

Diagonalization

If an $n \times n$ matrix $\mathbf{A}$ has $n$ linearly independent eigenvectors, it can be diagonalized:

\mathbf{A} = \mathbf{P}\mathbf{D}\mathbf{P}^{-1}

where $\mathbf{P}$ is the matrix whose columns are eigenvectors and $\mathbf{D}$ is the diagonal matrix of eigenvalues:

\mathbf{P} = \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \end{bmatrix}, \qquad \mathbf{D} = \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix}

Why Diagonalization Matters

Diagonalization turns a complicated transformation into independent scaling along each eigenvector axis. It makes many computations much simpler:

Matrix powers:

\mathbf{A}^k = \mathbf{P}\mathbf{D}^k\mathbf{P}^{-1} = \mathbf{P}\begin{bmatrix} \lambda_1^k & & \\ & \lambda_2^k & \\ & & \ddots \end{bmatrix}\mathbf{P}^{-1}

Matrix exponential (used in differential equations and continuous-time models):

e^{\mathbf{A}t} = \mathbf{P}\begin{bmatrix} e^{\lambda_1 t} & & \\ & e^{\lambda_2 t} & \\ & & \ddots \end{bmatrix}\mathbf{P}^{-1}

Key insight: Diagonalization is a change of basis to the “natural” coordinate system of the transformation — the one where it acts as pure scaling.
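The formulas for powers and exponentials can be checked on the matrix from the worked example. This is a sketch under a few assumptions: the exponent `k = 5` and time `t = 0.1` are arbitrary, and the matrix exponential is compared against a truncated Taylor series rather than a library routine.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])   # matrix from the worked example

eigenvalues, P = np.linalg.eig(A)
P_inv = np.linalg.inv(P)

# Reconstruct A = P D P^{-1}
A_rebuilt = P @ np.diag(eigenvalues) @ P_inv

# Matrix power: only the diagonal entries are raised to the k-th power.
k = 5
A_power = P @ np.diag(eigenvalues ** k) @ P_inv

# Matrix exponential: exponentiate each eigenvalue times t.
t = 0.1
exp_At = P @ np.diag(np.exp(eigenvalues * t)) @ P_inv

# Independent check: truncated Taylor series sum_m (A t)^m / m!
series = np.zeros_like(A)
term = np.eye(2)             # the m = 0 term
for m in range(1, 30):
    series += term
    term = term @ (A * t) / m

print(np.allclose(A_rebuilt, A))
print(np.allclose(A_power, np.linalg.matrix_power(A, k)))
print(np.allclose(exp_At, series))
```

Inverting $\mathbf{P}$ explicitly is fine for a small demonstration, but it can be numerically fragile when the eigenvectors are nearly linearly dependent.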

The Spectral Theorem

The spectral theorem states that every real symmetric matrix $\mathbf{A} = \mathbf{A}^T$ can be diagonalized by an orthogonal matrix:

\mathbf{A} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T

where:

$\mathbf{Q}$ is orthogonal ( $\mathbf{Q}^T\mathbf{Q} = \mathbf{I}$ ), with orthonormal eigenvectors as columns
$\boldsymbol{\Lambda}$ is diagonal with real eigenvalues
All eigenvalues are real (even though eigenvalues of general matrices can be complex)
Eigenvectors corresponding to distinct eigenvalues are orthogonal; within the eigenspace of a repeated eigenvalue, an orthonormal basis can always be chosen

This is arguably one of the most important theorems in applied linear algebra. It guarantees that real symmetric matrices, which include covariance matrices, Gram matrices, and Hessians of twice continuously differentiable functions, always have a clean, orthogonal decomposition.
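The claims of the spectral theorem can be verified numerically on a random symmetric matrix. The sketch below assumes a seeded random $4 \times 4$ matrix for illustration and uses `np.linalg.eigh`, NumPy's routine for symmetric (Hermitian) matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                 # symmetrize an arbitrary matrix

# eigh assumes a symmetric input and returns real eigenvalues
# in ascending order, with orthonormal eigenvectors as columns of Q.
eigenvalues, Q = np.linalg.eigh(A)

print(np.isrealobj(eigenvalues))                        # real eigenvalues
print(np.allclose(Q.T @ Q, np.eye(4)))                  # Q is orthogonal
print(np.allclose(Q @ np.diag(eigenvalues) @ Q.T, A))   # A = Q Lambda Q^T
```

Because $\mathbf{Q}^{-1} = \mathbf{Q}^T$, no matrix inversion is needed to rebuild $\mathbf{A}$.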

Key insight: PCA is a direct application of the spectral theorem. The covariance matrix $\boldsymbol{\Sigma}$ is symmetric, so it decomposes as $\boldsymbol{\Sigma} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T$ . The eigenvectors (columns of $\mathbf{Q}$ ) are the principal components, and the eigenvalues (diagonal of $\boldsymbol{\Lambda}$ ) are the variances along those components.
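A minimal PCA sketch along these lines is shown below. The synthetic data, the mixing matrix, and the variable names (`scores`, `projected_variances`) are assumptions made for illustration; real pipelines often use the SVD of the centered data instead of forming the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic correlated 2D data: 500 samples (rows), 2 features (columns).
latent = rng.standard_normal((500, 2))
X = latent @ np.array([[3.0, 0.0],
                       [1.0, 0.5]])

X_centered = X - X.mean(axis=0)
Sigma = np.cov(X_centered, rowvar=False)   # 2x2 sample covariance matrix

# Symmetric eigendecomposition, then sort by decreasing eigenvalue.
eigenvalues, Q = np.linalg.eigh(Sigma)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, Q = eigenvalues[order], Q[:, order]

# Project the data onto the principal components.
scores = X_centered @ Q
projected_variances = scores.var(axis=0, ddof=1)

print(eigenvalues)
print(projected_variances)   # matches the eigenvalues
```

The sample variance of the data along each principal component equals the corresponding eigenvalue, and the projected coordinates are uncorrelated.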

Eigenvalues of Special Matrices

Matrix Type	Eigenvalue Property
Symmetric ( $\mathbf{A} = \mathbf{A}^T$ )	All eigenvalues are real
Positive definite	All eigenvalues are positive
Positive semi-definite	All eigenvalues are $\geq 0$
Orthogonal ( $\mathbf{Q}^T\mathbf{Q} = \mathbf{I}$ )	All eigenvalues have $|\lambda| = 1$
Idempotent ( $\mathbf{A}^2 = \mathbf{A}$ )	Eigenvalues are 0 or 1
Nilpotent ( $\mathbf{A}^k = \mathbf{0}$ for some $k \geq 1$ )	All eigenvalues are 0
Triangular	Eigenvalues are the diagonal entries
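Several rows of this table can be spot-checked numerically. The example matrices below (a rotation, a projection onto a line, a $2 \times 2$ nilpotent block, and an upper-triangular matrix) are illustrative choices, not taken from the article.

```python
import numpy as np

theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])   # orthogonal

# Orthogonal projection onto the line spanned by u (idempotent: P @ P = P).
u = np.array([[1.0], [2.0]])
projection = (u @ u.T) / (u.T @ u)

nilpotent = np.array([[0.0, 1.0],
                      [0.0, 0.0]])                       # N @ N = 0

triangular = np.array([[1.0, 5.0, 2.0],
                       [0.0, 3.0, 4.0],
                       [0.0, 0.0, 6.0]])

rotation_moduli = np.abs(np.linalg.eigvals(rotation))       # complex pair, modulus 1
projection_eigs = np.sort(np.linalg.eigvals(projection).real)
nilpotent_eigs = np.linalg.eigvals(nilpotent)
triangular_eigs = np.sort(np.linalg.eigvals(triangular).real)

print(rotation_moduli)    # both equal to 1
print(projection_eigs)    # 0 and 1
print(nilpotent_eigs)     # all 0
print(triangular_eigs)    # the diagonal entries 1, 3, 6
```

The rotation matrix illustrates why the orthogonal case is stated in terms of $|\lambda|$: its eigenvalues are the complex pair $e^{\pm i\theta}$, not real numbers.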

Algebraic vs. Geometric Multiplicity

The algebraic multiplicity of an eigenvalue $\lambda$ is its multiplicity as a root of the characteristic polynomial.

The geometric multiplicity is the dimension of the eigenspace $\dim(\ker(\mathbf{A} - \lambda\mathbf{I}))$ .

The geometric multiplicity is always $\leq$ the algebraic multiplicity. When they are equal for all eigenvalues, the matrix is diagonalizable. When they differ, the matrix has a more complex structure (requiring Jordan normal form).
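The gap between the two multiplicities shows up in the smallest possible example. The sketch below compares two matrices whose characteristic polynomial is $(\lambda - 2)^2$ in both cases, so the algebraic multiplicity of $\lambda = 2$ is 2; the helper `geometric_multiplicity` is a name introduced for this illustration and relies on the rank-nullity theorem.

```python
import numpy as np

def geometric_multiplicity(A, lam):
    """Dimension of ker(A - lam*I), computed as n - rank(A - lam*I)."""
    n = A.shape[0]
    return n - np.linalg.matrix_rank(A - lam * np.eye(n))

scaled_identity = np.array([[2.0, 0.0],
                            [0.0, 2.0]])
jordan_block = np.array([[2.0, 1.0],
                         [0.0, 2.0]])

print(geometric_multiplicity(scaled_identity, 2.0))  # 2: diagonalizable
print(geometric_multiplicity(jordan_block, 2.0))     # 1: not diagonalizable
```

The second matrix has only one independent eigenvector for a double root, so no eigenvector basis exists and it cannot be diagonalized.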

The Power Method

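For large matrices, computing eigenvalues via the characteristic polynomial is impractical. The power method iteratively finds the dominant eigenvalue, meaning the one with the largest absolute value:

Start with a random vector $\mathbf{x}_0$
Iterate: $\mathbf{x}_{k+1} = \frac{\mathbf{A}\mathbf{x}_k}{\|\mathbf{A}\mathbf{x}_k\|}$
The vectors converge in direction to an eigenvector of the dominant eigenvalue, provided that eigenvalue is strictly larger in magnitude than all others and $\mathbf{x}_0$ has a nonzero component along its eigenvector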

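The rationale: if $\mathbf{A}$ is diagonalizable and $\mathbf{x}_0 = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots$ is its expansion in the eigenvector basis, then:

\mathbf{A}^k\mathbf{x}_0 = c_1\lambda_1^k\mathbf{v}_1 + c_2\lambda_2^k\mathbf{v}_2 + \cdots

The term with the largest $|\lambda|$ dominates as $k \to \infty$: if $|\lambda_1| > |\lambda_i|$ for all $i > 1$ and $c_1 \neq 0$, every other term shrinks relative to the first like $|\lambda_i / \lambda_1|^k$.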

Historical note: Google’s PageRank algorithm is essentially the power method applied to the web’s link matrix. The dominant eigenvector gives the steady-state distribution of a random web surfer — pages with higher eigenvector components rank higher.

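import numpy as np

A = np.array([[4, 1],
              [2, 3]])

# Power method: repeatedly apply A and renormalize
rng = np.random.default_rng(0)   # seeded for reproducibility
x = rng.standard_normal(2)
for _ in range(20):
    x = A @ x
    x = x / np.linalg.norm(x)

# x has unit length, so the Rayleigh quotient reduces to x^T A x
eigenvalue = x @ A @ x
print(f"Dominant eigenvalue: {eigenvalue:.4f}")  # ≈ 5.0
print(f"Eigenvector: {x}")  # parallel to [1, 1], up to sign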
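To make the PageRank connection concrete, here is a toy sketch of the same iteration on a hypothetical four-page web. The link structure, the damping factor of 0.85, and the iteration count are assumptions for illustration; this is not a description of Google's production system.

```python
import numpy as np

# Hypothetical link graph: page -> pages it links to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
n = len(links)

# Column-stochastic link matrix: M[dst, src] = 1 / (number of links on src).
M = np.zeros((n, n))
for src, targets in links.items():
    for dst in targets:
        M[dst, src] = 1.0 / len(targets)

# Damping mixes in a uniform random jump (0.85 is an assumed value).
damping = 0.85
G = damping * M + (1.0 - damping) / n * np.ones((n, n))

# Power iteration, normalized so the entries sum to 1 (a probability vector).
rank = np.full(n, 1.0 / n)
for _ in range(100):
    rank = G @ rank
    rank = rank / rank.sum()

print(rank)
print(np.argmax(rank))   # page 2, which receives links from three pages
```

The result is an eigenvector of `G` with eigenvalue 1, interpreted as the long-run fraction of time a random surfer spends on each page.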

The Rayleigh Quotient

The Rayleigh quotient of a symmetric matrix $\mathbf{A}$ for a nonzero vector $\mathbf{x}$ is:

R(\mathbf{x}) = \frac{\mathbf{x}^T\mathbf{A}\mathbf{x}}{\mathbf{x}^T\mathbf{x}}

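The Rayleigh quotient satisfies $\lambda_{\min} \leq R(\mathbf{x}) \leq \lambda_{\max}$. If $\mathbf{x}$ is an eigenvector, $R(\mathbf{x})$ equals the corresponding eigenvalue, and the two bounds are attained exactly at eigenvectors of $\lambda_{\min}$ and $\lambda_{\max}$.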

The minimum and maximum of $R(\mathbf{x})$ over all nonzero $\mathbf{x}$ give the smallest and largest eigenvalues:

\lambda_{\max} = \max_{\mathbf{x} \neq \mathbf{0}} \frac{\mathbf{x}^T\mathbf{A}\mathbf{x}}{\mathbf{x}^T\mathbf{x}}, \qquad \lambda_{\min} = \min_{\mathbf{x} \neq \mathbf{0}} \frac{\mathbf{x}^T\mathbf{A}\mathbf{x}}{\mathbf{x}^T\mathbf{x}}

Key insight: PCA can be stated as: find the direction $\mathbf{x}$ that maximizes the Rayleigh quotient of the covariance matrix. That direction is the first principal component — the eigenvector with the largest eigenvalue.
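The bounds on the Rayleigh quotient can be probed empirically. The sketch below assumes a seeded random symmetric $5 \times 5$ matrix and 1000 random test vectors; the function name `rayleigh_quotient` is introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                      # random symmetric matrix

def rayleigh_quotient(A, x):
    """R(x) = (x^T A x) / (x^T x) for a nonzero vector x."""
    return (x @ A @ x) / (x @ x)

eigenvalues, Q = np.linalg.eigh(A)     # ascending order
lam_min, lam_max = eigenvalues[0], eigenvalues[-1]

# Random directions never escape the interval [lam_min, lam_max].
samples = [rayleigh_quotient(A, rng.standard_normal(5)) for _ in range(1000)]
within_bounds = all(lam_min - 1e-12 <= r <= lam_max + 1e-12 for r in samples)

# The bounds are attained at the corresponding eigenvectors.
r_top = rayleigh_quotient(A, Q[:, -1])
r_bottom = rayleigh_quotient(A, Q[:, 0])

print(within_bounds)
print(np.isclose(r_top, lam_max), np.isclose(r_bottom, lam_min))
```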

Condition Number

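The condition number of a matrix is the ratio of its largest to smallest singular value. For symmetric positive definite matrices, this equals the ratio of the largest to smallest eigenvalue:

\kappa(\mathbf{A}) = \frac{\sigma_{\max}}{\sigma_{\min}}, \qquad \kappa(\mathbf{A}) = \frac{\lambda_{\max}}{\lambda_{\min}} \ \text{ for symmetric positive definite } \mathbf{A}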

Condition number	Interpretation
$\kappa \approx 1$	Well-conditioned; small input changes cause small output changes
$\kappa \gg 1$	Ill-conditioned; the system is sensitive to perturbations
$\kappa = \infty$	Singular matrix

In ML, ill-conditioned matrices cause:

Slow convergence of gradient descent (the loss landscape is elongated)
Numerical instability in solving linear systems
Unstable coefficient estimates and poor generalization when the condition number of $\mathbf{X}^T\mathbf{X}$ is large (multicollinearity)
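The sketch below illustrates both points: for a symmetric positive definite matrix the eigenvalue ratio agrees with `np.linalg.cond`, and two nearly collinear features produce a very large condition number for $\mathbf{X}^T\mathbf{X}$. The matrices and the noise level `1e-4` are assumptions chosen for illustration.

```python
import numpy as np

# Symmetric positive definite example with eigenvalues 1 and 3.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigs = np.linalg.eigvalsh(A)               # ascending order
kappa_from_eigs = eigs[-1] / eigs[0]
kappa_numpy = np.linalg.cond(A)            # default: sigma_max / sigma_min

print(kappa_from_eigs, kappa_numpy)        # both are 3 up to rounding

# Multicollinearity: the second feature is almost a copy of the first.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(200)
x2 = x1 + 1e-4 * rng.standard_normal(200)
X = np.column_stack([x1, x2])
kappa_gram = np.linalg.cond(X.T @ X)

print(f"{kappa_gram:.3e}")                 # very large: ill-conditioned
```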

Why This Matters for ML

PCA: The principal components are eigenvectors of the covariance matrix. Eigenvalues give the variance explained by each component.
PageRank: Page importance is the dominant eigenvector of the web link matrix.
Spectral clustering: Cluster structure is revealed by the eigenvectors of the graph Laplacian.
Stability of dynamical systems: A linear system $\mathbf{x}_{t+1} = \mathbf{A}\mathbf{x}_t$ is asymptotically stable (every trajectory decays to zero) if all eigenvalues satisfy $|\lambda| < 1$.
Gradient descent convergence: The condition number $\lambda_{\max}/\lambda_{\min}$ of the Hessian determines convergence speed.
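Regularization: Adding $\alpha\mathbf{I}$ with $\alpha > 0$ to a matrix (Ridge regularization) shifts every eigenvalue $\lambda_i$ to $\lambda_i + \alpha$, which improves the conditioning of a positive semi-definite matrix such as $\mathbf{X}^T\mathbf{X}$.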
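The effect of Ridge regularization on the spectrum can be seen directly. In this sketch the regularization strength is written `alpha` to keep it distinct from the eigenvalue symbol, and the nearly collinear design matrix is a synthetic assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.standard_normal(200)
X = np.column_stack([x1, x1 + 1e-4 * rng.standard_normal(200)])
gram = X.T @ X                      # positive semi-definite, nearly singular

alpha = 1.0                         # assumed ridge strength

eigs_before = np.linalg.eigvalsh(gram)
eigs_after = np.linalg.eigvalsh(gram + alpha * np.eye(2))

kappa_before = eigs_before[-1] / eigs_before[0]
kappa_after = eigs_after[-1] / eigs_after[0]

print(np.allclose(eigs_after, eigs_before + alpha))   # every eigenvalue shifts by alpha
print(f"{kappa_before:.3e} -> {kappa_after:.3e}")     # condition number drops sharply
```

The eigenvectors are unchanged by the shift; only the eigenvalues move, and the smallest one moves away from zero.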

Summary

An eigenvector $\mathbf{v}$ of $\mathbf{A}$ satisfies $\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$ — it is only scaled, never rotated.
Eigenvalues are found from the characteristic equation $\det(\mathbf{A} - \lambda\mathbf{I}) = 0$ .
Diagonalization $\mathbf{A} = \mathbf{P}\mathbf{D}\mathbf{P}^{-1}$ reveals the transformation’s action as independent scaling along eigenvector axes.
The spectral theorem guarantees that real symmetric matrices have real eigenvalues and an orthonormal basis of eigenvectors.
The power method finds the dominant eigenvalue iteratively — this is how PageRank works.
For symmetric positive definite matrices, the condition number $\kappa = \lambda_{\max}/\lambda_{\min}$ controls numerical stability and convergence speed.
PCA, spectral clustering, and stability analysis are all eigenvalue problems in disguise.
Next, we generalize eigendecomposition to rectangular matrices in matrix decompositions, including the powerful SVD.

References

Strang, G. (2016). Introduction to Linear Algebra (5th ed.). Wellesley-Cambridge Press. math.mit.edu/~gs/linearalgebra
Axler, S. (2024). Linear Algebra Done Right (4th ed.). Springer. linear.axler.net
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning, Chapter 2. MIT Press. deeplearningbook.org
Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank Citation Ranking: Bringing Order to the Web. Stanford InfoLab.