Least Squares

Normal Equations

\[A^TAx=A^Tb\]

Minimizes \(\|Ax-b\|^2\). When \(A\) has full column rank, the unique solution is \(\hat{x}=(A^TA)^{-1}A^Tb\).

Examples

Example 1. Best fit line through (0,1),(1,2),(2,2).

Solution. For the model \(y=c+mx\), the normal equations \(\begin{pmatrix}3&3\\3&5\end{pmatrix}\begin{pmatrix}c\\m\end{pmatrix}=\begin{pmatrix}5\\6\end{pmatrix}\) give slope \(m=1/2\) and intercept \(c=7/6\approx1.17\).
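The computation in Example 1 can be checked numerically. A minimal NumPy sketch, building the design matrix for \(y=c+mx\) and solving the normal equations directly:

```python
import numpy as np

# Fit y = c + m*x to the points (0,1), (1,2), (2,2).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])   # columns: intercept, slope
b = np.array([1.0, 2.0, 2.0])

# Solve the normal equations A^T A x = A^T b.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
print(x_hat)  # intercept = 7/6 ≈ 1.1667, slope = 0.5
```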

In Depth

The least-squares problem minimizes \(\|A\mathbf{x}-\mathbf{b}\|^2\) over all \(\mathbf{x}\). When \(A\mathbf{x}=\mathbf{b}\) is overdetermined (more equations than unknowns) and \(A\) has full column rank, the unique least-squares solution \(\hat{\mathbf{x}}=(A^TA)^{-1}A^T\mathbf{b}\) minimizes the sum of squared residuals.

The normal equations \(A^TA\hat{\mathbf{x}}=A^T\mathbf{b}\) characterize the least-squares solution. Geometrically, \(A\hat{\mathbf{x}}\) is the orthogonal projection of \(\mathbf{b}\) onto the column space of \(A\). The residual \(\mathbf{b}-A\hat{\mathbf{x}}\) is orthogonal to every column of \(A\).
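The orthogonality property can be verified directly: the residual of a least-squares fit satisfies \(A^T(\mathbf{b}-A\hat{\mathbf{x}})=\mathbf{0}\). A small NumPy check on a random overdetermined system (the specific sizes are arbitrary, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))   # overdetermined: 6 equations, 3 unknowns
b = rng.standard_normal(6)

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
r = b - A @ x_hat                 # residual vector

# The residual is orthogonal to every column of A: A^T r = 0.
print(A.T @ r)                    # ≈ [0, 0, 0] up to floating-point error
```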

When \(A^TA\) is ill-conditioned, direct solution of the normal equations is numerically unstable, since forming \(A^TA\) squares the condition number of \(A\). The QR decomposition \(A=QR\) gives a stable algorithm: \(\hat{\mathbf{x}}=R^{-1}Q^T\mathbf{b}\). The SVD-based pseudoinverse \(A^+=V\Sigma^+U^T\) handles rank-deficient cases.
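Both routes can be sketched in a few lines of NumPy, reusing the data from Example 1. In practice \(R^{-1}Q^T\mathbf{b}\) is computed by back-substitution rather than by forming \(R^{-1}\):

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

# QR route: A = QR, then solve R x = Q^T b.
Q, R = np.linalg.qr(A)            # reduced QR: Q is 3x2, R is 2x2
x_qr = np.linalg.solve(R, Q.T @ b)

# SVD route: the pseudoinverse also handles rank deficiency.
x_svd = np.linalg.pinv(A) @ b

print(x_qr, x_svd)                # both ≈ [1.1667, 0.5]
```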

Regularization adds a penalty term: ridge regression minimizes \(\|A\mathbf{x}-\mathbf{b}\|^2+\lambda\|\mathbf{x}\|^2\), giving \(\hat{\mathbf{x}}=(A^TA+\lambda I)^{-1}A^T\mathbf{b}\). This shrinks coefficients toward zero and stabilizes ill-conditioned problems. LASSO uses an \(\ell^1\) penalty, promoting sparsity.
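The shrinkage effect of ridge regression is easy to observe numerically. A sketch comparing the ridge and ordinary least-squares coefficient norms on the Example 1 data (the penalty \(\lambda=0.1\) is an arbitrary illustrative choice):

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])
lam = 0.1                         # illustrative penalty strength

n = A.shape[1]
x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)
x_ols = np.linalg.solve(A.T @ A, A.T @ b)

# Ridge shrinks the coefficient vector relative to ordinary least squares.
print(np.linalg.norm(x_ridge) < np.linalg.norm(x_ols))  # True
```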

Least squares is the foundation of linear regression in statistics, curve fitting in engineering, and parameter estimation in science. The Gauss–Markov theorem states that the least-squares estimator is the best linear unbiased estimator (BLUE) when errors are uncorrelated with equal variance.

Key Properties & Applications

The pseudoinverse \(A^+=V\Sigma^+U^T\) (where \(\Sigma^+\) inverts nonzero singular values) gives the minimum-norm least-squares solution: \(\hat{\mathbf{x}}=A^+\mathbf{b}\) minimizes \(\|\mathbf{x}\|\) among all minimizers of \(\|A\mathbf{x}-\mathbf{b}\|\).
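The minimum-norm property can be demonstrated on a deliberately rank-deficient matrix. In the sketch below the second column is twice the first, so minimizers of \(\|A\mathbf{x}-\mathbf{b}\|\) form a line; the pseudoinverse picks the one closest to the origin:

```python
import numpy as np

# Rank-deficient A: second column is twice the first.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])

# Pseudoinverse solution: smallest ||x|| among all minimizers of ||Ax - b||.
x_min = np.linalg.pinv(A) @ b     # = [0.2, 0.4]

# Shifting along the null-space direction (2, -1) keeps the residual
# unchanged but increases the norm of x.
x_other = x_min + np.array([2.0, -1.0])
print(np.linalg.norm(x_min), np.linalg.norm(x_other))
```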

Weighted least squares minimizes \(\|W^{1/2}(A\mathbf{x}-\mathbf{b})\|^2\) where \(W\) is a diagonal weight matrix. It is used when observations have different reliabilities. The solution is \(\hat{\mathbf{x}}=(A^TWA)^{-1}A^TW\mathbf{b}\).
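Weighted least squares is equivalent to ordinary least squares after rescaling each row by \(\sqrt{w_i}\). A sketch on the Example 1 data, with an arbitrary illustrative weight that strongly trusts the third observation:

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

# Trust the third observation much more than the others (illustrative weights).
w = np.array([1.0, 1.0, 100.0])
W = np.diag(w)

# Closed-form weighted solution.
x_wls = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)

# Equivalent formulation: ordinary least squares on rows scaled by sqrt(w).
sw = np.sqrt(w)
x_check, *_ = np.linalg.lstsq(sw[:, None] * A, sw * b, rcond=None)
print(np.allclose(x_wls, x_check))  # True
```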

Total least squares (TLS) minimizes errors in both \(A\) and \(\mathbf{b}\), appropriate when both are measured with noise. The TLS solution is given by the right singular vector of \([A|\mathbf{b}]\) corresponding to the smallest singular value.
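The SVD recipe for TLS can be sketched directly: if \(v\) is the right singular vector of \([A\,|\,\mathbf{b}]\) for the smallest singular value, the solution is \(\hat{\mathbf{x}}=-v_{1:n}/v_{n+1}\). A synthetic check with noise in both \(A\) and \(\mathbf{b}\) (the sizes, noise level, and true coefficients are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
x_true = np.array([2.0, -1.0])
A = rng.standard_normal((50, 2))
b = A @ x_true

# Noise in both A and b: the setting TLS is designed for.
A_noisy = A + 0.01 * rng.standard_normal(A.shape)
b_noisy = b + 0.01 * rng.standard_normal(b.shape)

# Right singular vector of [A | b] for the smallest singular value.
C = np.column_stack([A_noisy, b_noisy])
_, _, Vt = np.linalg.svd(C)
v = Vt[-1]
x_tls = -v[:-1] / v[-1]
print(x_tls)                      # ≈ [2, -1]
```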

Further Reading & Context

The study of least squares connects linear algebra (orthogonal projection), statistics (regression and estimation), and optimization (convex quadratic minimization). Understanding the normal equations and their geometric meaning provides the basis for more advanced numerical and statistical methods.

Historical development: Legendre published the method of least squares in 1805, and Gauss applied it to orbit determination, notably for the asteroid Ceres, later claiming earlier use. The modern matrix formulation and stable algorithms such as QR and the SVD were developed with twentieth-century numerical linear algebra.

In modern mathematics, least squares appears throughout applied work: regression and machine learning, signal processing, geodesy, and the numerical solution of inverse problems. Its connections to computer science, physics, and engineering make it one of the most widely used tools in applied linear algebra.

Recommended next steps: derive the normal equations from the orthogonality condition, work through the QR- and SVD-based solution algorithms, and practice with problems ranging from hand computations on small systems to experiments with conditioning and regularization.

Deep Dive: Least Squares

This lesson extends the core ideas of least squares with rigorous derivations, edge-case checks (rank deficiency, ill-conditioning), and applications in linear algebra.

Practice Set

Practice. Derive the normal equations from the orthogonality of the residual \(\mathbf{b}-A\hat{\mathbf{x}}\) to the column space of \(A\), and validate with a numeric or geometric check on a small system.

Goal. Confirm the assumptions (full column rank), each transformation step, and the final interpretation of \(\hat{\mathbf{x}}\).

References & Editorial Notes


Last editorial review: 2026-04-14.