%\input math_mac.tex
%\setup {\sl Appendix IV: Derivation of the Least Squares Fit}
%{\sl

%\subsect {(C.2)} {Derivation of the Least Squares Fit}
%\noindent 

What follows is a simple derivation of the least squares fit.
%\medskip

%\noindent 
Suppose the relationship between the two experimental parameters
being studied is
\[
y = f(x)
\]
where $x$ is the independent parameter which is varied, and
$y$ is the dependent parameter.
If $f(x)$ is a polynomial function, or can be approximated
by a polynomial, then the least squares method is a
\emph{linear} one (the unknown coefficients enter linearly),
and it will almost always give reliable answers.
If $f(x)$ cannot be expressed as a polynomial, but consists
of transcendental functions, the least squares method is
non-linear, and may or may not work reliably.
In some cases, a change of variables may result in a polynomial,
as in the exponential example above.
A function like
\[
y = a + \frac b x + \frac c {x^2}
\]
is not a polynomial in $x$, but it is a polynomial in
the variable $z = 1/x$.
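To make the substitution explicit (a short sketch, assuming none of the measured $x_i$ is zero), setting $z = 1/x$ turns the function above into
\[
y = a + b z + c z^2
\]
so the coefficients $a$, $b$ and $c$ can be obtained from a linear fit of a quadratic to the transformed points $(z_i, y_i) = (1/x_i, y_i)$.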
%\medskip
%\par 

Suppose the functional relationship between $x$ and $y$
is a polynomial of degree $\ell$:
\begin{equation}
y = a_0 + a_1 x + a_2 x^2 + \cdots + a_\ell x^\ell
\label{eq:lsfone}
\end{equation}
or
\begin{equation}
y = \sum_{j=0}^\ell a_j x^j 
\label{eq:lsftwo}
\end{equation}
and we have a set of $N$ data points $\{x_i,y_i\}$ obtained
by experiment.
The goal is to find the values of the $\ell+1$ parameters
$a_0, a_1, \ldots, a_\ell$ which will give the best fit of
Equation~\ref{eq:lsfone} to our data points.
The first piece of information to note is that
\begin{equation}
N \geq \ell+1 
\label{eq:lsfthr}
\end{equation}
or else we will not be able to make a unique determination.
For example, if $\ell=1$, we need at least two data points to
find the equation of the straight line.
In order to make any meaningful statistical statements, however,
we will need even more than $\ell+1$ points, as we shall
see later.
A good rule of thumb: if we wish to fit our data with
a polynomial of degree $\ell$ and quote the result at the
95\% confidence level, we should choose $N$ such that
\begin{equation}
N - (\ell+1) \geq 10 
\label{eq:lsffou}
\end{equation}
The idea behind the linear least squares method is to
\emph{minimize} the sum
\begin{equation}
S = \sum_{i=1}^N \left(y_i - \sum_{j=0}^\ell a_j x_i^j \right)^2
\label{eq:lsffiv}
\end{equation}
$S$ will be a minimum if
\begin{equation}
\frac{\partial S}{\partial a_k} = 0 \qquad k = 0, 1, 2, \ldots, \ell
\label{eq:lsfsix}
\end{equation}
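As a sketch of the intermediate step, carrying out the differentiation of Equation~\ref{eq:lsffiv} with respect to a single coefficient $a_k$ gives
\[
\frac{\partial S}{\partial a_k} =
-2 \sum_{i=1}^N x_i^k \left( y_i - \sum_{j=0}^\ell a_j x_i^j \right) = 0
\]
Dividing by $-2$, moving the terms containing the $a_j$ to the left-hand side, and exchanging the order of the two sums collects the data into sums of powers of $x_i$.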
The result will be $\ell+1$ linear equations in $\ell+1$ unknowns:
\begin{equation}
\sum_{j=0}^\ell a_j \left( \sum_{i=1}^N x_i^{j+k} \right) =
\sum_{i=1}^N x_i^k y_i \qquad k=0,1,\ldots,\ell
\label{eq:lsfsev}
\end{equation}
which can be solved by standard matrix techniques for
the unknown coefficients $a_0, a_1, \ldots, a_\ell$.
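One way to see the matrix structure is to write Equation~\ref{eq:lsfsev} out in full.
The shorthand $s_p = \sum_{i=1}^N x_i^p$ is introduced here only for illustration (note that $s_0 = N$); it is not used elsewhere in the text.
With it, the system reads
\[
\left( \begin{array}{cccc}
s_0    & s_1        & \cdots & s_\ell     \\
s_1    & s_2        & \cdots & s_{\ell+1} \\
\vdots & \vdots     & \ddots & \vdots     \\
s_\ell & s_{\ell+1} & \cdots & s_{2\ell}
\end{array} \right)
\left( \begin{array}{c}
a_0 \\ a_1 \\ \vdots \\ a_\ell
\end{array} \right)
=
\left( \begin{array}{c}
\sum y_i \\ \sum x_i y_i \\ \vdots \\ \sum x_i^\ell y_i
\end{array} \right)
\]
The matrix is symmetric, and row $k$ (counting from zero) corresponds to the condition for $a_k$ in Equation~\ref{eq:lsfsix}.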
As an example, let us consider the case where $\ell=1$, or
\[
y = m x + b
\]
In this case,
\[
S = \sum_{i=1}^N \left( y_i - ( m x_i + b ) \right) ^2
\]
Writing out Equation~\ref{eq:lsfsev} for $k=0$ and $k=1$,
with $a_0 = b$ and $a_1 = m$, we have
\begin{eqnarray}
b N +
m \left( \sum_{i=1}^N x_i \right) &=& \sum_{i=1}^N y_i \\
b \left( \sum_{i=1}^N x_i \right) +
m \left( \sum_{i=1}^N x_i^2 \right) &=& \sum_{i=1}^N x_i y_i
\end{eqnarray}
Then the intercept $b$ and the slope $m$ can be found
from Cramer's rule (all sums run over $i = 1$ to $N$)
\begin{equation}
b = {\frac {\left(\sum y_i\right)\left(\sum x_i^2\right) -
             \left(\sum x_i\right)\left(\sum x_i y_i \right)}
            {N\left(\sum x_i^2\right) -
              \left( \sum x_i \right)^2}  } 
\end{equation}
and
\begin{equation}
m = {\frac {N\left( \sum x_i y_i \right) -
              \left( \sum x_i \right)\left(\sum y_i \right)}
            {N\left( \sum x_i^2 \right) -
              \left( \sum x_i \right)^2}  } 
\end{equation}
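As an illustrative check of the expressions for $b$ and $m$, consider a made-up data set (hypothetical numbers chosen for easy arithmetic, not experimental values) of $N = 4$ points: $(1,3)$, $(2,5)$, $(3,6)$ and $(4,9)$.
The required sums are
\[
\sum x_i = 10, \qquad \sum y_i = 23, \qquad
\sum x_i^2 = 30, \qquad \sum x_i y_i = 67
\]
so the common denominator is $4(30) - (10)^2 = 20$, and
\[
b = \frac{(23)(30) - (10)(67)}{20} = 1, \qquad
m = \frac{4(67) - (10)(23)}{20} = 1.9
\]
The fitted line $y = 1.9\,x + 1$ leaves residuals $y_i - (m x_i + b)$ of $0.1$, $0.2$, $-0.7$ and $0.4$, which sum to zero, as the first of the two linear equations for $b$ and $m$ requires.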


%}
%\vfill\eject
%\vfill\eject\end