More MATLAB- Fitting, Correlation Coefficients, Stability Diagrams

Basic M-Files

One can write simple programs, or `scripts,' in MATLAB , which are simply lists of MATLAB commands written in a text file (called m-file for the `.m' file extension MATLAB uses) that can be called from within the MATLAB command window. To create one of these m-files, click on the File menu in the upper left corner of the command window, then `New' and `M-file.' A text-editor window should pop up, in which you can type a sequence of commands. For example, here is an m-file which plots a stability diagram for the ventilation/$CO_2$ model we studied in class (from Edelstein-Keshet):

\begin{eqnarray*}
C_{n+1} &=& C_n + m - \alpha C_n V_n ,\\
V_{n+1} &=& \beta C_n.
\end{eqnarray*}



This model has the (positive) fixed point $\displaystyle \bar{C} = \sqrt{\frac{m}{\alpha \beta}}, \bar{V} = \sqrt{\frac{m \beta}{\alpha}}$, with accompanying Jacobian

\begin{displaymath}
{\bf J} = \left( \begin{array}{cc}
1-\sqrt{m\alpha \beta} &...
...rt{\frac{m\alpha }{\beta}} \\
\beta & 0
\end{array} \right).
\end{displaymath}

Eigenvalues are given by $\lambda_\pm = \frac12 \left[ \left( 1-\sqrt{m \alpha
\beta} \right) \pm \sqrt{\left(1-\sqrt{m \alpha \beta} \right)^2 - 4\sqrt{m
\alpha \beta}} \right].$ Notice that the eigenvalues depend on the single, combination parameter $R=\sqrt{m \alpha \beta}$, although we will not use that fact below. We will now write an m-file to plot the stability diagram (that is, the eigenvalues) associated with the fixed point, as we vary $\beta$ (taking $m=1$ and $\alpha=1$ fixed). Open a new m-file and type the following in the text of the m-file (the comments are not necessary, except, perhaps, for your own understanding)
       alpha=1; m=1;
       beta=linspace(0.02,2,100);                % set range for beta       
       lam1=zeros(1,100); lam2=zeros(1,100);     % initialize eigenvalues

       for j=1:100
                                                 % notice `...', for line continuation
          jacob=[ 1-sqrt(m*alpha*beta(j)), -sqrt(m*alpha/beta(j)); ...
                   beta(j), 0];                  % Jacobian for this beta
          eigenvalues = eig(jacob);              % Find eigenvalues
          lam1(j)=eigenvalues(1);                % First eigenvalue
          lam2(j)=eigenvalues(2);                % Second eigenvalue

       end                                       % end of loop
       plot(beta,abs(lam1),'m',beta,abs(lam2),'y');
       hold on, plot(beta,1,'g--'), hold off     % plot a dashed line at 1
To save your m-file go to the `File' menu and click `Save As.' You must get on the C:/Temp disk, or use your own floppy, then save the m-file as something, say yourname1.m (I will use jim1 below). Within the MATLAB command window, you can now type
       cd c:\temp  % this changes directories to the location of your m-file
       jim1
and the sequence of commands you put in the m-file will be executed.

 
  EXERCISE 1: Change the above m-file to plot stability for a range of $\alpha$ values from zero to 3, with $m=2$ and $\beta=\frac13$. Add labels to the horizontal and vertical axes. At what $\alpha$ does the equilibrium become unstable?  
     
 

Loops

One of the reasons to use an m-file is the need for loops which will iterate a certain activity (like predictions of our disease models). Recall that our simple disease model was given by

\begin{displaymath}
I_{n+1} = I_n + \frac{z}{100} \ I_n \ (50 - I_n),
\end{displaymath}

where $z$ is the `splat-zone' of an infectious sneeze and $I_n$ is the number of infected individuals in the $n$th iteration of the disease scenario. Here is a for loop that predicts the course of infections over 6 days, assuming a total population of 50, no recovery, and a sneeze zone of three hexes:
       inf=zeros(1,7)        % initialize infections to be a vector of zeros
       z=3;
       inf(1)=1;             % set initial condition for infections
       for n=1:6,            % begin disease loop

          inf(n+1)=inf(n)+z/100*inf(n)*(51-inf(n));

       end                   % of  for loop
       observed=[1 2 7 15 29 46 51];        % a set of observations
       days=[0:6];                          % make an 'x' variable for plot
       plot(days,observed,'r*',days,inf,'y')
       xlabel('Day'), ylabel('Infections')  % label axes
       legend('Observed','Predicted')       % put a figure legend in
This script will calculate the predicted course of the infection and compare it to the observations graphically. Also you see how to do two nice things to graphs: label them and provide a graph legend. You can actually move the legend! To see this, use your mouse to click on the legend in the figure and move it somewhere else in the plot.

 
  EXERCISE 2: Write a MATLAB m-file to analyze the data from the (simple) class experiments, using the script given above as a base. Once this is done, try some of your data with more complicated disease dynamics.  
     
 

 
  EXERCISE 3: Add a line or two to the disease m-file to calculate the $r^2$ value of the model. The formula for the $r^2$ of two vectors of data, $X$ and $Y$ is

\begin{displaymath}
r^2 = \frac{\left(n X_{\mbox{\small avg}} Y_{\mbox{\small av...
...2
\sum_{i=1}^n \left(Y_i - Y_{\mbox{\small avg}}\right)^2
}.
\end{displaymath}

You will probably want to use the MATLAB command mean; to check out what this does, type help mean. How do the correlation coefficients of the predictions for our disease models match your visual representations? Another approach is to use the MATLAB command corrcoef, which gives a matrix of correlation coefficients. Try sending two column vectors to corrcoef and see how that compares with your home-baked result.
 
     
 

Fitting Curves in MATLAB for Biology/Math 4230

Fitting Polynomials to Data

A basic problem we often face is determining what polynomial

\begin{displaymath}
y = c_0 + c_1 x + c_2 x^2 + c_3 x^3 + \cdots
\end{displaymath}

is the best representation of a set of data:

\begin{displaymath}
\left\{ (x_i,y_i ) \right\}_{i=1}^n.
\end{displaymath}

For example, in our leaky bucket example we might have a set of height of water observations and time at which those observations were made:

$i=$ 1 2 3 4 5 6 7 8 9 10
Seconds, $t_i =$ 0 10 20 30 40 50 60 70 80 90
Height, $h_i=$ 14 10.6 8.5 6.7 4.9 3.5 2.2 1. .5 .1
According to the Torricelli model, this data should match a quadratic polynomial in time

\begin{displaymath}
h = \left(\sqrt{h_0} - \frac{a\sqrt{2 g}}{2A} t \right)^2
= h_0 - \frac{a\sqrt{2 g h_0}}{A} t + \frac{ 2 g a^2}{4A^2} t^2 .
\end{displaymath}

This suggests that one way to test the Torricelli model would be to `fit' our data to the second-order polynomial:

\begin{displaymath}
p(t) = c_0 + c_1 t + c_2 t^2,
\end{displaymath}

and then see how the constants $c_0,c_1,c_2$ compare to their predicted values.

In MATLAB we can accomplish this regression with the following commands:

       height = [14 10.6 8.5 6.7 4.9 3.5 2.2 1. .5 .1];
       time = [ 0 10 20 30 40 50 60 70 80 90];
       polyfit(time,height,2)
The first argument of polyfit is a vector of independent variables at which observations were made, the second is a vector of observations, and the third is an integer which defines the order of the polynomial being fit (in this case, 2, meaning quadratic). These commands return the output
ans =

    0.0014   -0.2766   13.6509
which tells us that $c_0 = 13.6509, c_1 = -0.2766$, and $c_2=0.0014$. To plot this fitted quadratic against our data, we can use the commands
       t=linspace(0,90,91);
       hfit= 13.6509-.2766*t+.0014*t.^2;
       plot(time,height,'y*',t,hfit,'r'), xlabel('Time'), ylabel('Height')
Which looks pretty good - but remember, fits always look good.

 
  EXERCISE 4: A godless capitalist business manager intends to model the revenue $y$ to be gained by selling $x$ units of a product with the function

\begin{displaymath}
y = y_0+a x + b x^2.
\end{displaymath}

Data from the sales unit of his company for $x$ and $y$ include the points:

$x$ 1 2 3 4 5
$y$ 1.8 2.7 3.4 3.8 3.9

Find (and plot) the associated least-squares curve for the data.
 
     
 

 
  EXERCISE 5: One way to determine how well a model matches data is to plot the observed data against the predicted data. If these points fall on a line through the origin with slope 1 then the model has done perfectly. Plot your disease data against your model predictions and fit the resulting graph with a line. What is the slope and intercept, and what does this mean about your model?  
     
 

`Logarithmic' Regression

Often we want to fit data to an exponential model, for example when we are looking at rates of cooling, radioactive decay, or population growth curves for bacteria. In these circumstance we expect the observations, $y$, to be related to the independent variable, $x$, by

\begin{displaymath}
y = A e^{B x},
\end{displaymath}

where $A$ and $B$ are the unknown parameters. This becomes a linear regression problem if we take logarithms of both sides,

\begin{displaymath}
\ln(y) = \ln(A) + B x.
\end{displaymath}

Consequently, if we `fit' the logarithm of the data to the independent variable, the coefficients returned are $\ln(A)$ and $B$.

Let's do logarithmic regression on the leaky bucket data above:

       logh = log(height);
       polyfit(time,logh,1)
which returns the output
ans =

   -0.0486    3.1558
which tells us that if $h = h_0 e^{-a t}$ then $ a=0.0486$ and $\ln(h_0) =
3.1558$. Let's plot this against our data and the quadratic regression to see how it looks:
       logfit=exp(3.1558)*exp(-.0486*t);
       plot(time,height,'y*',t,hfit,'r',t,logfit,'m')
       xlabel('Time'), ylabel('Height'), legend('Observations','Quadratic','Exponential')
Yuck! This looks horrible! What went wrong?

 
  EXERCISE 6: How can you re-do the exponential fitting so that there is no need to fit for $\ln (A) = \ln (h_0)$? Does this improve the previous result? Hint: What is the behavior $\ln(h) -\ln(h_0)$ if $h = h_0 e^{-a t}$?  
     
 

General and Multi-Linear Regression

In Lay's Linear Algebra and Its Applications (pp. 414-420) is discussed the linear algebra related to least-squares fitting data to arbitrary linear models

\begin{displaymath}
y = c_1 f_1(x) + c_2 f_2(x) +\cdots.
\end{displaymath}

In general one needs to create a design matrix, X, which is composed of the functions, $f_j$ evaluated at the observation points, $x_i$, and then solve the linear system

\begin{displaymath}
{\bf X}^{T}{\bf X} \vec{c} = {\bf X}^{T} \vec{y}
\end{displaymath}

where $\vec{c}$ is the vector of parameters, $c_1,c_2,c_3, cdots$ and $\vec{y}$ is the vector of observations. The design matrix is given by

\begin{displaymath}
{\bf X} = \left(
\begin{array}{cccc}
f_1(x_1) & f_2(x_1) & ...
...
f_1(x_n) & f_2(x_n) & f_3(x_n) & \cdots
\end{array} \right)
\end{displaymath}

So, for example, if I wanted to fit the fit the quadratic model

\begin{displaymath}
c_0 + c_1 t + c_2 t^2
\end{displaymath}

to the bucket data above the design matrix would be

\begin{displaymath}
{\bf X} = \left(
\begin{array}{ccc}
1 & 0 & 0^2 \\
1 & 10 ...
...2 & 50^2 & 60^2 & 70^2 & 80^2 & 90^2 \\
\end{array} \right) .
\end{displaymath}

Let's do the regression in MATLAB . Notice that the observations, y, must be written as a column vector.
       y = [14; 10.6; 8.5; 6.7; 4.9; 3.5; 2.2; 1.; .5; .1];
       X = [ 1 0  0^2;
             1 10  10^2;
             1 20  20^2;
             1 30  30^2;
             1 40  40^2;
             1 50  50^2;
             1 60  60^2;
             1 70  70^2;
             1 80  80^2;
             1 90  90^2 ];
       XT=X'       % This is the transpose in Matlab
       A = XT*X
       c = inv(A)*XT*y
The output vector, c, should contain the coefficients of the quadratic polynomial; compare them to your polyfit results above to check your work.

 
  EXERCISE 7: A healthy child's systolic blood pressure $p$ (in millimeters of Hg) and weight $w$ (in pounds) are approximately related by the equation

\begin{displaymath}
a + b \ln (w) = p.
\end{displaymath}

Use the following experimental data to `determine parameters which give the best `fit:'

$w$ 44 61 81 113 131
$\ln(w)$ 3.78 4.11 4.41 4.73 4.88
$p$ 91 98 103 110 112

Estimate the systolic blood pressure of a healthy child weighing 100 pounds. Plot the data and the curve, indicating where the estimate for the 100 pound child lies. Does this estimate seem consistent with the data?

 
     
 



James Powell
2002-01-29