org.apache.commons.math3.stat.regression
Class MillerUpdatingRegression

java.lang.Object
  extended by org.apache.commons.math3.stat.regression.MillerUpdatingRegression
All Implemented Interfaces:
UpdatingMultipleLinearRegression

public class MillerUpdatingRegression
extends java.lang.Object
implements UpdatingMultipleLinearRegression

This class is a concrete implementation of the UpdatingMultipleLinearRegression interface.

The algorithm is described in:

 Algorithm AS 274: Least Squares Routines to Supplement Those of Gentleman
 Author(s): Alan J. Miller
 Source: Journal of the Royal Statistical Society.
 Series C (Applied Statistics), Vol. 41, No. 2
 (1992), pp. 458-478
 Published by: Blackwell Publishing for the Royal Statistical Society
 Stable URL: http://www.jstor.org/stable/2347583 

This method for multiple regression forms the solution to the OLS problem by updating the QR decomposition as described by Gentleman.

Since:
3.0
Version:
$Id: MillerUpdatingRegression.java 1392358 2012-10-01 14:41:55Z psteitz $

Field Summary
private  double[] d
          diagonals of cross products matrix
private  double epsilon
          zero tolerance
private  boolean hasIntercept
          boolean flag whether a regression constant is added
private  boolean[] lindep
          flags for variables with linear dependency problems
private  long nobs
          number of observations entered
private  int nvars
          number of variables in regression
private  double[] r
          the off diagonal portion of the R matrix
private  double[] rhs
          the elements of the R`Y
private  double[] rss
          residual sum of squares for all nested regressions
private  boolean rss_set
          has rss been called?
private  double sserr
          sum of squared errors of largest regression
private  double sumsqy
          summation of squared Y values
private  double sumy
          summation of Y variable
private  double[] tol
          the tolerance for each of the variables
private  boolean tol_set
          has the tolerance setting method been called
private  int[] vorder
          order of the regressors
private  double[] work_sing
          workspace for singularity method
private  double[] work_tolset
          scratch space for tolerance calc
private  double[] x_sing
          singular x values
 
Constructor Summary
private MillerUpdatingRegression()
          Set the default constructor to private access to prevent inadvertent instantiation
  MillerUpdatingRegression(int numberOfVariables, boolean includeConstant)
          Primary constructor for the MillerUpdatingRegression.
  MillerUpdatingRegression(int numberOfVariables, boolean includeConstant, double errorTolerance)
          This is the augmented constructor for the MillerUpdatingRegression class.
 
Method Summary
 void addObservation(double[] x, double y)
          Adds an observation to the regression model.
 void addObservations(double[][] x, double[] y)
          Adds multiple observations to the model.
 void clear()
          As the name suggests, clear wipes the internals and reorders everything in the canonical order.
private  double[] cov(int nreq)
          Calculates the cov matrix assuming only the first nreq variables are included in the calculation.
 double getDiagonalOfHatMatrix(double[] row_data)
          Gets the diagonal of the Hat matrix also known as the leverage matrix.
 long getN()
          Gets the number of observations added to the regression model.
 int[] getOrderOfRegressors()
          Gets the order of the regressors, useful if some type of reordering has been called.
 double[] getPartialCorrelations(int in)
          In the original algorithm only the partial correlations of the regressors are returned to the user.
 boolean hasIntercept()
          A getter method which determines whether a constant is included.
private  void include(double[] x, double wi, double yi)
          The include method is where the QR decomposition occurs.
private  void inverse(double[] rinv, int nreq)
          This internal method calculates the inverse of the upper-triangular portion of the R matrix.
private  double[] regcf(int nreq)
          The regcf method conducts the linear regression and extracts the parameter vector.
 RegressionResults regress()
          Conducts a regression on the data in the model, using all regressors.
 RegressionResults regress(int numberOfRegressors)
          Conducts a regression on the data in the model, using a subset of regressors.
 RegressionResults regress(int[] variablesToInclude)
          Conducts a regression on the data in the model, using the regressors in the array. Calling this method changes the internal order of the regressors, so care is required in interpreting the hat matrix.
private  int reorderRegressors(int[] list, int pos1)
          ALGORITHM AS274 APPL.
private  void singcheck()
          The method which checks for singularities and then eliminates the offending columns.
private  double smartAdd(double a, double b)
          Adds two numbers a and b such that the contamination due to numerical smallness of one addend does not corrupt the sum.
private  void ss()
          Calculates the sum of squared errors for the full regression and all subsets in the following manner:
private  void tolset()
          This sets up tolerances for singularity testing.
private  void vmove(int from, int to)
          ALGORITHM AS274 APPL.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

nvars

private final int nvars
number of variables in regression


d

private final double[] d
diagonals of cross products matrix


rhs

private final double[] rhs
the elements of the R`Y


r

private final double[] r
the off diagonal portion of the R matrix


tol

private final double[] tol
the tolerance for each of the variables


rss

private final double[] rss
residual sum of squares for all nested regressions


vorder

private final int[] vorder
order of the regressors


work_tolset

private final double[] work_tolset
scratch space for tolerance calc


nobs

private long nobs
number of observations entered


sserr

private double sserr
sum of squared errors of largest regression


rss_set

private boolean rss_set
has rss been called?


tol_set

private boolean tol_set
has the tolerance setting method been called


lindep

private final boolean[] lindep
flags for variables with linear dependency problems


x_sing

private final double[] x_sing
singular x values


work_sing

private final double[] work_sing
workspace for singularity method


sumy

private double sumy
summation of Y variable


sumsqy

private double sumsqy
summation of squared Y values


hasIntercept

private boolean hasIntercept
boolean flag whether a regression constant is added


epsilon

private final double epsilon
zero tolerance

Constructor Detail

MillerUpdatingRegression

private MillerUpdatingRegression()
Set the default constructor to private access to prevent inadvertent instantiation


MillerUpdatingRegression

public MillerUpdatingRegression(int numberOfVariables,
                                boolean includeConstant,
                                double errorTolerance)
                         throws ModelSpecificationException
This is the augmented constructor for the MillerUpdatingRegression class.

Parameters:
numberOfVariables - number of regressors to expect, not including constant
includeConstant - include a constant automatically
errorTolerance - zero tolerance, how machine zero is determined
Throws:
ModelSpecificationException - if numberOfVariables is less than 1

MillerUpdatingRegression

public MillerUpdatingRegression(int numberOfVariables,
                                boolean includeConstant)
                         throws ModelSpecificationException
Primary constructor for the MillerUpdatingRegression.

Parameters:
numberOfVariables - maximum number of potential regressors
includeConstant - include a constant automatically
Throws:
ModelSpecificationException - if numberOfVariables is less than 1
Method Detail

hasIntercept

public boolean hasIntercept()
A getter method which determines whether a constant is included.

Specified by:
hasIntercept in interface UpdatingMultipleLinearRegression
Returns:
true if the regression has an intercept, false otherwise

getN

public long getN()
Gets the number of observations added to the regression model.

Specified by:
getN in interface UpdatingMultipleLinearRegression
Returns:
number of observations

addObservation

public void addObservation(double[] x,
                           double y)
                    throws ModelSpecificationException
Adds an observation to the regression model.

Specified by:
addObservation in interface UpdatingMultipleLinearRegression
Parameters:
x - the array with regressor values
y - the value of dependent variable given these regressors
Throws:
ModelSpecificationException - if the length of x does not equal the number of independent variables in the model

addObservations

public void addObservations(double[][] x,
                            double[] y)
                     throws ModelSpecificationException
Adds multiple observations to the model.

Specified by:
addObservations in interface UpdatingMultipleLinearRegression
Parameters:
x - observations on the regressors
y - observations on the regressand
Throws:
ModelSpecificationException - if x is not rectangular, does not match the length of y or does not contain sufficient data to estimate the model

include

private void include(double[] x,
                     double wi,
                     double yi)
The include method is where the QR decomposition occurs. It forms all the intermediate data used for the derivative measures. In the original implementation described in the Miller paper, the x vector is overwritten. In this implementation, the include method is passed a copy of the original data vector so that there is no contamination of the data. Additionally, this method differs slightly from Gentleman's method in that it assumes dense design matrices; there is some advantage in using the original Gentleman algorithm on sparse matrices.

Parameters:
x - observations on the regressors
wi - weight of this observation (-1,1)
yi - observation on the regressand
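
The update described above can be illustrated with a minimal sketch of a square-root-free planar-rotation update in the style of Gentleman. This is not the library's code: the class RotationUpdateSketch and its members are hypothetical names, plain floating-point addition is used where this class uses smartAdd, zero-tolerance handling is omitted, and weights are assumed to be positive. The coefficients method assumes no column is linearly dependent.

```java
import java.util.Arrays;

/** Hypothetical sketch of a square-root-free rotation update; not the library's code. */
class RotationUpdateSketch {
    final int nvars;
    final double[] d;   // row scale factors (the diagonal)
    final double[] r;   // strict upper triangle of R, packed by rows
    final double[] rhs; // projected responses
    double sserr;       // running residual sum of squares

    RotationUpdateSketch(int nvars) {
        this.nvars = nvars;
        d = new double[nvars];
        r = new double[nvars * (nvars - 1) / 2];
        rhs = new double[nvars];
    }

    /** Folds one weighted observation into d, r and rhs. The input row is copied first. */
    void include(double[] row, double weight, double yi) {
        double[] x = row.clone(); // work on a copy so the caller's data is untouched
        double w = weight;
        double y = yi;
        int nextr = 0;
        for (int i = 0; i < nvars; i++) {
            if (w == 0.0) {
                return; // nothing left of this observation to distribute
            }
            double xi = x[i];
            if (xi == 0.0) {
                nextr += nvars - i - 1; // skip packed row i
                continue;
            }
            double di = d[i];
            double wxi = w * xi;
            double dpi = di + wxi * xi; // updated diagonal element
            d[i] = dpi;
            for (int k = i + 1; k < nvars; k++) {
                double xk = x[k];
                x[k] = xk - xi * r[nextr];
                r[nextr] = (di * r[nextr] + wxi * xk) / dpi;
                nextr++;
            }
            double yk = y;
            y = yk - xi * rhs[i];
            rhs[i] = (di * rhs[i] + wxi * yk) / dpi;
            w = di * w / dpi; // weight carried on to the next row
        }
        sserr += w * y * y;
    }

    /** Back-substitutes through the unit upper-triangular R to recover coefficients. */
    double[] coefficients() {
        double[] beta = new double[nvars];
        for (int i = nvars - 1; i >= 0; i--) {
            int pos = i * (2 * nvars - i - 1) / 2; // start of packed row i
            double b = rhs[i];
            for (int k = i + 1; k < nvars; k++) {
                b -= r[pos++] * beta[k];
            }
            beta[i] = b;
        }
        return beta;
    }

    public static void main(String[] args) {
        RotationUpdateSketch s = new RotationUpdateSketch(2);
        // Rows are (constant, x); the made-up responses follow y = 1 + 2x.
        s.include(new double[] {1.0, 0.0}, 1.0, 1.0);
        s.include(new double[] {1.0, 1.0}, 1.0, 3.0);
        s.include(new double[] {1.0, 2.0}, 1.0, 5.0);
        System.out.println(Arrays.toString(s.coefficients()));
    }
}
```

Because each observation is folded in as it arrives, the design matrix never has to be stored, which is the point of an updating regression.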

smartAdd

private double smartAdd(double a,
                        double b)
Adds two numbers a and b such that the contamination due to numerical smallness of one addend does not corrupt the sum.

Parameters:
a - an addend
b - an addend
Returns:
the sum of a and b
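
One way such a guarded addition could work is sketched below. It is an illustration under assumptions, not the private method itself: the class name SmartAddSketch and the threshold EPS (2^-53, the unit roundoff of a double) are choices made for this example. An addend that is negligible relative to the other is dropped instead of being added.

```java
/** Hypothetical sketch of a guarded addition; the threshold EPS is an assumption. */
class SmartAddSketch {
    // Assumed relative threshold: 2^-53, the unit roundoff of a double.
    static final double EPS = Math.ulp(1.0) / 2.0;

    /** Returns a + b, or just the larger addend when the other is negligible. */
    static double smartAdd(double a, double b) {
        double absA = Math.abs(a);
        double absB = Math.abs(b);
        if (absA > absB) {
            // b only contributes if it is not negligible relative to a
            return absB > absA * EPS ? a + b : a;
        }
        // a only contributes if it is not negligible relative to b
        return absA > absB * EPS ? a + b : b;
    }

    public static void main(String[] args) {
        System.out.println(smartAdd(1.0, 1.0e-20)); // the tiny addend is dropped
        System.out.println(smartAdd(1.0, 2.0));     // an ordinary sum
    }
}
```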

clear

public void clear()
As the name suggests, clear wipes the internals and reorders everything in the canonical order.

Specified by:
clear in interface UpdatingMultipleLinearRegression

tolset

private void tolset()
This sets up tolerances for singularity testing.


regcf

private double[] regcf(int nreq)
                throws ModelSpecificationException
The regcf method conducts the linear regression and extracts the parameter vector. Notice that the algorithm can do subset regression with no alteration.

Parameters:
nreq - how many of the regressors to include (either in canonical order, or in the current reordered state)
Returns:
an array with the estimated slope coefficients
Throws:
ModelSpecificationException - if nreq is less than 1 or greater than the number of independent variables

singcheck

private void singcheck()
The method which checks for singularities and then eliminates the offending columns.


ss

private void ss()
Calculates the sum of squared errors for the full regression and all subsets in the following manner:
 rss[] ={
 ResidualSumOfSquares_allNvars,
 ResidualSumOfSquares_FirstNvars-1,
 ResidualSumOfSquares_FirstNvars-2,
 ..., ResidualSumOfSquares_FirstVariable} 


cov

private double[] cov(int nreq)
Calculates the covariance matrix assuming only the first nreq variables are included in the calculation. The returned array contains a symmetric matrix stored in lower triangular form. The matrix will have ( nreq + 1 ) * nreq / 2 elements. For illustration:
 cov =
 {
  cov_00,
  cov_10, cov_11,
  cov_20, cov_21, cov_22,
  ...
 } 

Parameters:
nreq - how many of the regressors to include (either in canonical order, or in the current reordered state)
Returns:
an array with the variance covariance of the included regressors in lower triangular form
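
The packed layout illustrated above can be read back with a small index helper. The sketch below assumes the row-by-row lower triangular order shown in the illustration; the class PackedLowerTriangleSketch, its methods, and the sample values are hypothetical and exist only for this example.

```java
/** Hypothetical helper for reading a symmetric matrix packed by rows of its lower triangle. */
class PackedLowerTriangleSketch {
    /** Number of stored elements for an nreq x nreq symmetric matrix. */
    static int packedLength(int nreq) {
        return (nreq + 1) * nreq / 2;
    }

    /** Position of element (i, j) in the packed array; symmetric in i and j. */
    static int packedIndex(int i, int j) {
        int row = Math.max(i, j);
        int col = Math.min(i, j);
        return row * (row + 1) / 2 + col;
    }

    /** Reads element (i, j) from the packed array. */
    static double get(double[] packed, int i, int j) {
        return packed[packedIndex(i, j)];
    }

    public static void main(String[] args) {
        // Made-up values laid out as cov_00, cov_10, cov_11, cov_20, cov_21, cov_22.
        double[] cov = {0.5, 0.1, 0.7, 0.2, 0.3, 0.9};
        System.out.println(get(cov, 2, 1)); // reads cov_21
        System.out.println(get(cov, 1, 2)); // same element, by symmetry
    }
}
```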

inverse

private void inverse(double[] rinv,
                     int nreq)
This internal method calculates the inverse of the upper-triangular portion of the R matrix.

Parameters:
rinv - the storage for the inverse of r
nreq - how many of the regressors to include (either in canonical order, or in the current reordered state)

getPartialCorrelations

public double[] getPartialCorrelations(int in)
In the original algorithm only the partial correlations of the regressors are returned to the user. In this implementation, we have
 corr =
 {
   corrxx - lower triangular
   corrxy - bottom row of the matrix
 }
 Replaces subroutines PCORR and COR of:
 ALGORITHM AS274  APPL. STATIST. (1992) VOL.41, NO. 2 

Calculate partial correlations after the variables in rows 1, 2, ..., IN have been forced into the regression. If IN = 1, and the first row of R represents a constant in the model, then the usual simple correlations are returned.

If IN = 0, the value returned in array CORMAT for the correlation of variables Xi & Xj is:

 sum ( Xi.Xj ) / Sqrt ( sum (Xi^2) . sum (Xj^2) )

On return, array CORMAT contains the upper triangle of the matrix of partial correlations stored by rows, excluding the 1's on the diagonal. e.g. if IN = 2, the consecutive elements returned are: (3,4) (3,5) ... (3,ncol), (4,5) (4,6) ... (4,ncol), etc. Array YCORR stores the partial correlations with the Y-variable starting with YCORR(IN+1) = partial correlation with the variable in position (IN+1).

Parameters:
in - how many of the regressors to include (either in canonical order, or in the current reordered state)
Returns:
an array with the partial correlations of the remainder of regressors with each other and the regressand, in lower triangular form
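
The IN = 0 formula above is an uncentered correlation, and it can be checked directly on raw columns. The sketch below evaluates that formula for two data columns; it does not go through the QR decomposition that this class uses, and the class and method names are hypothetical.

```java
/** Hypothetical sketch of the IN = 0 formula: sum(Xi.Xj) / sqrt(sum(Xi^2) . sum(Xj^2)). */
class UncenteredCorrelationSketch {
    static double uncenteredCorrelation(double[] xi, double[] xj) {
        if (xi.length != xj.length) {
            throw new IllegalArgumentException("columns must have the same length");
        }
        double sumIJ = 0.0; // sum of cross products
        double sumII = 0.0; // sum of squares of Xi
        double sumJJ = 0.0; // sum of squares of Xj
        for (int k = 0; k < xi.length; k++) {
            sumIJ += xi[k] * xj[k];
            sumII += xi[k] * xi[k];
            sumJJ += xj[k] * xj[k];
        }
        return sumIJ / Math.sqrt(sumII * sumJJ);
    }

    public static void main(String[] args) {
        // Made-up columns; the second is exactly twice the first.
        double[] a = {1.0, 2.0, 3.0};
        double[] b = {2.0, 4.0, 6.0};
        System.out.println(uncenteredCorrelation(a, b));
    }
}
```

No means are subtracted, so two proportional columns give a value of 1 even though neither is centered.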

vmove

private void vmove(int from,
                   int to)
ALGORITHM AS274 APPL. STATIST. (1992) VOL.41, NO. 2. Move variable from position FROM to position TO in an orthogonal reduction produced by AS75.1.

Parameters:
from - initial position
to - destination

reorderRegressors

private int reorderRegressors(int[] list,
                              int pos1)
ALGORITHM AS274 APPL. STATIST. (1992) VOL.41, NO. 2

Re-order the variables in an orthogonal reduction produced by AS75.1 so that the N variables in LIST start at position POS1, though will not necessarily be in the same order as in LIST. Any variables in VORDER before position POS1 are not moved. Auxiliary routine called: VMOVE.

This internal method reorders the regressors.

Parameters:
list - the regressors to move
pos1 - where the list will be placed
Returns:
-1 on error, 0 if everything is OK

getDiagonalOfHatMatrix

public double getDiagonalOfHatMatrix(double[] row_data)
Gets the diagonal of the Hat matrix also known as the leverage matrix.

Parameters:
row_data - the observation for which the diagonal of the hat matrix is returned
Returns:
the diagonal element of the hat matrix
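
For intuition about what a leverage value means, the sketch below uses the textbook closed form for a model with an intercept and a single regressor, h = 1/n + (x - mean)^2 / Sxx. It is an illustration only and is not how this class computes the value; the class LeverageSketch and its method are hypothetical.

```java
/** Hypothetical sketch: leverage for a simple regression with an intercept and one regressor. */
class LeverageSketch {
    /** Leverage of a point x given the observed regressor values xs. */
    static double leverage(double[] xs, double x) {
        int n = xs.length;
        double mean = 0.0;
        for (double v : xs) {
            mean += v;
        }
        mean /= n;
        double sxx = 0.0; // sum of squared deviations from the mean
        for (double v : xs) {
            sxx += (v - mean) * (v - mean);
        }
        return 1.0 / n + (x - mean) * (x - mean) / sxx;
    }

    public static void main(String[] args) {
        double[] xs = {0.0, 1.0, 2.0}; // made-up regressor values
        for (double x : xs) {
            System.out.println(leverage(xs, x));
        }
    }
}
```

Points far from the mean of the regressor have higher leverage, and the leverages of the observed points sum to the number of fitted parameters (two in this sketch).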

getOrderOfRegressors

public int[] getOrderOfRegressors()
Gets the order of the regressors, useful if some type of reordering has been called. Calling regress with int[]{} args will trigger a reordering.

Returns:
int[] with the current order of the regressors

regress

public RegressionResults regress()
                          throws ModelSpecificationException
Conducts a regression on the data in the model, using all regressors.

Specified by:
regress in interface UpdatingMultipleLinearRegression
Returns:
RegressionResults the structure holding all regression results
Throws:
ModelSpecificationException - if the number of observations is less than the number of variables

regress

public RegressionResults regress(int numberOfRegressors)
                          throws ModelSpecificationException
Conducts a regression on the data in the model, using a subset of regressors.

Parameters:
numberOfRegressors - how many of the regressors to include (either in canonical order, or in the current reordered state)
Returns:
RegressionResults the structure holding all regression results
Throws:
ModelSpecificationException - if the number of observations is less than the number of variables or the number of regressors requested is greater than the number of regressors in the model

regress

public RegressionResults regress(int[] variablesToInclude)
                          throws ModelSpecificationException
Conducts a regression on the data in the model, using the regressors in the array. Calling this method changes the internal order of the regressors, so care is required in interpreting the hat matrix.

Specified by:
regress in interface UpdatingMultipleLinearRegression
Parameters:
variablesToInclude - array of variables to include in regression
Returns:
RegressionResults the structure holding all regression results
Throws:
ModelSpecificationException - if the number of observations is less than the number of variables, the number of regressors requested is greater than the number of regressors in the model, or a regressor index in the regressor array does not exist


Copyright (c) 2003-2013 Apache Software Foundation