Srikanth Boyina: Statistics: Bootstrapping - Building models with lesser data (<30)

Most of the times we often come across the problem of fitting models with lesser data (<30).

To overcome this problem we can use the following methodology:

Residual Resampling – Bootstrapping:

Bootstrapping is the practice of estimating properties of an estimator by resampling. Residual Resampling is one of the types of bootstrapping.

Steps:
OLS fit is computed from the original data.
The residuals are then resampled and the independent variables are fixed.
The residuals are then added to the predicted values of the original fit to obtain a new Y vector.
This new Y vector is then fit against the original X variables.

It is advised that we can have 100 bootstrap samples for the validity of the model. For each bootstrap sample the model is fit. Then calculate the mean of the parameter estimates and std errors. This will be our final estimate.

As sas don’t have the direct procedure to calculate this method, I manually coded the procedure. This is given below:

/* Model data set */

title 'Cement Hardening Data from Hjorth, p 31';
data cement;
input x1-x4 y;
label x1='3CaOAl2O3'
x2='3CaOSiO2'
x3='4CaOAl2O3Fe2O3'
x4='2CaOSiO2';
cards;
7 26 6 60 78.5
1 29 15 52 74.3
11 56 8 20 104.3
11 31 8 47 87.6
7 52 6 33 95.9
11 55 9 22 109.2
3 71 17 6 102.7
1 31 22 44 72.5
2 54 18 22 93.1
21 47 4 26 115.9
1 40 23 34 83.8
11 66 9 12 113.3
10 68 8 12 109.4

;
run;

/*Original fit to get the residuals */
proc reg data=cement;
model y=x1-x4;
output out=cemout r=resid p=pred;
run;

/*Resampling the residuals*/

data spds_wrk.ABC_boot(drop=resid) Spds_wrk.ABC_resid(keep=resid index);
set cemout;
index=_N_;
run;
data spds_wrk.ABC_OUTDATA (drop= i j);
do i = 1 TO 100;
sample_n = i;
do j = 1 TO 13;
index=j;
join_index = int(ranuni(123) * 13) + 1;
output;
end;
end;
run;

data spds_wrk.ABC_model_data;
merge spds_wrk.ABC_boot spds_wrk.ABC_OUTDATA;
by index;
run;

data spds_wrk.ABC_model_data;
merge spds_wrk.ABC_model_data spds_wrk.ABC_resid(rename=(index=join_index));
by join_index;
run;

proc sort data=spds_wrk.ABC_model_data;
by sample_n index;
run;

data spds_wrk.ABC_updated;
set spds_wrk.ABC_model_data;
ycap=pred+resid;
run;

/*Buliding the model with changed y 100 times */

proc reg data=spds_wrk.ABC_updated outest=pred;
model ycap=x1 x2 x3 x4;
by sample_n;
run;

/* Final Estimates*/
proc means data=pred;
var intercept x1 x2 x3 x4;
run;

The mean of Intercept, x2, x2, x3, and x4 will be the final estimates and its std deviations are the final std errors.

http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/bootfit.htm
http://en.wikipedia.org/wiki/Bootstrapping_%28statistics%29

Srikanth Boyina

Wednesday, July 8, 2009

Statistics: Bootstrapping - Building models with lesser data (<30)

No comments:

Post a Comment

About Me

Followers

Blog Archive