Posted to user@commons.apache.org by dan <da...@gmail.com> on 2011/08/02 21:24:28 UTC

[math] Help with OLSMultipleLinearRegression

I have been using the OLSMultipleLinearRegression class successfully
for a while now, but I am having trouble in my current application.

The code is very simple, and looks like this:

OLSMultipleLinearRegression regression = new OLSMultipleLinearRegression();
regression.setNoIntercept(true);
regression.newSampleData(ys, z_bars);
double [] new_eta = regression.estimateRegressionParameters();

When I run this code with my current data, all of the regression
coefficients come back as NaNs.

In the input data, the z_bars are vectors that have been normalized to
sum to 1, and the ys are the logs of the "true" response variables (I
am trying to reproduce the results from a research paper, in which it
was claimed that logging the response variables made them more
normally distributed, resulting in a better fit).  Is there something
wrong with my setup?  It seems that, even if the logged data is not
very linear, it should still be possible to obtain some OLS fit,
even if it is a poor one.  Any help would be appreciated.

Thank you,
Dan

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
For additional commands, e-mail: user-help@commons.apache.org


Re: [math] Help with OLSMultipleLinearRegression

Posted by Greg Sterijevski <gs...@gmail.com>.
Going along with what Phil is saying, perhaps you are taking the log of a
value that is zero or negative (or so small that it underflows to zero),
which gives -Infinity or NaN and would then show up as NaN coefficients.
Or, alternatively, some columns have a very small norm, so that normalizing
them is effectively a division by zero?
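
To illustrate the two possibilities above (a non-finite log and a division
by zero during normalization), here is a minimal sketch in plain Java. The
class and method names are hypothetical and are not part of Commons Math;
it only assumes that the normalization divides each vector by its sum, as
described in the original post.

```java
// Hypothetical helper class for illustration; not part of Commons Math.
final class LogAndNormalizeDemo {

    /**
     * Scales v so that its entries sum to 1. Throws instead of silently
     * dividing by zero when the sum is zero, NaN, or infinite.
     */
    static double[] normalizeToSumOne(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x;
        }
        if (sum == 0.0 || Double.isNaN(sum) || Double.isInfinite(sum)) {
            throw new IllegalArgumentException("cannot normalize, sum = " + sum);
        }
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / sum;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Math.log(1e-300)); // very negative, but still finite
        System.out.println(Math.log(0.0));    // prints -Infinity
        System.out.println(Math.log(-1.0));   // prints NaN
        System.out.println(0.0 / 0.0);        // prints NaN (all-zero vector case)
    }
}
```

Failing fast in the normalization step is one possible design choice; it
makes the offending vector visible before it ever reaches the regression.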

On Tue, Aug 2, 2011 at 2:57 PM, Phil Steitz <ph...@gmail.com> wrote:

> Are you sure there are no NaNs in your input data?
>
> Phil

Re: [math] Help with OLSMultipleLinearRegression

Posted by Phil Steitz <ph...@gmail.com>.
Are you sure there are no NaNs in your input data?

Phil
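
One way to answer that question is to scan both arrays before calling
newSampleData. The following is only a sketch: the class and method names
are hypothetical, and it assumes ys is a double[] and z_bars a double[][],
as the call in the original post suggests.

```java
// Hypothetical input check for illustration; not part of Commons Math.
final class NonFiniteCheck {

    /**
     * Returns a description of the first NaN or infinite entry found,
     * or null if every value in both arrays is finite.
     */
    static String firstNonFinite(double[] ys, double[][] zBars) {
        for (int i = 0; i < ys.length; i++) {
            if (Double.isNaN(ys[i]) || Double.isInfinite(ys[i])) {
                return "ys[" + i + "] = " + ys[i];
            }
        }
        for (int i = 0; i < zBars.length; i++) {
            for (int j = 0; j < zBars[i].length; j++) {
                double v = zBars[i][j];
                if (Double.isNaN(v) || Double.isInfinite(v)) {
                    return "z_bars[" + i + "][" + j + "] = " + v;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The second response is the log of zero, which is -Infinity.
        double[] ys = {Math.log(2.0), Math.log(0.0)};
        double[][] zBars = {{0.5, 0.5}, {0.25, 0.75}};
        System.out.println(firstNonFinite(ys, zBars)); // prints "ys[1] = -Infinity"
    }
}
```

The check also reports infinite values, since those can turn into NaNs
once they enter the arithmetic of the fit.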