Posted to dev@commons.apache.org by Greg Sterijevski <gs...@gmail.com> on 2011/07/12 05:40:58 UTC

Longley Data

Hello All,

I am testing the first 'updating' OLS regression algorithm. I ran it through
the Wampler1 data, and it gets 1.0s for all of the beta estimates. I next ran
the Longley dataset; I match, but only to a tolerance of 1.0e-6. That is a bit
less than two orders of magnitude worse than the current in-core estimator
(2.0e-8). My question to the list is: how important is this difference? Is it
worth tearing things apart to figure out where the error is accumulating?

Thanks,

-Greg

Re: [math] Re: Longley Data

Posted by Greg Sterijevski <gs...@gmail.com>.
What do you mean?

I run the following test:

        double[] design = new double[]{
            60323, 83.0, 234289, 2356, 1590, 107608, 1947,
            61122, 88.5, 259426, 2325, 1456, 108632, 1948,
            60171, 88.2, 258054, 3682, 1616, 109773, 1949,
            61187, 89.5, 284599, 3351, 1650, 110929, 1950,
            63221, 96.2, 328975, 2099, 3099, 112075, 1951,
            63639, 98.1, 346999, 1932, 3594, 113270, 1952,
            64989, 99.0, 365385, 1870, 3547, 115094, 1953,
            63761, 100.0, 363112, 3578, 3350, 116219, 1954,
            66019, 101.2, 397469, 2904, 3048, 117388, 1955,
            67857, 104.6, 419180, 2822, 2857, 118734, 1956,
            68169, 108.4, 442769, 2936, 2798, 120445, 1957,
            66513, 110.8, 444546, 4681, 2637, 121950, 1958,
            68655, 112.6, 482704, 3813, 2552, 123366, 1959,
            69564, 114.2, 502601, 3931, 2514, 125368, 1960,
            69331, 115.7, 518173, 4806, 2572, 127852, 1961,
            70551, 116.9, 554894, 4007, 2827, 130081, 1962
        };

        final int nobs = 16;
        final int nvars = 6;

        // Estimate the model
        MillerUpdatingRegression model =
                new MillerUpdatingRegression(nvars, true, MathUtils.SAFE_MIN);
        int off = 0;
        double[] tmp = new double[6];
        for (int i = 0; i < nobs; i++) {
            System.arraycopy(design, off + 1, tmp, 0, nvars);
            model.addObservation(tmp, design[off]);
            off += nvars + 1;
        }

        // Check expected beta values from NIST
        RegressionResults result = model.regress();
        double[] betaHat = result.getParameterEstimates();
        TestUtils.assertEquals(betaHat,
                new double[]{-3482258.63459582, 15.0618722713733,
                    -0.358191792925910E-01, -2.02022980381683,
                    -1.03322686717359, -0.511041056535807E-01,
                    1829.15146461355}, 1E-6);



The regression technique I am adding has parameters that are within 1.0e-6
of the certified values. OLSMultipleLinearRegressionTest is within 2.0e-8.



On Mon, Jul 11, 2011 at 11:32 PM, Ted Dunning <te...@gmail.com> wrote:

> Can you point at code?

Re: [math] Re: Longley Data

Posted by Ted Dunning <te...@gmail.com>.
Can you point at code?


Re: [math] Re: Longley Data

Posted by Phil Steitz <ph...@gmail.com>.
On 7/19/11 4:13 PM, Greg Sterijevski wrote:
> I think Luc was suggesting implementing the algorithm in extended precision.

I don't really see a need for that at this point.  To settle the issue
of result precision, though, you should start with high-precision values
(or at least values at the precision NIST presents for the X and Y data)
for the higher X powers.  Doing just that computation in extended
precision is one way to get them.  Of course, that could be done
externally and the data loaded by the test from a file.
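
A rough sketch of that external route (plain java.math, nothing
[math]-specific; the x value is a single Filippelli entry used as an
example): compute each power with 50 significant digits and round to
double only once, at the end.

    import java.math.BigDecimal;
    import java.math.MathContext;

    MathContext mc = new MathContext(50);
    BigDecimal x0 = new BigDecimal("-6.860120914"); // one x from Filip.dat
    for (int k = 2; k <= 10; k++) {
        // pow carries 50 digits; doubleValue() performs the single rounding
        System.out.println("x^" + k + " = " + x0.pow(k, mc).doubleValue());
    }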

Phil


Re: [math] Re: Longley Data

Posted by Greg Sterijevski <gs...@gmail.com>.
I think Luc was suggesting implementing the algorithm in extended precision.


-Greg


Re: [math] Re: Longley Data

Posted by Phil Steitz <ph...@gmail.com>.
On 7/18/11 6:31 PM, Greg Sterijevski wrote:
> All,
>
> I have pushed the implementation of the Miller Regression technique, along
> with some tests. I am sure that there are a lot of sharp corners to file
> down and improve. However, I thought it would be prudent to get it out and
> then we can further refine the code.

Thanks!  I just committed the code, with minor cleanup.  I am
reviewing the article as we speak to verify the implementation.  Others
are encouraged to join in here.  We need to complete the javadoc
and decide on exceptions as we stabilize the API.
>
> On accuracy:
>
> I seem to match all of the digits of the Longley and Wampler data. Filippelli I
> have a very hard time matching except to a tolerance of 1.0e-5. If you look
> at LIMDEP's website:
>
> http://www.limdep.com/features/capabilities/accuracy/linear_regression_3.php
>
> I think that the code I am checking in does a bit better. I am happy about
> that. However, there are some other issues with Filippelli. Namely, one can
> affect the 'accuracy' of your results depending on how you present the data.
> For example, if I generate the high-order polynomial naively, x1 = x0 * x0,
> x2 = x0 * x1, ..., x10 = x0 * x9, then I can hit the numbers within 1.0e-5.
> If, however, I generate the Filippelli regressors by multiplying numbers
> whose magnitudes are similar:
>                             x1 = x0 * x0;
>                             x2 = x0 * x1;
>                             x3 = x0 * x2;
>                             x4 = x2 * x2;
>                             x5 = x2 * x3;
>                             x6 = x3 * x3;
> Then I have a very hard time making that 1.0e-5 tolerance.
>
> Does anyone know if there is some article which explains the proper way to
> set up Filippelli's test?

Have not seen anything on this.
>
>
> Speaking to Luc's point, maybe the correct thing to do is to move to
> arbitrary precision. I wanted to avoid this until I was at a dead end.
> Perhaps the time is now...

To generate the x values, yes that would probably be best.


Phil

>
> On tests:
>
> I intend to push 3-4 tests soon. There are 17 tests in the first suite I
> sent in.
>
> -Greg
>




Re: [math] Re: Longley Data

Posted by Greg Sterijevski <gs...@gmail.com>.
All,

I have pushed the implementation of the Miller Regression technique, along
with some tests. I am sure that there are a lot of sharp corners to file
down and improve. However, I thought it would be prudent to get it out and
then we can further refine the code.

On accuracy:

I seem to match all of the digits of the Longley and Wampler data. Filippelli I
have a very hard time matching except to a tolerance of 1.0e-5. If you look
at LIMDEP's website:

http://www.limdep.com/features/capabilities/accuracy/linear_regression_3.php

I think that the code I am checking in does a bit better. I am happy about
that. However, there are some other issues with Filippelli. Namely, one can
affect the 'accuracy' of the results depending on how the data is presented.
For example, if I generate the high-order polynomial naively, x1 = x0 * x0,
x2 = x0 * x1, ..., x10 = x0 * x9, then I can hit the numbers within 1.0e-5.
If, however, I generate the Filippelli regressors by multiplying numbers
whose magnitudes are similar:
                            x1 = x0 * x0;
                            x2 = x0 * x1;
                            x3 = x0 * x2;
                            x4 = x2 * x2;
                            x5 = x2 * x3;
                            x6 = x3 * x3;
Then I have a very hard time making that 1.0e-5 tolerance.
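
For concreteness, a small sketch of the two schemes side by side (x0 is
one raw x value used as an example; this is not the test code itself):

    double x0 = -6.860120914; // one Filippelli x from the NIST file

    // Naive chain: every power multiplies by x0 again.
    double[] p = new double[11];
    p[1] = x0;
    for (int k = 2; k <= 10; k++) {
        p[k] = p[k - 1] * x0;
    }

    // Balanced products: multiply factors of similar magnitude.
    double x2 = x0 * x0;
    double x3 = x0 * x2;
    double x4 = x2 * x2;
    double x5 = x2 * x3;
    double x6 = x3 * x3;

    // The two roundings differ in the last ulps, and over a whole column
    // that can be enough to move the fitted betas at the 1.0e-5 level.
    System.out.println(p[6] - x6);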

Does anyone know if there is some article which explains the proper way to
set up Filippelli's test?


Speaking to Luc's point, maybe the correct thing to do is to move to
arbitrary precision. I wanted to avoid this until I was at a dead end.
Perhaps the time is now...

On tests:

I intend to push 3-4 tests soon. There are 17 tests in the first suite I
sent in.

-Greg

Re: [math] Re: Longley Data

Posted by Greg Sterijevski <gs...@gmail.com>.
Hi Luc,

I will look at this in dfp. I saw the package and thought it was
Davidon-Fletcher-Powell. ;-)

I attempted to do what you suggest with BigDecimal. Everything was okay and
there were some marginal benefits in doing so, but I thought the hassle
was not worth it (at least with BigDecimal).

I will try with dfp!

Thank you,

-Greg


Re: [math] Re: Longley Data

Posted by Luc Maisonobe <Lu...@free.fr>.
On 15/07/2011 02:37, Greg Sterijevski wrote:
> The usual issues with numerical techniques: how you calculate
> (c * x + d * y) / e matters. It turns out that religiously following the
> article and defining c_bar = c / e is not a good idea.
>
> The Filippelli data is still a bit dicey. I would like to resolve where the
> error is accumulating there as well. That's really the last thing preventing
> me from sending the patch with the Miller-Gentleman Regression to Phil.

I don't know whether this is feasible in your case, but when trying to
find this kind of numerical error, I have found it useful to just redo the
computation in parallel at high precision. Up to a few months ago, I was
simply doing this using emacs (yes, emacs rocks) configured with 50
significant digits. Now it is easier since we have our own dfp package
in [math].
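
A minimal sketch of that parallel check (the package name is from the
[math] 2.x line and the constants are made up; the idea is to redo one
step at 50 decimal digits and diff it against the double result):

    import org.apache.commons.math.dfp.Dfp;
    import org.apache.commons.math.dfp.DfpField;

    DfpField field = new DfpField(50); // about 50 decimal digits
    Dfp c = field.newDfp("0.1");
    Dfp x = field.newDfp("2356");
    Dfp d = field.newDfp("0.3");
    Dfp y = field.newDfp("1590");
    Dfp e = field.newDfp("0.7");
    Dfp exact = c.multiply(x).add(d.multiply(y)).divide(e);

    double approx = (0.1 * 2356 + 0.3 * 1590) / 0.7;
    System.out.println(exact.subtract(field.newDfp(approx)));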

Luc



Re: [math] Re: Longley Data

Posted by Greg Sterijevski <gs...@gmail.com>.
The usual issues with numerical techniques: how you calculate
(c * x + d * y) / e matters. It turns out that religiously following the
article and defining c_bar = c / e is not a good idea.
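
A toy illustration of the difference (made-up constants, not the Miller
update itself): dividing early rounds the precomputed ratios, and the
later multiplications amplify that rounding.

    double c = 0.1, d = 0.3, e = 0.7, x = 1.0e8, y = 1.0e8;

    // Divide once, at the end.
    double direct = (c * x + d * y) / e;

    // Precompute c_bar = c / e and d_bar = d / e, as in the article.
    double cBar = c / e;
    double dBar = d / e;
    double viaBars = cBar * x + dBar * y;

    // Typically a few ulps here, but it compounds over many updates.
    System.out.println(direct - viaBars);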

The Filippelli data is still a bit dicey. I would like to resolve where the
error is accumulating there as well. That's really the last thing preventing
me from sending the patch with the Miller-Gentleman Regression to Phil.

-Greg

On Thu, Jul 14, 2011 at 1:18 PM, Ted Dunning <te...@gmail.com> wrote:

> What was the problem?

Re: [math] Re: Longley Data

Posted by Ted Dunning <te...@gmail.com>.
What was the problem?


Re: [math] Re: Longley Data

Posted by Greg Sterijevski <gs...@gmail.com>.
Phil,

Got it! I fit Longley to all printed values. I have not broken anything... I
need to type up a few loose ends, then I will send a patch.

-Greg


Re: [math] Re: Longley Data

Posted by Phil Steitz <ph...@gmail.com>.
On 7/12/11 12:12 PM, Greg Sterijevski wrote:
> All,
>
> So I included the Wampler data in the test suite. The interesting thing is
> that to get clean runs I need wider tolerances with OLSMultipleRegression than
> with the version of the Miller algorithm I am coding up.
This is good for your Miller impl, not so good for
OLSMultipleRegression.
> Perhaps we should come to a consensus on what good enough is? How close do
> we want to be? Should we require passing on all of NIST's 'hard' problems?
> (for all regression techniques that get cooked up)
>
The goal should be to match all of the displayed digits in the
reference data.  When we can't do that, we should try to understand
why and aim, if possible, to improve the impls.  As we improve the
code, the tolerances in the tests can be improved.  Characterization
of the types of models where the different implementations do well /
poorly is another thing we should aim for (and include in the
javadoc).  As with all reference validation tests, we need to keep
in mind that a) the "hard" examples are designed to be numerically
unstable and b) conversely, a handful of examples does not really
demonstrate correctness. 

Phil
> -Greg
>




Re: [math] Re: Longley Data

Posted by Greg Sterijevski <gs...@gmail.com>.
All,

So I included the Wampler data in the test suite. The interesting thing is
that to get clean runs I need wider tolerances with OLSMultipleRegression than
with the version of the Miller algorithm I am coding up.

Perhaps we should come to a consensus on what good enough is? How close do
we want to be? Should we require passing on all of NIST's 'hard' problems?
(for all regression techniques that get cooked up)

-Greg

Re: [math] Re: Longley Data

Posted by Greg Sterijevski <gs...@gmail.com>.
Yes, I understand that Filippelli should be separate. I was more concerned
with Wampler... though I guess since I haven't checked if they all run, they
might need separate commits.

-Greg


Re: [math] Re: Longley Data

Posted by Phil Steitz <ph...@gmail.com>.
On 7/12/11 9:14 AM, Greg Sterijevski wrote:
> I have opened a JIRA issue. I would also like to add the Wampler1-4 tests
> into OLSMultipleRegression. Would it be okay to do this with one change,
> instead of multiple ones?

Thanks!

It would be better to separate the "successful" test patch.  Create
a new issue called something like "Additional NIST reference data
tests for OLS Regression."  There are two reasons for separating
these patches:

1) We never like to commit failing tests. The test case illustrating
MATH-615 will get committed when the bug is resolved.
2) The non-Filippelli tests have nothing to do with MATH-615.

Many thanks for implementing the reference data tests.

Phil


Re: [math] Re: Longley Data

Posted by Greg Sterijevski <gs...@gmail.com>.
I have opened a JIRA issue. I would also like to add the Wampler1-4 tests
into OLSMultipleRegression. Would it be okay to do this with one change,
instead of multiple ones?


Re: [math] Re: Longley Data

Posted by Greg Sterijevski <gs...@gmail.com>.
I will add the tests. I do believe it is the QR decomp which is failing.
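
One quick way to check that (a sketch; the classes are from the [math]
2.x linear package, and 'x' is assumed to hold the Filippelli powers
built elsewhere): look at the diagonal of R from the QR decomposition
and see which entries are effectively zero.

    import org.apache.commons.math.linear.Array2DRowRealMatrix;
    import org.apache.commons.math.linear.QRDecompositionImpl;
    import org.apache.commons.math.linear.RealMatrix;

    RealMatrix design = new Array2DRowRealMatrix(x); // x: double[][]
    RealMatrix r = new QRDecompositionImpl(design).getR();
    for (int i = 0; i < r.getColumnDimension(); i++) {
        // a near-zero diagonal entry is what trips the singularity check
        System.out.println("R[" + i + "][" + i + "] = " + r.getEntry(i, i));
    }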

-Greg


Re: [math] Re: Longley Data

Posted by Phil Steitz <ph...@gmail.com>.
On 7/12/11 7:43 AM, Greg Sterijevski wrote:
> I will run against R.
>
> Here is the official repository @ NIST for the Wampler/Longley/Filippelli data:
>
> http://www.itl.nist.gov/div898/strd/lls/lls.shtml
>
> If you follow the link, the ASCII data files also have the certified
> results.
>
> Would you like me to add these tests to the unit test for
> OLSMultipleLinearRegression?
>
That would be great.  Thanks!

We should also figure out why OLSMultipleLinearRegression thinks the
Filippelli design matrix is singular.  We should raise a JIRA for
that.  Could be this is a QR decomp issue.

Phil


Re: [math] Re: Longley Data

Posted by Greg Sterijevski <gs...@gmail.com>.
I will run against R.

Here is the official repository @ NIST for the Wampler/Longley/Filippelli data:

http://www.itl.nist.gov/div898/strd/lls/lls.shtml

If you follow the link, the ASCII data files also have the certified
results.

Would you like me to add these tests to the unit test for
OLSMultipleLinearRegression?


Re: [math] Re: Longley Data

Posted by Phil Steitz <ph...@gmail.com>.
On 7/11/11 9:34 PM, Greg Sterijevski wrote:
> I also ran the Filippelli data through both the regression technique that I
> am working on and the current multiple regression package. My work in
> progress gets estimates which, though not great, are close to the certified
> values. OLSMultipleLinearRegression exceptions out, complaining about a
> singular matrix.

I assume the design matrix is near-singular, correct?  Where did the
certified values come from?  If you have access to R, it would be
good to compare results against R as well.  There is R code in
src/test/R set up to compare results against [math].


Re: [math] Re: Longley Data

Posted by Greg Sterijevski <gs...@gmail.com>.
I also ran the Filippelli data through both the regression technique that I
am working on and the current multiple regression package. My work in
progress gets estimates which, though not great, are close to the certified
values. OLSMultipleLinearRegression exceptions out, complaining about a
singular matrix.
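
For reference, a sketch of the failing call (data loading elided; y and
x are assumed to hold the Filippelli response and the ten powers):

    OLSMultipleLinearRegression ols = new OLSMultipleLinearRegression();
    ols.newSampleData(y, x); // y: double[], x: double[][]
    // on the Filippelli data this throws, reporting a singular matrix
    double[] beta = ols.estimateRegressionParameters();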



Re: [math] Re: Longley Data

Posted by Greg Sterijevski <gs...@gmail.com>.
Yes, my apologies. I am a bit new to this.



[math] Re: Longley Data

Posted by Henri Yandell <fl...@gmail.com>.
I'm assuming this is Commons Math. I've added a [math] so it catches
the interest of those involved.




Re: Longley Data

Posted by Greg Sterijevski <gs...@gmail.com>.
Additionally, I pass all of the Wampler beta estimates.
