Posted to dev@commons.apache.org by Eric Barnhill <er...@gmail.com> on 2019/07/18 22:49:52 UTC

[statistics] Proposed OLS grammar

I suggested the following grammar to aim for in our meeting today with the
developing OLS module. If you see anything you'd prefer to change, let's
establish it now; if anyone doesn't like it later, it's on me.

RegressionData data = RegressionDataLoader.of(double[][] y, double[] x);
Regression ols = new OLSRegression();
RegressionResults results = ols.regress(data);
betas = results.getBetas();

where:
RegressionData is an interface
RegressionDataLoader is a factory class and of() a (possibly overloaded)
static method
Regression is an interface, implemented by OLSRegression
RegressionResults is an interface, the specific class returned is
OLSResults which implements it.
betas are the intercept and slopes of the regression model

I think this preserves abstraction at the levels desired, since in future we
will want flexibility as to regression type, possible state parameters set on
the regression object, and results contents and format. It also doesn't take
on any unnecessary abstractions.
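
A minimal sketch of how the types named above could fit together, offered only
as an illustration and not as the module's actual design. Assumptions not
stated in the thread: y is taken to be the response vector (double[]) and x the
predictor rows (double[][]), which swaps the array types written in the of()
signature above; the getY()/getX() accessors are hypothetical; getBetas()
returns a double[] with the intercept first; and the fit solves the normal
equations by Gauss-Jordan elimination to stay self-contained, where a real
implementation would more likely use a decomposition from a matrix library.

```java
/** Regression input: response y and predictor rows x (one row per observation). */
interface RegressionData {
    double[] getY();      // hypothetical accessor
    double[][] getX();    // hypothetical accessor
}

/** Factory class with a static of() method. */
final class RegressionDataLoader {
    private RegressionDataLoader() {}

    static RegressionData of(final double[] y, final double[][] x) {
        if (y.length == 0 || y.length != x.length) {
            throw new IllegalArgumentException("y and x need the same, non-zero number of rows");
        }
        // No defensive copies, to keep the sketch short.
        return new RegressionData() {
            @Override public double[] getY() { return y; }
            @Override public double[][] getX() { return x; }
        };
    }
}

interface RegressionResults {
    double[] getBetas();
}

interface Regression {
    RegressionResults regress(RegressionData data);
}

final class OLSResults implements RegressionResults {
    private final double[] betas;
    OLSResults(double[] betas) { this.betas = betas.clone(); }
    @Override public double[] getBetas() { return betas.clone(); }
}

final class OLSRegression implements Regression {
    @Override
    public RegressionResults regress(RegressionData data) {
        final double[] y = data.getY();
        final double[][] x = data.getX();
        final int p = x[0].length + 1;              // +1 for the intercept column
        // Augmented normal equations [X'X | X'y], with a leading column of ones in X.
        final double[][] a = new double[p][p + 1];
        for (int i = 0; i < y.length; i++) {
            final double[] row = new double[p];
            row[0] = 1.0;
            System.arraycopy(x[i], 0, row, 1, p - 1);
            for (int j = 0; j < p; j++) {
                for (int k = 0; k < p; k++) {
                    a[j][k] += row[j] * row[k];
                }
                a[j][p] += row[j] * y[i];
            }
        }
        // Gauss-Jordan elimination with partial pivoting.
        for (int c = 0; c < p; c++) {
            int pivot = c;
            for (int r = c + 1; r < p; r++) {
                if (Math.abs(a[r][c]) > Math.abs(a[pivot][c])) {
                    pivot = r;
                }
            }
            final double[] tmp = a[c];
            a[c] = a[pivot];
            a[pivot] = tmp;
            if (a[c][c] == 0.0) {
                throw new ArithmeticException("singular design matrix");
            }
            for (int r = 0; r < p; r++) {
                if (r != c) {
                    final double f = a[r][c] / a[c][c];
                    for (int k = c; k <= p; k++) {
                        a[r][k] -= f * a[c][k];
                    }
                }
            }
        }
        final double[] betas = new double[p];
        for (int j = 0; j < p; j++) {
            betas[j] = a[j][p] / a[j][j];           // intercept first, then slopes
        }
        return new OLSResults(betas);
    }
}
```

With these types the four lines of the grammar compile once concrete arrays are
passed to of() and betas is declared as double[].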

Eric

Re: [statistics] Proposed OLS grammar

Posted by Gilles Sadowski <gi...@gmail.com>.
https://www.baeldung.com/java-inner-interfaces

interface Regression {
    interface Data {
        // ...
    }
    interface Result {
        // ...
    }
}
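
A minimal sketch of how the nested-interface layout above might be filled in
and used from client code. The accessor names y(), x() and betas(), the
regress() method on the outer interface, and the InterceptOnly class are all
assumptions made for illustration; none of them come from the thread.

```java
interface Regression {
    /** Input data (hypothetical accessors). */
    interface Data {
        double[] y();
        double[][] x();
    }

    /** Fit output (hypothetical accessor). */
    interface Result {
        double[] betas();
    }

    /** Assumed entry point; the thread does not specify one for this layout. */
    Result regress(Data data);
}

/** Hypothetical trivial model: fits only an intercept, i.e. the mean of y. */
final class InterceptOnly implements Regression {
    @Override
    public Regression.Result regress(Regression.Data data) {
        double sum = 0.0;
        for (double v : data.y()) {
            sum += v;
        }
        final double mean = sum / data.y().length;
        // Result has a single abstract method, so a lambda can implement it.
        return () -> new double[] {mean};
    }
}
```

Callers then refer to the types as Regression.Data and Regression.Result, which
keeps the related names grouped under one top-level interface.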

On Fri, 19 Jul 2019 at 01:20, Alex Herbert <al...@gmail.com> wrote:
>
>
>
> > On 18 Jul 2019, at 23:49, Eric Barnhill <er...@gmail.com> wrote:
> >
> > I suggested the following grammar to aim for in our meeting today with the
> > developing OLS module. If you see anything you'd prefer to change, let's
> > establish it now; if anyone doesn't like it later, it's on me.
> >
> > RegressionData data = RegressionDataLoader.of(double[][] y, double[] x);
>
> Maybe:
>
> RegressionData data = Regressions.newRegressionData(double[][] y, double[] x);
> RegressionData data = Regressions.newData(double[][] y, double[] x);
>
> ?
>
> Note that the idea was also raised that either the x or y of the RegressionData may need to be changed. This could use a Builder pattern:
>
> RegressionData data = Regressions.newDataBuilder()
>                                  .setY(double[][] y)
>                                  .setX(double[] x)
>                                  .build();
>
> Or some variant of it. Something for the future.
>
>
> > Regression ols = new OLSRegression();
> > RegressionResults results = ols.regress(data);
> > betas = results.getBetas();
> >
> > where:
> > RegressionData is an interface
> > RegressionDataLoader is a factory class and of() a (possibly overloaded)
> > static method
> > Regression is an interface, implemented by OLSRegression
> > RegressionResults is an interface, the specific class returned is
> > OLSResults which implements it.
> > betas are the intercept and slopes of the regression model
> >
> > I think this preserves abstraction at the levels desired, since in future we
> > will want flexibility as to regression type, possible state parameters set on
> > the regression object, and results contents and format. It also doesn't take
> > on any unnecessary abstractions.
> >
> > Eric
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [statistics] Proposed OLS grammar

Posted by Alex Herbert <al...@gmail.com>.

> On 18 Jul 2019, at 23:49, Eric Barnhill <er...@gmail.com> wrote:
> 
> I suggested the following grammar to aim for in our meeting today with the
> developing OLS module. If you see anything you'd prefer to change, let's
> establish it now; if anyone doesn't like it later, it's on me.
> 
> RegressionData data = RegressionDataLoader.of(double[][] y, double[] x);

Maybe:

RegressionData data = Regressions.newRegressionData(double[][] y, double[] x);
RegressionData data = Regressions.newData(double[][] y, double[] x);

?

Note that the idea was also raised that either the x or y of the RegressionData may need to be changed. This could use a Builder pattern:

RegressionData data = Regressions.newDataBuilder()
                                 .setY(double[][] y)
                                 .setX(double[] x)
                                 .build();

Or some variant of it. Something for the future.
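
A minimal sketch of the builder variant, showing how one builder could be
reused to swap y while keeping x. This is an illustration under assumptions,
not a proposed implementation: the DataBuilder class name, the getY()/getX()
accessors, and the choice of double[] for y and double[][] for x (one row per
observation) are all hypothetical and differ from the array types written in
the calls above.

```java
/** Regression input (hypothetical accessors). */
interface RegressionData {
    double[] getY();
    double[][] getX();
}

final class Regressions {
    private Regressions() {}

    static DataBuilder newDataBuilder() {
        return new DataBuilder();
    }

    /** Mutable builder; every build() call returns an independent snapshot. */
    static final class DataBuilder {
        private double[] y;
        private double[][] x;

        DataBuilder setY(double[] newY) {
            this.y = newY.clone();
            return this;
        }

        DataBuilder setX(double[][] newX) {
            this.x = copy(newX);
            return this;
        }

        RegressionData build() {
            if (y == null || x == null) {
                throw new IllegalStateException("both y and x must be set");
            }
            if (y.length != x.length) {
                throw new IllegalStateException("y and x need the same number of rows");
            }
            // The builder never mutates these arrays in place (the setters
            // replace them), so the returned object is unaffected by later calls.
            final double[] ySnap = y;
            final double[][] xSnap = x;
            return new RegressionData() {
                @Override public double[] getY() { return ySnap.clone(); }
                @Override public double[][] getX() { return copy(xSnap); }
            };
        }

        private static double[][] copy(double[][] a) {
            final double[][] out = new double[a.length][];
            for (int i = 0; i < a.length; i++) {
                out[i] = a[i].clone();
            }
            return out;
        }
    }
}
```

Keeping the built RegressionData immutable means the same loaded data can be
handed to several regression objects, while the builder covers the case where
only x or only y changes between fits.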


> Regression ols = new OLSRegression();
> RegressionResults results = ols.regress(data);
> betas = results.getBetas();
> 
> where:
> RegressionData is an interface
> RegressionDataLoader is a factory class and of() a (possibly overloaded)
> static method
> Regression is an interface, implemented by OLSRegression
> RegressionResults is an interface, the specific class returned is
> OLSResults which implements it.
> betas are the intercept and slopes of the regression model
> 
> I think this preserves abstraction at the levels desired, since in future we
> will want flexibility as to regression type, possible state parameters set on
> the regression object, and results contents and format. It also doesn't take
> on any unnecessary abstractions.
> 
> Eric




RE: [statistics] Proposed OLS grammar

Posted by Ben Nguyen <be...@gmail.com>.
Hello Dr. Paul King,

I am working on the new regression module for Commons Statistics as a student in GSoC. I had a brief look at your Groovy Data Science (which I will have to look at more deeply in the future because it’s an interesting and high-quality tutorial/showcase), and noticed that in your slides you mentioned the 7 main types of regression. One of the central purposes of this new Commons Statistics Regression component is to design an architecture which can support these different types by allowing a good base for other developers to append more regression types beyond just OLS and GLS in math3.

Currently I’m trying to design for this purpose, using OLS as a starting base and EJML for matrix operations (instead of math3.linear). The plan is to have OLS, GLS and Logistic done by around the end of August, and to add other regression types in the future, hopefully with other developers.
The updating regressions you’ve used, like SimpleRegression, will likely stay as they are for now, unless you have suggestions for them.

I also wanted to take this opportunity to ask you, as a user:
1. What would make your life easier?
2. What features should definitely be kept?
   a. Do you value the current data input interface (with just newSampleData() called directly on the OLS class)?
   b. Or would you consider some of the other options mentioned, which are needed if using the same loaded data in different types of regression is important?
3. What features should be improved?
   a. Would you consider the current running time sufficient, or is it restrictive for you in any way? (Hopefully EJML has helped a bit in that regard; perhaps benchmarks will be made after OLS is done.)
4. Any suggestions/requests for specific features?
   a. Perhaps a summary printout under a RegressionResults interface?

Thank you for your time, I appreciate any input you can give me.

Cheers,
-Ben Nguyen

From: Paul King
Sent: Friday, July 19, 2019 6:26 AM
To: Commons Developers List
Subject: Re: [statistics] Proposed OLS grammar

There are about 10 files using classes from the math3.stat package in
the examples I mentioned. I have stayed away from math4 while it's
still snapshot.

Repo: https://github.com/paulk-asert/groovy-data-science

Slides: https://speakerdeck.com/paulk/groovy-data-science

Most of the examples are in the subprojects/HousePrices project with a
few others just using StatUtils.

It's not my full-time day job to be using those classes but I'd be
keen to have those examples working nicely.

Cheers, Paul.




Re: [statistics] Proposed OLS grammar

Posted by Paul King <pa...@gmail.com>.
There are about 10 files using classes from the math3.stat package in
the examples I mentioned. I have stayed away from math4 while it's
still snapshot.

Repo: https://github.com/paulk-asert/groovy-data-science

Slides: https://speakerdeck.com/paulk/groovy-data-science

Most of the examples are in the subprojects/HousePrices project with a
few others just using StatUtils.

It's not my full-time day job to be using those classes but I'd be
keen to have those examples working nicely.

Cheers, Paul.

On Fri, Jul 19, 2019 at 9:11 PM Gilles Sadowski <gi...@gmail.com> wrote:
>
> Hi.
>
> Your experience as a user of "Commons Math" would be most useful
> to help us craft a better (or, at least, no worse) design for "Commons
> Statistics".
> Would you share pointers to actual use-cases?
>
> Thanks,
> Gilles


Re: [statistics] Proposed OLS grammar

Posted by Gilles Sadowski <gi...@gmail.com>.
Hi.

Your experience as a user of "Commons Math" would be most useful
to help us craft a better (or, at least, no worse) design for "Commons
Statistics".
Would you share pointers to actual use-cases?

Thanks,
Gilles

2019-07-19 7:03 UTC+02:00, Paul King <pa...@gmail.com>:
> Cool. I'd be keen to try out the API, when you are ready, in my
> "Apache Groovy for data science" examples which currently use the
> commons math3 classes.
>
> Cheers, Paul.


Re: [statistics] Proposed OLS grammar

Posted by Paul King <pa...@gmail.com>.
Cool. I'd be keen to try out the API, when you are ready, in my
"Apache Groovy for data science" examples which currently use the
commons math3 classes.

Cheers, Paul.

On Fri, Jul 19, 2019 at 9:51 AM Gilles Sadowski <gi...@gmail.com> wrote:
>
> Hi.
>
> On Fri, 19 Jul 2019 at 01:45, Paul King <pa...@gmail.com> wrote:
> >
> > How does this relate to the OLS classes in commons math?
> > https://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/stat/regression/OLSMultipleLinearRegression.html
>
> The new "Commons Statistics" component is intended to replace the functionality
> currently defined in the package "org.apache.commons.math4.stat" of "Commons
> Math".
>
> Regards,
> Gilles


Re: [statistics] Proposed OLS grammar

Posted by Gilles Sadowski <gi...@gmail.com>.
Hi.

On Fri, 19 Jul 2019 at 01:45, Paul King <pa...@gmail.com> wrote:
>
> How does this relate to the OLS classes in commons math?
> https://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/stat/regression/OLSMultipleLinearRegression.html

The new "Commons Statistics" component is intended to replace the functionality
currently defined in the package "org.apache.commons.math4.stat" of "Commons
Math".

Regards,
Gilles



Re: [statistics] Proposed OLS grammar

Posted by Paul King <pa...@gmail.com>.
How does this relate to the OLS classes in commons math?
https://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/stat/regression/OLSMultipleLinearRegression.html

On Fri, Jul 19, 2019 at 8:50 AM Eric Barnhill <er...@gmail.com> wrote:
>
> I suggested the following grammar to aim for in our meeting today with the
> developing OLS module. If you see anything you'd prefer to change, let's
> establish it now; if anyone doesn't like it later, it's on me.
>
> RegressionData data = RegressionDataLoader.of(double[][] y, double[] x);
> Regression ols = new OLSRegression();
> RegressionResults results = ols.regress(data);
> betas = results.getBetas();
>
> where:
> RegressionData is an interface
> RegressionDataLoader is a factory class and of() a (possibly overloaded)
> static method
> Regression is an interface, implemented by OLSRegression
> RegressionResults is an interface, the specific class returned is
> OLSResults which implements it.
> betas are the intercept and slopes of the regression model
>
> I think this preserves abstraction at the levels desired, since in future we
> will want flexibility as to regression type, possible state parameters set on
> the regression object, and results contents and format. It also doesn't take
> on any unnecessary abstractions.
>
> Eric
