Posted to dev@commons.apache.org by Luc Maisonobe <Lu...@free.fr> on 2009/02/26 20:34:37 UTC

[math] redesigning the optimization/estimation packages (phase 2)

I have just checked in the first changes as discussed on the list.

The changes are merely code moved from one package to another. The new
packages are optimization.direct, optimization.general,
optimization.linear (empty for now) and optimization.univariate. The
removed packages are analysis.minimization (moved to
optimization.univariate) and estimation (moved to optimization.general).

I have also renamed all classes whose names contained Cost to use Objective
instead, as this is more logical for optimizers supporting both minimization
and maximization.

A few new interfaces, much simpler than the previous ones, have been set
up. Please take a look at them and tell me what you think about them. The
proposed design is to have the user implement either ObjectiveFunction
(for scalar functions) or MultiObjectiveFunction (for vector-valued
functions, for example residuals). This function would then be passed to
a class implementing Optimizer. That is all there is to it for simple use.
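
To make the simple use concrete, here is a rough sketch (the names and
signatures below are only meant to show the intent; the exact declarations
in the repository may differ slightly, and each interface would of course
live in its own file):

    public interface ObjectiveFunction {
        /** Value of the scalar objective function at the given point. */
        double objective(double[] point);
    }

    public interface MultiObjectiveFunction {
        /** Vector-valued objective at the given point (for example the residuals). */
        double[] objective(double[] point);
    }

    public interface Optimizer {
        /** Search for an optimum of the objective, starting from startPoint. */
        double[] optimize(ObjectiveFunction function, boolean maximize, double[] startPoint);
    }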

For advanced use, a few other tools are available.

The first tool is LeastSquaresConverter, which can convert a
MultiObjectiveFunction representing residuals into a simpler
ObjectiveFunction. I have taken into account both Gilles' and Ted's
remarks: the conversion can use no weights at all, a weights vector, or a
complete matrix representing both weights and correlations.
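
For illustration only (the constructors shown here are hypothetical, the
point is just the three weighting variants):

    MultiObjectiveFunction residuals = new MyResidualsFunction(); // user-supplied placeholder
    double[] weights = { 1.0, 2.0, 0.5 };                         // one weight per residual component
    double[][] scale = {                                          // weights and correlations together
        { 2.0, 0.1, 0.0 },
        { 0.1, 1.0, 0.0 },
        { 0.0, 0.0, 0.5 }
    };

    ObjectiveFunction unweighted = new LeastSquaresConverter(residuals);          // sum of r[i]^2
    ObjectiveFunction vectorW    = new LeastSquaresConverter(residuals, weights); // sum of weights[i] * r[i]^2
    ObjectiveFunction matrixW    = new LeastSquaresConverter(residuals, scale);   // r' * scale * r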

Another tool will be a MultiStartManager, which will extract the
multi-start features from DirectSearchOptimizer and generalize them so
they can be used with any kind of optimizer. I'm not sure yet how to
design it. Perhaps I will set up an InitialPointGenerator interface that
would generate either an initial simplex for direct search methods or a
single start point for gradient methods. What do you think about it?
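
Just to make the idea concrete (nothing of this exists yet, the names are
placeholders):

    public interface InitialPointGenerator {
        /**
         * Start point for the i-th optimization run. For direct search
         * methods, successive calls could also provide the vertices of an
         * initial simplex.
         */
        double[] startPoint(int i);
    }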

There are still some rough edges. For example, the ConvergenceChecker
interface is still simplex-oriented; it should be generalized.
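
One possible generalization, only as a starting point for discussion:

    public interface ConvergenceChecker {
        /**
         * Check convergence from two successive candidate points, instead
         * of looking at simplex vertices.
         */
        boolean converged(int iteration,
                          double[] previousPoint, double previousValue,
                          double[] currentPoint, double currentValue);
    }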

Could someone review all this?

Thanks,
Luc


Re: [math] redesigning the optimization/estimation packages (phase 2)

Posted by Luc Maisonobe <Lu...@free.fr>.
Ted Dunning wrote:
> On Sat, Feb 28, 2009 at 6:52 AM, Phil Steitz <ph...@gmail.com> wrote:
> 
>> What is going on here is that the special case - linear regression analysis
>> - is being discussed as an example of an optimization problem.
>>
> 
> Yes.  It is a special case.
> 
> But it was germane to the topic, particularly the imposition of a particular
> way of converting a residual vector to a quantity to be optimized.  Mostly I
> was worried that the API structure for optimization would force itself on
> the higher-level APIs.

The LeastSquaresConverter that started the discussion is only a small
helper class. It is not the basis of the general API and can be removed
if it is too awkward.

Please take a look at what I have just checked in. I have slightly
improved the general interfaces (mainly ConvergenceChecker and
Optimizer). I have also adapted DirectSearchOptimizer, NelderMead and
MultiDirectional to implement the new API. The multi-start feature that
was previously only available in direct search optimizers has been
extracted and is now a wrapper that can be used around any other
optimizer, regardless of its type.
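
Roughly, using the wrapper looks like this (the wrapper and generator class
names below are placeholders, not necessarily the names that were checked
in; only the wrapping pattern matters):

    ObjectiveFunction objective = new MyObjectiveFunction();        // user-supplied placeholder
    InitialPointGenerator generator = new MyRandomStartGenerator(); // placeholder start point source
    double[] initialGuess = { 0.0, 0.0 };

    Optimizer underlying = new NelderMead();
    Optimizer optimizer = new MultiStartOptimizer(underlying, 20, generator); // 20 independent runs
    double[] best = optimizer.optimize(objective, false, initialGuess);       // false means minimize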

The optimization.general package (formerly estimation) is the next task.

Luc

> 
> 
>> ...  Patches welcome!
>>
> 
> Point taken.  I apologize for not providing more of them.
> 



Re: [math] redesigning the optimization/estimation packages (phase 2)

Posted by Ted Dunning <te...@gmail.com>.
On Sat, Feb 28, 2009 at 6:52 AM, Phil Steitz <ph...@gmail.com> wrote:

> What is going on here is that the special case - linear regression analysis
> - is being discussed as an example of an optimization problem.
>

Yes.  It is a special case.

But it was germane to the topic, particularly the imposition of a particular
way of converting a residual vector to a quantity to be optimized.  Mostly I
was worried that the API structure for optimization would force itself on
the higher-level APIs.


> ...  Patches welcome!
>

Point taken.  I apologize for not providing more of them.

-- 
Ted Dunning, CTO
DeepDyve

Re: [math] redesigning the optimization/estimation packages (phase 2)

Posted by Phil Steitz <ph...@gmail.com>.
Luc Maisonobe wrote:
> I'm afraid what I propose is far more basic than what you need. I hardly
> understand the concepts you use in your message (I'm not a statistician
> at all, I work in space flight dynamics). I assume we use the same words
> with slightly different meanings in mind (models, residuals,
> observations, variables ...).
>
> I'm clearly unable to fulfill your requirements.
>   
What is going on here is that the special case - linear regression 
analysis - is being discussed as an example of an optimization problem.  
This is correct mathematically, but from the API standpoint it violates  
a principle that I would like to see us stick to,  which is to make it 
as easy as possible to find solutions to practical programming problems 
using commons-math.  From this standpoint, it is clear that to estimate 
linear models, one should look in the stat.regression package.  The API
and functionality there are currently primitive.  They are going to have to
be cleaned up and extended to be releasable, IMO.  Suggestions for
improvement - ideally with patches - are most welcome.  Just as the 
implementations there use classes from the linear package,  it may turn 
out to be useful (not having designed or written any code, I can't say 
for sure) to use implementations from the optimization package for some 
not-yet-implemented features.  But the user-facing API and the user 
guide should point to the stat package.  This is going to happen more 
and more as the overall library becomes more powerful - there will be 
multiple ways to use commons-math to solve problems.  We need to have a
simple, minimally complex (in terms of the API and math concepts required)
solution path present itself to the user when browsing the
javadoc or reading the user guide looking for solutions to common 
practical problems.  That means from the API standpoint we break out 
common use cases rather than trying to build a pseudo-programming 
language or overly abstract API.

While I love R,  I would hardly describe it as easy to use.  The goal of 
a good library API in my opinion is not just to minimize the number of 
lines of code required, as this can lead to inscrutable code (like many
R programs). A decent Java API with good javadoc that focuses on
solving the most common practical problems is what we are after in 
commons-math.  We certainly have much room for improvement and that is 
what we are trying to do in 2.0.   Thanks in advance to all for feedback 
and suggestions for improvement.  Patches welcome!

Phil
> Does this imply we should continue with only revamping the basic setup that
> was imported from Mantissa for 1.2, in order to have a simplistic but
> consistent framework? Or does it imply we should get rid of this limited
> view and take the opportunity of version 2.0 to build a complete,
> statistically sound estimation package?
>
> I can do the former (taking into account the residuals computation
> issues, and probably slightly changing the meaning of the weights so as to
> compute wi*(ri^2) rather than (wi*ri)^2, as done now). I cannot do the
> latter.
>
> Luc
>
> Ted Dunning wrote:
>   
>> Here is what I would do in R to do a linear model with observations
>> consisting of input variables x1 ... x3 and target variable y:
>>
>>      m <- lm(y ~ x1 + x2 + x3, data)
>>
>> The first argument lets me supply the form of the regression.  I could have
>> regressed on log(x3), for instance, or used x1*x2 + x3 to include x1, x2, x3
>> and the x1:x2 interaction or (x1+x2+x3)^2.
>>
>> To do logistic regression, I would switch to glm (for generalized linear
>> model):
>>
>>      m <- glm(y ~ x1 + x2 + x3, data, family=binomial())
>>
>> To do Bayesian logistic regression, I would use bayesglm from the arm
>> library:
>>
>>     # this happens once to install the package
>>     install.packages("arm");
>>     # this happens once per session
>>     library(arm)
>>     # and this gives me my model
>>     m <- bayesglm(y ~ x1 + x2 + x3, data, family=binomial())
>>
>> To give weights to observations using the k column of the data:
>>
>>     m <- bayesglm(y ~ x1 + x2 + x3, data, family=binomial(), weights=data$k)
>>
>> If I want to compute the posterior distribution of outputs, I can use
>> sims(m)$coefs to get a bunch of samples from the coefficient posterior and
>> multiply them by the observation matrix in one or two (fairly simple) lines
>> of code.  If I just want outputs for MAP model estimates, then I do
>>
>>     ytwiddle = predict(m, newdata)
>>
>> This works for any of the models above.
>>
>> If I could do anything like this using commons math in less than 10x as many
>> lines of code, I would be thrilled.
>>
>> Some notes about the user experience here:
>>
>> a) I don't have to care about residuals, but if I do care, they are in the
>> resulting model structure and I can look at them.  If I do plot(m) I get a
>> bunch of interesting diagnostic plots.  If I do coefplot(m), I get my
>> coefficients with error bars.
>>
>> b) I don't care *how* my weights affect the algorithm.  It just works.  I
>> can dig into the guts in various ways, but I don't have to.
>>
>> c) I can effectively subset my data column-wise without creating new
>> matrices.  I can also change the structure of my model all over the place
>> very easily without changing the data.  This is (was) a controversial aspect
>> of R's modeling structure and it makes it really hard to find logistic
>> regression in the index of a book because the libraries are oriented around
>> the generalized linear modeling view of the world (which isn't so bad after
>> you get going with it).
>>
>> d) lots of diagnostic info gets embedded into the resulting model.
>>
>>
>>
>>
>> On Thu, Feb 26, 2009 at 12:14 PM, Luc Maisonobe <Lu...@free.fr> wrote:
>>
>>     
>>> This way, the residual would not be seen and only the
>>> pure observation vectors should be provided by user.
>>>
>>>       
>>
>>     
>
>



Re: [math] redesigning the optimization/estimation packages (phase 2)

Posted by Luc Maisonobe <Lu...@free.fr>.
I'm afraid what I propose is far more basic than what you need. I hardly
understand the concepts you use in your message (I'm not a statistician
at all, I work in space flight dynamics). I assume we use the same words
with slightly different meanings in mind (models, residuals,
observations, variables ...).

I'm clearly unable to fulfill your requirements.

Does this imply we should continue with only revamping the basic setup that
was imported from Mantissa for 1.2, in order to have a simplistic but
consistent framework? Or does it imply we should get rid of this limited
view and take the opportunity of version 2.0 to build a complete,
statistically sound estimation package?

I can do the former (taking into account the residuals computation
issues, and probably slightly changing the meaning of the weights so as to
compute wi*(ri^2) rather than (wi*ri)^2, as done now). I cannot do the
latter.
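
Just to spell out the difference on a single residual (with arbitrary
example values for the weight and the residual):

    double w = 2.0;                       // weight
    double r = 0.5;                       // residual
    double current  = (w * r) * (w * r);  // (wi*ri)^2 = 1.0, what the code computes now
    double proposed = w * (r * r);        // wi*(ri^2) = 0.5, the usual weighted least squares form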

Luc

Ted Dunning wrote:
> Here is what I would do in R to do a linear model with observations
> consisting of input variables x1 ... x3 and target variable y:
> 
>      m <- lm(y ~ x1 + x2 + x3, data)
> 
> The first argument lets me supply the form of the regression.  I could have
> regressed on log(x3), for instance, or used x1*x2 + x3 to include x1, x2, x3
> and the x1:x2 interaction or (x1+x2+x3)^2.
> 
> To do logistic regression, I would switch to glm (for generalized linear
> model):
> 
>      m <- glm(y ~ x1 + x2 + x3, data, family=binomial())
> 
> To do Bayesian logistic regression, I would use bayesglm from the arm
> library:
> 
>     # this happens once to install the package
>     install.packages("arm");
>     # this happens once per session
>     library(arm)
>     # and this gives me my model
>     m <- bayesglm(y ~ x1 + x2 + x3, data, family=binomial())
> 
> To give weights to observations using the k column of the data:
> 
>     m <- bayesglm(y ~ x1 + x2 + x3, data, family=binomial(), weights=data$k)
> 
> If I want to compute the posterior distribution of outputs, I can use
> sims(m)$coefs to get a bunch of samples from the coefficient posterior and
> multiply them by the observation matrix in one or two (fairly simple) lines
> of code.  If I just want outputs for MAP model estimates, then I do
> 
>     ytwiddle = predict(m, newdata)
> 
> This works for any of the models above.
> 
> If I could do anything like this using commons math in less than 10x as many
> lines of code, I would be thrilled.
> 
> Some notes about the user experience here:
> 
> a) I don't have to care about residuals, but if I do care, they are in the
> resulting model structure and I can look at them.  If I do plot(m) I get a
> bunch of interesting diagnostic plots.  If I do coefplot(m), I get my
> coefficients with error bars.
> 
> b) I don't care *how* my weights affect the algorithm.  It just works.  I
> can dig into the guts in various ways, but I don't have to.
> 
> c) I can effectively subset my data column-wise without creating new
> matrices.  I can also change the structure of my model all over the place
> very easily without changing the data.  This is (was) a controversial aspect
> of R's modeling structure and it makes it really hard to find logistic
> regression in the index of a book because the libraries are oriented around
> the generalized linear modeling view of the world (which isn't so bad after
> you get going with it).
> 
> d) lots of diagnostic info gets embedded into the resulting model.
> 
> 
> 
> 
> On Thu, Feb 26, 2009 at 12:14 PM, Luc Maisonobe <Lu...@free.fr> wrote:
> 
>> This way, the residual would not be seen and only the
>> pure observation vectors should be provided by user.
>>
> 
> 
> 



Re: [math] redesigning the optimization/estimation packages (phase 2)

Posted by Ted Dunning <te...@gmail.com>.
Here is what I would do in R to do a linear model with observations
consisting of input variables x1 ... x3 and target variable y:

     m <- lm(y ~ x1 + x2 + x3, data)

The first argument lets me supply the form of the regression.  I could have
regressed on log(x3), for instance, or used x1*x2 + x3 to include x1, x2, x3
and the x1:x2 interaction or (x1+x2+x3)^2.

To do logistic regression, I would switch to glm (for generalized linear
model):

     m <- glm(y ~ x1 + x2 + x3, data, family=binomial())

To do Bayesian logistic regression, I would use bayesglm from the arm
library:

    # this happens once to install the package
    install.packages("arm");
    # this happens once per session
    library(arm)
    # and this gives me my model
    m <- bayesglm(y ~ x1 + x2 + x3, data, family=binomial())

To give weights to observations using the k column of the data:

    m <- bayesglm(y ~ x1 + x2 + x3, data, family=binomial(), weights=data$k)

If I want to compute the posterior distribution of outputs, I can use
sims(m)$coefs to get a bunch of samples from the coefficient posterior and
multiply them by the observation matrix in one or two (fairly simple) lines
of code.  If I just want outputs for MAP model estimates, then I do

    ytwiddle = predict(m, newdata)

This works for any of the models above.

If I could do anything like this using commons math in less than 10x as many
lines of code, I would be thrilled.

Some notes about the user experience here:

a) I don't have to care about residuals, but if I do care, they are in the
resulting model structure and I can look at them.  If I do plot(m) I get a
bunch of interesting diagnostic plots.  If I do coefplot(m), I get my
coefficients with error bars.

b) I don't care *how* my weights affect the algorithm.  It just works.  I
can dig into the guts in various ways, but I don't have to.

c) I can effectively subset my data column-wise without creating new
matrices.  I can also change the structure of my model all over the place
very easily without changing the data.  This is (was) a controversial aspect
of R's modeling structure and it makes it really hard to find logistic
regression in the index of a book because the libraries are oriented around
the generalized linear modeling view of the world (which isn't so bad after
you get going with it).

d) lots of diagnostic info gets embedded into the resulting model.




On Thu, Feb 26, 2009 at 12:14 PM, Luc Maisonobe <Lu...@free.fr> wrote:

> This way, the residual would not be seen and only the
> pure observation vectors should be provided by user.
>



-- 
Ted Dunning, CTO
DeepDyve

111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
www.deepdyve.com
408-773-0110 ext. 738
858-414-0013 (m)
408-773-0220 (fax)

Re: [math] redesigning the optimization/estimation packages (phase 2)

Posted by Luc Maisonobe <Lu...@free.fr>.
Ted Dunning wrote:
> Hmmm...
> 
> I may have misunderstood our conversation (and therefore probably sounded a
> bit incoherent), but I was talking about weights on observations which may
> or may not be what you are talking about here.

This is exactly what I am talking about.

> 
> Also, if you *are* talking about the same thing and the vector is a vector
> of residuals such that there is one element per observation, then there is a
> different impedance mis-match in that I would typically expect the user to
> never see this vector of residuals, but rather to pass in a weight vector
> beside the observations.

I have tried to generalize the concept behind the WeightedMeasurement class
in the existing estimation package. This class was designed to represent
one observation. The three parts associated with an observation in an
optimization context are the weight, the observed value and the expected
theoretical value. The residual is observed minus expected.

In the new design, I have separated the weight from the residual but kept
the residual as one entity, which was expected to be provided by a
user-defined MultiObjectiveFunction. If I understand you correctly, the
MultiObjectiveFunction should instead compute only the theoretical part,
and the converter should gather a weights vector, an observations vector
and the objective function, and perform both the subtraction and the
multiplication by the weights itself? This way, the residual would never
be seen and only the pure observation vectors would be provided by the user.
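
In code, I understand the converter would then compute something like this
(all names are purely illustrative; userFunction, observations and weights
would be fields set up when the converter is built):

    public double objective(double[] point) {
        double[] theoretical = userFunction.objective(point);   // theoretical values only
        double sum = 0;
        for (int i = 0; i < observations.length; ++i) {
            double residual = observations[i] - theoretical[i]; // subtraction done by the converter
            sum += weights[i] * residual * residual;            // weighting done by the converter
        }
        return sum;                                             // scalar value seen by the optimizer
    }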

Is this right?

Luc

> 
> On Thu, Feb 26, 2009 at 11:34 AM, Luc Maisonobe <Lu...@free.fr> wrote:
> 
>> The first tool is LeastSquaresConverter which can convert a
>> MultiObjectiveFunction representing residuals into a simpler
>> ObjectiveFunction. I have taken into account both Gilles and Ted
>> remarks, the conversion can use no weights at all, vector weights or a
>> complete matrix representing both weights and correlations.
>>
> 
> 
> 



Re: [math] redesigning the optimization/estimation packages (phase 2)

Posted by Ted Dunning <te...@gmail.com>.
Hmmm...

I may have misunderstood our conversation (and therefore probably sounded a
bit incoherent), but I was talking about weights on observations which may
or may not be what you are talking about here.

Also, if you *are* talking about the same thing and the vector is a vector
of residuals such that there is one element per observation, then there is a
different impedance mis-match in that I would typically expect the user to
never see this vector of residuals, but rather to pass in a weight vector
beside the observations.

On Thu, Feb 26, 2009 at 11:34 AM, Luc Maisonobe <Lu...@free.fr> wrote:

> The first tool is LeastSquaresConverter which can convert a
> MultiObjectiveFunction representing residuals into a simpler
> ObjectiveFunction. I have taken into account both Gilles and Ted
> remarks, the conversion can use no weights at all, vector weights or a
> complete matrix representing both weights and correlations.
>



-- 
Ted Dunning, CTO
DeepDyve