You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@commons.apache.org by Konstantin Berlin <kb...@gmail.com> on 2012/12/01 01:42:11 UTC

Re: [Math] Old to new API ("MultivariateDifferentiable(Vector)Function")

Hi,

Now that I have some time, let me try to make my case clearly. First I want to say that this is not some attack on the automatic-differentation package. I love automatic-differentation and symbolic packages. I personally cannot compute a derivative without a computer for the life of me. And I am also glad we are finally starting to agree on something :)

This discussion is about figuring out how an incorrect framework and storage affects the performance of an optimization. That is why I am so worried.

So lets start with the basics. A Newton method must compute a descent step by solving a linear equation

H*p = -g, (1)

where H is the Hessian, g is the gradient, and p is the desired descent step. What I am about to say also holds for non-linear least squares method, where Hessian is approximated using the Jacobian as

H \approx J^T*J+\lambda I.

Now, if you are not solving a simple optimization problem that you keep giving examples for, you can easily have a Hessian be a very large matrix,
like 1000x1000 or larger. Now, you better hope that you are storing H using your linear algebra framework, otherwise eq. (1) computation is going to take a while.
This is actually a very active area of research, and that is why having sparse linear algebra (aren't you removing this? ;) ) and iterative solvers is important to a lot of people.

What you are proposing as good OO style is that if someone has a function that they want to optimize, they need to take what is probably already a RealMatrix or [][], and create 1000x1000 DerivativeStructure objects, that,
within the next 10 lines in the optimizer, are going to be converted back to a RealMatrix? Not only have you just fragmented your heap space, but your garbage collector
is going crazy, and you are wasting an incredible amount of memory. Now imagine if your Jacobian is actually very simple to compute but large, then this whole conversion back and forth actually takes much longer than the function evaluation.

So, why exactly are you insisting on taking this performance penalty? You claim that the optimizer can work better if it gets a DerivativeStructure, why? Where is the fact that you are holding a DerivativeStructure come into effect for a Newton method? Can you provide any literature in that regard? The classical optimization bottleneck, if not the actual function evaluation, is eq. (1). The optimization methodology is build on top of years of research in computational linear algebra concepts, and does not care how the matrices are actually computed. Efficient memory usage and speed is critical here because in Newton optimizations eq. (1) is evaluated thousands of times. The goal of the Jacobian is not to store derivatives it is to store a matrix of numbers that allows you to quadratically approximate your target function.

You guys are focusing on people using finite differences and trying to optimize a newton method to use finite differences. This is not what the main purpose of a newton method is. If you want a method that dynamically adjusts finite differences step size, you should switch to BOBYQA, which was designed for this case.

Derivatives can be computed by a computer using a symbolic toolbox a priori (something that I was referring to when I accidentally said auto-differenation). They can also be efficiently simplified by that toolbox before being pasted back into you program. Auto-diff could also be an important tool for people if their problem is simple or they are not concerned with optimal efficiency . This can easily be handled by a wrapper and has nothing to do with Newton optimization. If you want to talk about proper OO design and internal simplification then you should focus on being able to pass a linear solver into your optimization method. This way you allow people to use iterative and sparse solvers when needed. If you are concerned about people getting derivatives wrong, you can do what MATLAB does, which is to add an option to check the derivatives using finite differences when debugging. 

This brings me to my second issue. There are important problems where computation of the derivatives combined with the actual function evaluation is *significantly* faster than computing each alone. I am not clear why I am hitting such resistance with this. For example, you are doing some sort of a simulation, which can be trivially adjusted in the end to give a derivative or a function value. A very real example of this is a Fast Multipole Method, which takes time to compute a series expansion of the function, but the resulting series expansion can be analytically differentiated.

What I suggested as function interfaces was just an initial quick suggestion that I would be happy for anyone to change in a way that everyone feels positive about. I just don't want my second issue to be ignored like a non-issue.

Thanks for your time,
Konstantin

On Nov 30, 2012, at 5:15 PM, Gilles Sadowski <gi...@harfang.homelinux.org> wrote:

>>> 
>>> I don't know if people are confused about auto-differentation, I
>>> think most people working in numerical analysis are very well aware
>>> of what it does. The issue here is that it is a completely separate
>>> subject from optimizations. In a proper OO design you would not mix
>>> the two together. Computation of the derivatives is a separate
>>> component from optimization. Optimization only assumes that you can
>>> compute one, it shouldn't care or force you two compute one in a
>>> specific way. That's bad design. Everyone component should only have
>>> necessary and sufficient requirements for use. Automatic
>>> differentiation is not necessary for optimization so it should not
>>> exist in the interface.
>> 
>> You are right.
>> 
>>> 
>>> I would like to add, that if my function is simple enough, I am very
>>> capable of running autdiff package a priori myself. The real need for
>>> finite-difference is when you cannot analytically compute the
>>> derivative. Though I think having a wrapper function around autdiff
>>> package is a nice tool to help developers quickly write code, if they
>>> prefer to have quicker development time but slightly slower code.
>> 
>>> 
>>> If someone can explain to me why auto-diff package should be directly
>>> forced or even be provided as alternative into optimization package
>>> when a simple wrapper function can do this task while maintaining
>>> much cleaner OO design, please tell me.
>> 
>> You just found the reason and explained it by yourself: it was bad
>> design and I am the one to blame for this. I was (and still am) stupid.
> 
> I'm not convinced that it was bad design. Maybe I'm completely off now:
> I thought that the purpose was internal simplification. I don't think
> that it can be inherently wrong to store any sort of derivatives (be it
> gradient or Jacobian) into a structure aimed at storing... partial
> derivatives (i.e. "DerivativeStructure").
> 
> Please let me know whether the problem is more in-depth than what I
> described previously (allowing the current way to provide the gradient
> and Jacobian).
> 
> I would nevertheless have suggested to delete the method
>    MultivariateFunction partialDerivative(int k);
> from "DifferentiableMultivariateFunction".
> 
>> 
>> So I suggest we disconnect differentiation from optimization, but in a
>> way that would let users decide how they provide the differentials. This
>> means I would not like to reintroduce the former interfaces.
>> 
>> What about having the optimize() methods taking two arguments function,
>> a MultivariateFunction for the value and a MultivariateVectorFunction
>> for the gradient? It would be up to the user to select how they compute
>> each function. They could use direct coding, they could use a finite
>> differences wrapper around the first function to implement the second,
> 
> 
> 
>> they could use our differentiation framework and put one wrapper to
>> extract the value and another one to extract the gradient.
> 
> That would also be a nice example of usage of the differentiation
> framework.
> 
>> What do you think?
> 
> Yes. Now I got it (I think). IIUC, we could even introduce the new interface
> before 3.1, and deprecated old the older ones (as you wanted to do anyways,
> IIRC).
> 
> I'll give it a try in the next days. OK?
> 
> 
> Gilles
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] Old to new API ("MultivariateDifferentiable(Vector)Function")

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.

On Sat, Dec 01, 2012 at 09:59:37AM -0800, Ted Dunning wrote:
> Correctness isn't that hard to get.  You just need to add a bitmap for
> exceptional values in all matrices.  This bitmap can be accessed by sparse
> operations so that the iteration is across the union of non-zero elements
> in the sparse vector/matrix and exception elements in the operand.
> 
> That fact is, however, that preserving NaN results in these corner cases is
> not normally a huge priority for users.  Deleting all support for sparse
> vectors, on the other hand, is a huge impact on utility of commons math.
>  To my mind deleting hugely useful functionality in the face of a small
> issue is upside down, especially when there is actually a pretty simple fix
> available.

Huge impact? It didn't seem so, since _nobody_ answered a recent poll about
deleting the sparse arrays feature.


Regards,
Gilles

P.S. There are other Java libraries which, last time I looked, seem to focus
     much more on the storage flexibility viewpoint than does CM (e.g.
     OjALgo).

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] Old to new API ("MultivariateDifferentiable(Vector)Function")

Posted by Ted Dunning <te...@gmail.com>.

Correctness isn't that hard to get.  You just need to add a bitmap for
exceptional values in all matrices.  This bitmap can be accessed by sparse
operations so that the iteration is across the union of non-zero elements
in the sparse vector/matrix and exception elements in the operand.

That fact is, however, that preserving NaN results in these corner cases is
not normally a huge priority for users.  Deleting all support for sparse
vectors, on the other hand, is a huge impact on utility of commons math.
 To my mind deleting hugely useful functionality in the face of a small
issue is upside down, especially when there is actually a pretty simple fix
available.

On Sat, Dec 1, 2012 at 8:02 AM, Sébastien Brisard <sebastien.brisard@m4x.org
> wrote:

> A few months ago, we started a thread on this issue (on the users ML). It
> got no answers! I am personally not happy with removing the support for
> sparse vectors/matrices, but the truth is we didn't see a way to achieve
> the same degree of correctness as --say-- array based matrices and vectors.
> As far as I am concerned, though, this is still an open question, and if
> you have ideas about these issues, we would be glad to here them!
>

Re: [Math] Old to new API ("MultivariateDifferentiable(Vector)Function")

Posted by Sébastien Brisard <se...@m4x.org>.

Hi,


2012/12/1 Konstantin Berlin <kb...@gmail.com>

> Hi,
>
> Now that I have some time, let me try to make my case clearly. First I
> want to say that this is not some attack on the automatic-differentation
> package. I love automatic-differentation and symbolic packages. I
> personally cannot compute a derivative without a computer for the life of
> me. And I am also glad we are finally starting to agree on something :)
>
> This discussion is about figuring out how an incorrect framework and
> storage affects the performance of an optimization. That is why I am so
> worried.
>
> So lets start with the basics. A Newton method must compute a descent step
> by solving a linear equation
>
> H*p = -g, (1)
>
> where H is the Hessian, g is the gradient, and p is the desired descent
> step. What I am about to say also holds for non-linear least squares
> method, where Hessian is approximated using the Jacobian as
>
> H \approx J^T*J+\lambda I.
>
> Now, if you are not solving a simple optimization problem that you keep
> giving examples for, you can easily have a Hessian be a very large matrix,
> like 1000x1000 or larger. Now, you better hope that you are storing H
> using your linear algebra framework, otherwise eq. (1) computation is going
> to take a while.
> This is actually a very active area of research, and that is why having
> sparse linear algebra (aren't you removing this? ;) ) and iterative solvers
> is important to a lot of people.
>

A few months ago, we started a thread on this issue (on the users ML). It
got no answers! I am personally not happy with removing the support for
sparse vectors/matrices, but the truth is we didn't see a way to achieve
the same degree of correctness as --say-- array based matrices and vectors.
As far as I am concerned, though, this is still an open question, and if
you have ideas about these issues, we would be glad to here them!


>
> What you are proposing as good OO style is that if someone has a function
> that they want to optimize, they need to take what is probably already a
> RealMatrix or [][], and create 1000x1000 DerivativeStructure objects, that,
> within the next 10 lines in the optimizer, are going to be converted back
> to a RealMatrix? Not only have you just fragmented your heap space, but
> your garbage collector
> is going crazy, and you are wasting an incredible amount of memory. Now
> imagine if your Jacobian is actually very simple to compute but large, then
> this whole conversion back and forth actually takes much longer than the
> function evaluation.
>
> So, why exactly are you insisting on taking this performance penalty? You
> claim that the optimizer can work better if it gets a DerivativeStructure,
> why? Where is the fact that you are holding a DerivativeStructure come into
> effect for a Newton method? Can you provide any literature in that regard?
> The classical optimization bottleneck, if not the actual function
> evaluation, is eq. (1). The optimization methodology is build on top of
> years of research in computational linear algebra concepts, and does not
> care how the matrices are actually computed. Efficient memory usage and
> speed is critical here because in Newton optimizations eq. (1) is evaluated
> thousands of times. The goal of the Jacobian is not to store derivatives it
> is to store a matrix of numbers that allows you to quadratically
> approximate your target function.
>
> You guys are focusing on people using finite differences and trying to
> optimize a newton method to use finite differences. This is not what the
> main purpose of a newton method is. If you want a method that dynamically
> adjusts finite differences step size, you should switch to BOBYQA, which
> was designed for this case.
>
> Derivatives can be computed by a computer using a symbolic toolbox a
> priori (something that I was referring to when I accidentally said
> auto-differenation). They can also be efficiently simplified by that
> toolbox before being pasted back into you program. Auto-diff could also be
> an important tool for people if their problem is simple or they are not
> concerned with optimal efficiency . This can easily be handled by a wrapper
> and has nothing to do with Newton optimization. If you want to talk about
> proper OO design and internal simplification then you should focus on being
> able to pass a linear solver into your optimization method. This way you
> allow people to use iterative and sparse solvers when needed. If you are
> concerned about people getting derivatives wrong, you can do what MATLAB
> does, which is to add an option to check the derivatives using finite
> differences when debugging.
>
> This brings me to my second issue. There are important problems where
> computation of the derivatives combined with the actual function evaluation
> is *significantly* faster than computing each alone. I am not clear why I
> am hitting such resistance with this. For example, you are doing some sort
> of a simulation, which can be trivially adjusted in the end to give a
> derivative or a function value. A very real example of this is a Fast
> Multipole Method, which takes time to compute a series expansion of the
> function, but the resulting series expansion can be analytically
> differentiated.
>
> What I suggested as function interfaces was just an initial quick
> suggestion that I would be happy for anyone to change in a way that
> everyone feels positive about. I just don't want my second issue to be
> ignored like a non-issue.
>
> This is certainly not a non-issue.

Thanks for *your* time,
Sébastien


> Thanks for your time,
> Konstantin
>
> On Nov 30, 2012, at 5:15 PM, Gilles Sadowski <gi...@harfang.homelinux.org>
> wrote:
>
> >>>
> >>> I don't know if people are confused about auto-differentation, I
> >>> think most people working in numerical analysis are very well aware
> >>> of what it does. The issue here is that it is a completely separate
> >>> subject from optimizations. In a proper OO design you would not mix
> >>> the two together. Computation of the derivatives is a separate
> >>> component from optimization. Optimization only assumes that you can
> >>> compute one, it shouldn't care or force you two compute one in a
> >>> specific way. That's bad design. Everyone component should only have
> >>> necessary and sufficient requirements for use. Automatic
> >>> differentiation is not necessary for optimization so it should not
> >>> exist in the interface.
> >>
> >> You are right.
> >>
> >>>
> >>> I would like to add, that if my function is simple enough, I am very
> >>> capable of running autdiff package a priori myself. The real need for
> >>> finite-difference is when you cannot analytically compute the
> >>> derivative. Though I think having a wrapper function around autdiff
> >>> package is a nice tool to help developers quickly write code, if they
> >>> prefer to have quicker development time but slightly slower code.
> >>
> >>>
> >>> If someone can explain to me why auto-diff package should be directly
> >>> forced or even be provided as alternative into optimization package
> >>> when a simple wrapper function can do this task while maintaining
> >>> much cleaner OO design, please tell me.
> >>
> >> You just found the reason and explained it by yourself: it was bad
> >> design and I am the one to blame for this. I was (and still am) stupid.
> >
> > I'm not convinced that it was bad design. Maybe I'm completely off now:
> > I thought that the purpose was internal simplification. I don't think
> > that it can be inherently wrong to store any sort of derivatives (be it
> > gradient or Jacobian) into a structure aimed at storing... partial
> > derivatives (i.e. "DerivativeStructure").
> >
> > Please let me know whether the problem is more in-depth than what I
> > described previously (allowing the current way to provide the gradient
> > and Jacobian).
> >
> > I would nevertheless have suggested to delete the method
> >    MultivariateFunction partialDerivative(int k);
> > from "DifferentiableMultivariateFunction".
> >
> >>
> >> So I suggest we disconnect differentiation from optimization, but in a
> >> way that would let users decide how they provide the differentials. This
> >> means I would not like to reintroduce the former interfaces.
> >>
> >> What about having the optimize() methods taking two arguments function,
> >> a MultivariateFunction for the value and a MultivariateVectorFunction
> >> for the gradient? It would be up to the user to select how they compute
> >> each function. They could use direct coding, they could use a finite
> >> differences wrapper around the first function to implement the second,
> >
> >
> >
> >> they could use our differentiation framework and put one wrapper to
> >> extract the value and another one to extract the gradient.
> >
> > That would also be a nice example of usage of the differentiation
> > framework.
> >
> >> What do you think?
> >
> > Yes. Now I got it (I think). IIUC, we could even introduce the new
> interface
> > before 3.1, and deprecated old the older ones (as you wanted to do
> anyways,
> > IIRC).
> >
> > I'll give it a try in the next days. OK?
> >
> >
> > Gilles
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > For additional commands, e-mail: dev-help@commons.apache.org
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

Re: [Math] Old to new API ("MultivariateDifferentiable(Vector)Function")

Posted by Konstantin Berlin <kb...@gmail.com>.

I forgot to say that there are commonly used benchmarks for optimization algorithm developers. They are commonly used to compare different algorithms in publications. I am personally not familiar with them, but it would be easy to google them.

On Dec 1, 2012, at 1:31 PM, Gilles Sadowski <gi...@harfang.homelinux.org> wrote:

> Hello.
> 
>> 
>> Now that I have some time, let me try to make my case clearly. First I want to say that this is not some attack on the automatic-differentation package. I love automatic-differentation and symbolic packages. I personally cannot compute a derivative without a computer for the life of me. And I am also glad we are finally starting to agree on something :)
> 
> I don't think that we disagreed about the fact that there was a problem with
> the interface to the user application. [_I_ started this thread hinting that
> there was a problem (at least for me as a user, based on my use-case).]
> 
> [Then there were several misunderstandings about what was the problem and how
> to solve it...]
> 
>> 
>> This discussion is about figuring out how an incorrect framework and storage affects the performance of an optimization. That is why I am so worried.
> 
> I can understand that this is an important point for your use-case.
> There are now two vantage points (at least) on two aspects to consider: You
> seem to have experience where storage matter (while I don't); you worry
> about "small" objects allocation (while I don't, anymore).
> 
> I'm not saying that your worries do not count; quite the contrary. CM is not
> my "toolbox"; it's a library based on the knowledge of its developers
> (obviously).
> If you bring actual use-cases (i.e. unit tests or real application code
> that can be benchmarked) that will show worrying behaviour, it will
> certainly trigger swift corrective action. [This has happened recently.]
> 
> In this area (performance), the problem is that intuition (or educated
> guess, however you want to name it) is not a substitute for actual runs,
> sometimes by a large amount. [When I started to use CM, I raised the issue
> that a 3D vector was immutable (so that I had to create a new one whenever
> its coordinates were changing). Surely this was a performance hit! Actually
> it wasn't.]
> 
> Again, that does not mean that you are wrong in this case. It's just that we
> cannot jump and change the code every time someone comes up with what
> amounts to "conventional wisdom". If the person comes with a patch that
> readily proves the point (at least marginally through microbenchmarking),
> then we all benefit.
> 
>> 
>> So lets start with the basics. A Newton method must compute a descent step by solving a linear equation
>> 
>> H*p = -g, (1)
>> 
>> where H is the Hessian, g is the gradient, and p is the desired descent step. What I am about to say also holds for non-linear least squares method, where Hessian is approximated using the Jacobian as
>> 
>> H \approx J^T*J+\lambda I.
>> 
>> Now, if you are not solving a simple optimization problem that you keep giving examples for, you can easily have a Hessian be a very large matrix,
>> like 1000x1000 or larger. Now, you better hope that you are storing H using your linear algebra framework, otherwise eq. (1) computation is going to take a while.
>> This is actually a very active area of research, and that is why having sparse linear algebra (aren't you removing this? ;) ) and iterative solvers is important to a lot of people.
> 
> Yes we are removing it because it is buggy and _nobody_ stepped up to say
> that it was important for CM users, and to help solve the problems in a way
> consistent with real-life usage of such a feature.
> As Sébastien said, you are warmly welcome to help us find a way to keep the
> feature.
> 
>> 
>> What you are proposing as good OO style 
> 
> This discussion has really nothing to do with "OO style", merely code reuse;
> and it was my big misunderstanding (partly because of my lack of knowledge,
> partly because of the lack of documentation on the targetted usages of
> "DerivativeStructure", which IIUC now, are outside CM) that I believed
> that the new "MultivariateDifferentiableFunction" was the way to go.
> 
> [Also, Luc had been moving very fast on this. And I couldn't keep up
> although I had wondered earlier why this had to impact usage in the
> "optimization" package.]
> 
>> is that if someone has a function that they want to optimize, they need to take what is probably already a RealMatrix or [][], and create 1000x1000 DerivativeStructure objects, that,
> 
> IIUC, that would be 1000 "DerivativeStructure" (not 1000x1000). If so, for
> this example, using "DerivativeStructure" would lead to about a 4% increase
> in storage memory.
> 
>> within the next 10 lines in the optimizer, are going to be converted back to a RealMatrix? Not only have you just fragmented your heap space, but your garbage collector
>> is going crazy, and you are wasting an incredible amount of memory.
> 
> This is the kind of assertions that really needs support from actual code.
> [Again, I don't claim to know better; I think that it would really be useful
> to accumulate a list of "do and don't" for Java and CM, in the form of
> reproducible user experience.]
> 
>> Now imagine if your Jacobian is actually very simple to compute but large, then this whole conversion back and forth actually takes much longer than the function evaluation.
> 
> We are willing to take this into account, but since I cannot see the
> behaviour in my use-case (where the evaluation time overwhelms, by several
> orders of magnitude, all such considerations), I do not have any incentive
> to change what seems to be "efficient enough" (for the actually known
> cases).
> Again, your experience with other use-cases would be very valuable to
> analyze the efficiency of the CM implementations and improve them, if
> necessary.
> 
>> 
>> So, why exactly are you insisting on taking this performance penalty?
> 
> It's the other way around. Until the performance penalty is proven, we've
> decided that it's up to the developer willing to do the work to decide.
> Again, if there is patch, it makes the matter much easier to decide on.
> 
> I admit that we decided to change the internal working of the optimizers, so
> _we_ should have proved that it does not impact usability. Hence, again, the
> usefulness of having a test suite of "real" applications which could be
> benchmarked regularly in order to have performance regressions induced by
> otherwise welcome changes.
> [I raised that issue several times on the ML.]
> 
>> You claim that the optimizer can work better if it gets a DerivativeStructure, why?
> 
> This was (supposed to be) an improvement of the _internal_ design. [Several
> weeks ago, Luc posted the problems he was encountering.]
> 
>> Where is the fact that you are holding a DerivativeStructure come into effect for a Newton method? Can you provide any literature in that regard? The classical optimization bottleneck, if not the actual function evaluation, is eq. (1). The optimization methodology is build on top of years of research in computational linear algebra concepts, and does not care how the matrices are actually computed. Efficient memory usage and speed is critical here because in Newton optimizations eq. (1) is evaluated thousands of times. The goal of the Jacobian is not to store derivatives it is to store a matrix of numbers that allows you to quadratically approximate your target function.
> 
> OK. Then Luc's proposal seems appropriate.
> There would be new interfaces defining the "optimize" method appropriately.
> For algorithms that need the gradient, it must be provided in an additional
> argument, of type "MultivariateVectorFunction"; for those that need, the
> Jacobian, the argument would be of type "MultivariateMatrixFunction".
> Do you agree?
> 
>> 
>> You guys are focusing on people using finite differences and trying to optimize a newton method to use finite differences. This is not what the main purpose of a newton method is. If you want a method that dynamically adjusts finite differences step size, you should switch to BOBYQA, which was designed for this case.
> 
> Another misunderstanding probably. [The "stepSize" discussion was sort of
> a digression.]
> 
>> 
>> Derivatives can be computed by a computer using a symbolic toolbox a priori (something that I was referring to when I accidentally said auto-differenation). They can also be efficiently simplified by that toolbox before being pasted back into you program. Auto-diff could also be an important tool for people if their problem is simple or they are not concerned with optimal efficiency . This can easily be handled by a wrapper and has nothing to do with Newton optimization.
> 
> Maybe we should also change the "NewtonSolver" (in package
> "o.a.c.m.analysis.solvers"). [In the same way we'll do for the optimizer
> with derivatives. Actually those two packages were improved in parallel so
> that the interface would be very similar from a usage point of view.]
> 
>> If you want to talk about proper OO design and internal simplification then you should focus on being able to pass a linear solver into your optimization method. This way you allow people to use iterative and sparse solvers when needed.
> 
> This is a new feature; contributions are welcome. [CM is primarily
> developed by people who use (part of) it; if they don't need that feature,
> or don't know how to implement it, or do not have the time to implement it,
> it won't be implemented.]
> 
>> If you are concerned about people getting derivatives wrong, you can do what MATLAB does, which is to add an option to check the derivatives using finite differences when debugging. 
> 
> IMHO, the top-priority feature to help in debugging is to introduce logging
> statments!
> 
>> 
>> This brings me to my second issue. There are important problems where computation
>> of the derivatives combined with the actual function evaluation is *significantly*
>> faster than computing each alone. I am not clear why I am hitting such resistance
>> with this. [...]
> 
> [If you think that you were met with strong resistance, I'd suggest that you
> look in the archives for the discussions about exceptions in CM...]
> 
> Unless I'm mistaken, CM does not prevent you to write a class that takes
> advantage of combining the computations.
> 
>> What I suggested as function interfaces was just an initial quick suggestion
>> that I would be happy for anyone to change in a way that everyone feels positive
>> about. I just don't want my second issue to be ignored like a non-issue.
> 
> I'm getting confused. Could you please restate what where those first and
> second issue?
> 
> Thanks,
> Gilles
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] Old to new API ("MultivariateDifferentiable(Vector)Function")

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.

Hello.

> > 
> >> 
> >> Now that I have some time, let me try to make my case clearly. First I want to say that this is not some attack on the automatic-differentation package. I love automatic-differentation and symbolic packages. I personally cannot compute a derivative without a computer for the life of me. And I am also glad we are finally starting to agree on something :)
> > 
> > I don't think that we disagreed about the fact that there was a problem with
> > the interface to the user application. [_I_ started this thread hinting that
> > there was a problem (at least for me as a user, based on my use-case).]
> > 
> > [Then there were several misunderstandings about what was the problem and how
> > to solve it...]
> > 
> >> 
> >> This discussion is about figuring out how an incorrect framework and storage affects the performance of an optimization. That is why I am so worried.
> > 
> > I can understand that this is an important point for your use-case.
> > There are now two vantage points (at least) on two aspects to consider: You
> > seem to have experience where storage matter (while I don't); you worry
> > about "small" objects allocation (while I don't, anymore).
> > 
> > I'm not saying that your worries do not count; quite the contrary. CM is not
> > my "toolbox"; it's a library based on the knowledge of its developers
> > (obviously).
> > If you bring actual use-cases (i.e. unit tests or real application code
> > that can be benchmarked) that will show worrying behaviour, it will
> > certainly trigger swift corrective action. [This has happened recently.]
> > 
> > In this area (performance), the problem is that intuition (or educated
> > guess, however you want to name it) is not a substitute for actual runs,
> > sometimes by a large amount. [When I started to use CM, I raised the issue
> > that a 3D vector was immutable (so that I had to create a new one whenever
> > its coordinates were changing). Surely this was a performance hit! Actually
> > it wasn't.]
> 
> This is not the same case. You take a function that a user normally expresses as a two dimensional array or a vector, you force them to allocate 10K+ new objects that you then have to unwrap back into the structure the user would have happily supplied you in the first place. The second issue you missed is one of scale. The difference between modifying a 3 variable array and creating a copy is not large. Try doing this with a 10k+ vector, where you don't actually need to modify any of the entries but are just doing copies for the hell of it. This is a known critical component of an optimization and should be optimized for performance itself.

It's always not the same case (really). The crux of the example was about
(my) misconception about what takes time in a JVM.
In CM, there are tons of unneeded (from your point-of-view) copying because
in most cases, it was deemed worth in order to ensure the integrity of the
subsequent processing.
And we also noticed that in the "usual" (the developers' use-cases), it did
not matter much. [And where it does, then we try to take it into account.]

> 
> > 
> > Again, that does not mean that you are wrong in this case. It's just that we
> > cannot jump and change the code every time someone comes up with what
> > amounts to "conventional wisdom". If the person comes with a patch that
> > readily proves the point (at least marginally through microbenchmarking),
> > then we all benefit.
> > 
> 
> This is not "conventional wisdom", this is 60 years of research, so you better provide a good reason for you increased level of abstraction. I asked you why you would indirectly wrap a Hessian or a Jacobian in a much heaver class that provides no useful features for a newton optimizer. I don't think you gave me an answer. I do not believe that good OO design is to have as many abstract layers as you can think of. Good OO design is just like any good engineering design, its about having things clean, simple, and not be dependent on components that are not required. In addition, if you are working on a code that is called thousands of times, you should think really careful about memory performance.

I only talked about code reuse (internally). This should not have impacted
users (i.e. require them to learn about "DerivativeStructure"). Luc already
said that _this_ (what I say in the provious sentence) is was not necessary.
We are not trying to get one step further. Please forget about
"DerivativeStructure".

> 
> >> 
> >> So lets start with the basics. A Newton method must compute a descent step by solving a linear equation
> >> 
> >> H*p = -g, (1)
> >> 
> >> where H is the Hessian, g is the gradient, and p is the desired descent step. What I am about to say also holds for non-linear least squares method, where Hessian is approximated using the Jacobian as
> >> 
> >> H \approx J^T*J+\lambda I.
> >> 
> >> Now, if you are not solving a simple optimization problem that you keep giving examples for, you can easily have a Hessian be a very large matrix,
> >> like 1000x1000 or larger. Now, you better hope that you are storing H using your linear algebra framework, otherwise eq. (1) computation is going to take a while.
> >> This is actually a very active area of research, and that is why having sparse linear algebra (aren't you removing this? ;) ) and iterative solvers is important to a lot of people.
> > 
> > Yes we are removing it because it is buggy and _nobody_ stepped up to say
> > that it was important for CM users, and to help solve the problems in a way
> > consistent with real-life usage of such a feature.
> > As Sébastien said, you are warmly welcome to help us find a way to keep the
> > feature.
> 
> I personally do not use sparse linear algebra. I was just pointing out how important computation of eq. 1 is in general. I wish I had the time to help :(
> 
> > 
> >> 
> >> What you are proposing as good OO style 
> > 
> > This discussion has really nothing to do with "OO style", merely code reuse;
> > and it was my big misunderstanding (partly because of my lack of knowledge,
> > partly because of the lack of documentation on the targetted usages of
> > "DerivativeStructure", which IIUC now, are outside CM) that I believed
> > that the new "MultivariateDifferentiableFunction" was the way to go.
> > 
> > [Also, Luc had been moving very fast on this. And I couldn't keep up
> > although I had wondered earlier why this had to impact usage in the
> > "optimization" package.]
> > 
> >> is that if someone has a function that they want to optimize, they need to take what is probably already a RealMatrix or [][], and create 1000x1000 DerivativeStructure objects, that,
> > 
> > IIUC, that would be 1000 "DerivativeStructure" (not 1000x1000). If so, for
> > this example, using "DerivativeStructure" would lead to about a 4% increase
> > in storage memory.
> 
> Ok 1000. I guess it's really hard to understand this DerivativeStructure without further documentation. You can change my problem to least-squares problem with a very large number of observables. Same problem will pop-up again.

Maybe, maybe not.
I think that it's better to drop the issue of performance. Without figures,
it's useless. [See, for example, issue MATH-740 where we have figures, yet
nobody stepped up to improve the situation.]

> 
> > 
> >> within the next 10 lines in the optimizer, are going to be converted back to a RealMatrix? Not only have you just fragmented your heap space, but your garbage collector
> >> is going crazy, and you are wasting an incredible amount of memory.
> > 
> > This is the kind of assertions that really needs support from actual code.
> > [Again, I don't claim to know better; I think that it would really be useful
> > to accumulate a list of "do and don't" for Java and CM, in the form of
> > reproducible user experience.]
> > 
> >> Now imagine if your Jacobian is actually very simple to compute but large, then this whole conversion back and forth actually takes much longer than the function evaluation.
> > 
> > We are willing to take this into account, but since I cannot see the
> > behaviour in my use-case (where the evaluation time overwhelms, by several
> > orders of magnitude, all such considerations), I do not have any incentive
> > to change what seems to be "efficient enough" (for the actually known
> > cases).
> > Again, your experience with other use-cases would be very valuable to
> > analyze the efficiency of the CM implementations and improve them, if
> > necessary.
> > 
> >> 
> >> So, why exactly are you insisting on taking this performance penalty?
> > 
> > It's the other way around. Until the performance penalty is proven, we've
> > decided that it's up to the developer willing to do the work to decide.
> > Again, if there is patch, it makes the matter much easier to decide on.
> > 
> > I admit that we decided to change the internal working of the optimizers, so
> > _we_ should have proved that it does not impact usability. Hence, again, the
> > usefulness of having a test suite of "real" applications which could be
> > benchmarked regularly in order to have performance regressions induced by
> > otherwise welcome changes.
> > [I raised that issue several times on the ML.]
> 
> It's not about usability. When you tests your components you should not
> test on small problems, because any bad software will work on that.
> Create a case where you have 10K+ observations and 10K+ variable and see
> how your change effects the solution. I wish I would have time to do it
> myself, but I don't.

Ditto.

> As a user I can tell you that scalability is an important issue.

I agree. So what?

> 
> > 
> >> You claim that the optimizer can work better if it gets a DerivativeStructure, why?
> > 
> > This was (supposed to be) an improvement of the _internal_ design. [Several
> > weeks ago, Luc posted the problems he was encountering.]
> > 
> >> Where is the fact that you are holding a DerivativeStructure come into effect for a Newton method? Can you provide any literature in that regard? The classical optimization bottleneck, if not the actual function evaluation, is eq. (1). The optimization methodology is build on top of years of research in computational linear algebra concepts, and does not care how the matrices are actually computed. Efficient memory usage and speed is critical here because in Newton optimizations eq. (1) is evaluated thousands of times. The goal of the Jacobian is not to store derivatives it is to store a matrix of numbers that allows you to quadratically approximate your target function.
> > 
> > OK. Then Luc's proposal seems appropriate.
> > There would be new interfaces defining the "optimize" method appropriately.
> > For algorithms that need the gradient, it must be provided in an additional
> > argument, of type "MultivariateVectorFunction"; for those that need, the
> > Jacobian, the argument would be of type "MultivariateMatrixFunction".
> > Do you agree?
> > 
> 
> I keep saying that there are cases when evaluating value gradient Hessian is faster together than as a separate function. So no, I do not agree. I do agree it is better than having DerivativeStructure, but I think it is worse than what you had in 3.0. The proposal is pretty much like before in 3.0, but now I need to create two classes every time I want to optimize a function. Why are you doing this? I don't understand why this change needs to happen. 
> 
> I don't see a problem at all. All that has to happen is that the function in gradient based methods overwrites the optimization function, such that it now takes a subclass of the function that is used in non-gradient method. That sub-class would require a gradient function to be implemented.

Sorry, I'm confused.
Could you please spell out _in code_ what you want implemented, or just tell
what classes are good as they are and which changes are bad, in the current
developement version?

> 
> >> 
> >> You guys are focusing on people using finite differences and trying to optimize a newton method to use finite differences. This is not what the main purpose of a newton method is. If you want a method that dynamically adjusts finite differences step size, you should switch to BOBYQA, which was designed for this case.
> > 
> > Another misunderstanding probably. [The "stepSize" discussion was sort of
> > a digression.]
> > 
> >> 
> >> Derivatives can be computed by a computer using a symbolic toolbox a priori (something that I was referring to when I accidentally said auto-differenation). They can also be efficiently simplified by that toolbox before being pasted back into you program. Auto-diff could also be an important tool for people if their problem is simple or they are not concerned with optimal efficiency . This can easily be handled by a wrapper and has nothing to do with Newton optimization.
> > 
> > Maybe we should also change the "NewtonSolver" (in package
> > "o.a.c.m.analysis.solvers"). [In the same way we'll do for the optimizer
> > with derivatives. Actually those two packages were improved in parallel so
> > that the interface would be very similar from a usage point of view.]
> > 
> 
> Not sure what you mean.
> 
> >> If you want to talk about proper OO design and internal simplification then you should focus on being able to pass a linear solver into your optimization method. This way you allow people to use iterative and sparse solvers when needed.
> > 
> > This is a new feature; contributions are welcome. [CM is primarily
> > developed by people who use (part of) it; if they don't need that feature,
> > or don't know how to implement it, or do not have the time to implement it,
> > it won't be implemented.]
> > 
> >> If you are concerned about people getting derivatives wrong, you can do what MATLAB does, which is to add an option to check the derivatives using finite differences when debugging. 
> > 
> > IMHO, the top-priority feature to help in debugging is to introduce logging
> > statements!
> 
> I would go further than that. I think the user should be able to get back all the iterations steps back, and the associated rmsd values, etc if so desired. But that is sort of a wish.

[I don't agree, but this is another discussion.
With logging, you'd be able to get this information without cluttering CM
with data structure for saving transient states.]

> > 
> >> 
> >> This brings me to my second issue. There are important problems where computation
> >> of the derivatives combined with the actual function evaluation is *significantly*
> >> faster than computing each alone. I am not clear why I am hitting such resistance
> >> with this. [...]
> > 
> > [If you think that you were met with strong resistance, I'd suggest that you
> > look in the archives for the discussions about exceptions in CM...]
> > 
> > Unless I'm mistaken, CM does not prevent you to write a class that takes
> > advantage of combining the computations.
> 
> How would I write that class? You optimizer first requests function value , then separately the Jacobian. It is not done at the same time. I could try to cache intermediately results, see if you are requesting a Jacobian for the same values you called my value function for, but that is very dirty.

In text, that looks dirty. ;-)
Please provide the code snippets which I request above, and I surely hope
that we'll meet at some point. Let me remind you that I probably need the
same thing as you! :-)

> 
> > 
> >> What I suggested as function interfaces was just an initial quick suggestion
> >> that I would be happy for anyone to change in a way that everyone feels positive
> >> about. I just don't want my second issue to be ignored like a non-issue.
> > 
> > I'm getting confused. Could you please restate what where those first and
> > second issue?
> 
> First issue is you are creating extra layers of conversion for the user that is
> bad for performance,

OK. Thanks. That is a misunderstanding. It's certainly not the goal of those
recent changes. And if there is a performance hit in the upcoming CM 3.1, I
can assure you that it will corrected in 3.2.

> and there is not a single specific reason given for why its a good idea.

...

> Second issue is that computing value gradient Hessian might be faster for
> some functions together rather than in two unconnected calls.

OK. Good point. Was it possible before (in version 3.0)? How did you do it?
(cf. code snippet request). Then we'll make sure it will work in 3.1 too
(albeit with small syntax changes if needed for our internal cleanup of the
package). How does that sound?


Best regards,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] Old to new API ("MultivariateDifferentiable(Vector)Function")

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.

On Sun, Dec 02, 2012 at 11:23:58PM +0100, Luc Maisonobe wrote:
> 
> 
> 
> Gilles Sadowski <gi...@harfang.homelinux.org> a écrit :
> 
> >Hello.
> > 
> >> > 
> >> > I would propose to simply revert my changes on the optimization
> >package 
> >> > and prepare for a reorganization for 4.0. I understand I focused
> >only on
> >> > the type of problems Gilles and myself routinely use, i .e. small
> >size problems 
> >> > where the cost of the evaluation is several orders of magnitude
> >larger than the
> >> > data copying. I forgot the dual case with  very large data sets. I
> >apologize for that.
> >> 
> >> In the context of an FLOSS project, I don't think that you should
> >apologize
> >> for that. We all do our best (and sometimes beyond) and the lack of
> >full
> >> problem coverage should certainly not rest on the usual suspects (eh,
> >> contributors, I mean. ;-)
> >> 
> >> > 
> >> > When 3.1 will be out, we will have to solve this so both  cases are
> >handled efficiently, 
> >> > and this would probably be implemented in 4.0.
> >> > 
> >> > Does this seems reasonable? 
> >> 
> >> At this point, no!
> >> 
> >> I really want to understand what is wrong, and improve the API on all
> >> accounts (the issues you had, the one I raised, and the feedback from
> >> Konstantin, plus advice from others if possible).
> >
> >If I understood correctly, the main API problem is that the objective
> >function and gradient or Jacobian are accessed separately. This forces
> >users
> >who have code to compute the value and Jacobian together to create an
> >adapter
> >that can return the two functions (objective and Jacobian) required by
> >the
> >API but would avoid the computation if the a call to the other was
> >already
> >made with the same parameter values.
> >
> >For example, instead of passing a
> >"DifferentiableMultivariateVectorFunction"
> >to the method "optimize" in "AbstractLeastSquareOptimizer", a type like
> >the
> >following would be required:
> >-----
> >interface VectorFunctionWithJacobian {
> >  ValueAndGradient[] computeValueAndJacobian(double[] parameters);
> >}
> >-----
> >with
> >-----
> >public class ValueAndGradient {
> >    /** Value of a function at a given point. */
> >    private final double value;
> >    /** Gradient of the function at the same point. */
> >    private final double[] gradient;
> >
> >    /**
> >     * Creates an instance for the given value and gradient.
> >* Note that no copy is performed: this instance will contain references
> >     * to the original data.
> >     *
> >     * @param value Value.
> >     * @param gradient Vector.
> >     */
> >    public ValueAndGradient(double value,
> >                            double[] gradient) {
> >        this.value = value;
> >        this.gradient = gradient;
> >    }
> >
> >    /**
> >     * @eturn a reference to the value.
> >     */
> >    public double getValue() {
> >    }
> >    /**
> >     * @eturn a reference to the gradient.
> >     */
> >    public double[] getGradient() {
> >        return gradient;
> >    }
> >}
> >-----
> >
> >Is that a better approach?
> >
> >IIUC, the "DerivativeStructure" can also be used if just as a storage
> >layout
> >equivalent to "ValueAndGradient". If so, then the new proposed API
> >(where the objective function is a
> >"MultivariateDifferentiableVectorFunction")
> >is indeed better than the old, except for the (perhaps major) caveat
> >that
> >from the "optimization" package, it indeed looks over-engineered and
> >quite
> >unfamiliar.
> 
> Most importantly, it forces data copies, which is a problem for large data sets.

I actually wanted to get agreement on starting to implement the proposal I
outlined above (possibly with other names if there are suggestions).

The remark about "DerivativeStructure" serving as storage was asking whether
it's correct that it _would be_ able to replace the "ValueAndGradient" data
structure. [Even if it is over-engineered for this task (and thus
unsatisfactory for users of the "optimization" package).]
My intention is indeed to not use it.
No copying is necessarily "forced", on the condition that users write their
code with "DerivativeStructure" objects. [Before any flaming, I agree that
it is not the way to go!]  However, even if they did so, copying would still
occur in the current implementation[1] and that is indeed not nice and I
now understand that you probably meant to correct this when you proposed to
revert your changes.[2]

Also, I'd like some input about the "optimization" package name change and
reorganization.
I proposed to create an new "optim" package (and deprecate "optimization").
Under "optim", the layout is subject to discussion (cf. alternate proposal
by Konstantin Berlin).
Changing the API as proposed above might naturally lead to some layout...
so let's decide about that first.

Thanks,
Gilles

[1] Cf. line 182 in "AbstractLeastSquaresOptimizer".
[2] +1 to do it. Let me know whether you want some help with this.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] Old to new API ("MultivariateDifferentiable(Vector)Function")

Posted by Luc Maisonobe <lu...@spaceroots.org>.



Gilles Sadowski <gi...@harfang.homelinux.org> a écrit :

>Hello.
> 
>> > 
>> > I would propose to simply revert my changes on the optimization
>package 
>> > and prepare for a reorganization for 4.0. I understand I focused
>only on
>> > the type of problems Gilles and myself routinely use, i .e. small
>size problems 
>> > where the cost of the evaluation is several orders of magnitude
>larger than the
>> > data copying. I forgot the dual case with  very large data sets. I
>apologize for that.
>> 
>> In the context of an FLOSS project, I don't think that you should
>apologize
>> for that. We all do our best (and sometimes beyond) and the lack of
>full
>> problem coverage should certainly not rest on the usual suspects (eh,
>> contributors, I mean. ;-)
>> 
>> > 
>> > When 3.1 will be out, we will have to solve this so both  cases are
>handled efficiently, 
>> > and this would probably be implemented in 4.0.
>> > 
>> > Does this seems reasonable? 
>> 
>> At this point, no!
>> 
>> I really want to understand what is wrong, and improve the API on all
>> accounts (the issues you had, the one I raised, and the feedback from
>> Konstantin, plus advice from others if possible).
>
>If I understood correctly, the main API problem is that the objective
>function and gradient or Jacobian are accessed separately. This forces
>users
>who have code to compute the value and Jacobian together to create an
>adapter
>that can return the two functions (objective and Jacobian) required by
>the
>API but would avoid the computation if the a call to the other was
>already
>made with the same parameter values.
>
>For example, instead of passing a
>"DifferentiableMultivariateVectorFunction"
>to the method "optimize" in "AbstractLeastSquareOptimizer", a type like
>the
>following would be required:
>-----
>interface VectorFunctionWithJacobian {
>  ValueAndGradient[] computeValueAndJacobian(double[] parameters);
>}
>-----
>with
>-----
>public class ValueAndGradient {
>    /** Value of a function at a given point. */
>    private final double value;
>    /** Gradient of the function at the same point. */
>    private final double[] gradient;
>
>    /**
>     * Creates an instance for the given value and gradient.
>* Note that no copy is performed: this instance will contain references
>     * to the original data.
>     *
>     * @param value Value.
>     * @param gradient Vector.
>     */
>    public ValueAndGradient(double value,
>                            double[] gradient) {
>        this.value = value;
>        this.gradient = gradient;
>    }
>
>    /**
>     * @eturn a reference to the value.
>     */
>    public double getValue() {
>    }
>    /**
>     * @eturn a reference to the gradient.
>     */
>    public double[] getGradient() {
>        return gradient;
>    }
>}
>-----
>
>Is that a better approach?
>
>IIUC, the "DerivativeStructure" can also be used if just as a storage
>layout
>equivalent to "ValueAndGradient". If so, then the new proposed API
>(where the objective function is a
>"MultivariateDifferentiableVectorFunction")
>is indeed better than the old, except for the (perhaps major) caveat
>that
>from the "optimization" package, it indeed looks over-engineered and
>quite
>unfamiliar.

Most importantly, it forces data copies, which is a problem for large data sets.

Best regards
Luc

>
>
>Regards,
>Gilles
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>For additional commands, e-mail: dev-help@commons.apache.org

-- 
Envoyé de mon téléphone Android avec K-9 Mail. Excusez la brièveté.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] Old to new API ("MultivariateDifferentiable(Vector)Function")

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.

Hello.
 
> > 
> > I would propose to simply revert my changes on the optimization package 
> > and prepare for a reorganization for 4.0. I understand I focused only on
> > the type of problems Gilles and myself routinely use, i .e. small size problems 
> > where the cost of the evaluation is several orders of magnitude larger than the
> > data copying. I forgot the dual case with  very large data sets. I apologize for that.
> 
> In the context of an FLOSS project, I don't think that you should apologize
> for that. We all do our best (and sometimes beyond) and the lack of full
> problem coverage should certainly not rest on the usual suspects (eh,
> contributors, I mean. ;-)
> 
> > 
> > When 3.1 will be out, we will have to solve this so both  cases are handled efficiently, 
> > and this would probably be implemented in 4.0.
> > 
> > Does this seems reasonable? 
> 
> At this point, no!
> 
> I really want to understand what is wrong, and improve the API on all
> accounts (the issues you had, the one I raised, and the feedback from
> Konstantin, plus advice from others if possible).

If I understood correctly, the main API problem is that the objective
function and gradient or Jacobian are accessed separately. This forces users
who have code to compute the value and Jacobian together to create an adapter
that can return the two functions (objective and Jacobian) required by the
API but would avoid the computation if the a call to the other was already
made with the same parameter values.

For example, instead of passing a "DifferentiableMultivariateVectorFunction"
to the method "optimize" in "AbstractLeastSquareOptimizer", a type like the
following would be required:
-----
interface VectorFunctionWithJacobian {
  ValueAndGradient[] computeValueAndJacobian(double[] parameters);
}
-----
with
-----
public class ValueAndGradient {
    /** Value of a function at a given point. */
    private final double value;
    /** Gradient of the function at the same point. */
    private final double[] gradient;

    /**
     * Creates an instance for the given value and gradient.
     * Note that no copy is performed: this instance will contain references
     * to the original data.
     *
     * @param value Value.
     * @param gradient Vector.
     */
    public ValueAndGradient(double value,
                            double[] gradient) {
        this.value = value;
        this.gradient = gradient;
    }

    /**
     * @eturn a reference to the value.
     */
    public double getValue() {
    }
    /**
     * @eturn a reference to the gradient.
     */
    public double[] getGradient() {
        return gradient;
    }
}
-----

Is that a better approach?

IIUC, the "DerivativeStructure" can also be used if just as a storage layout
equivalent to "ValueAndGradient". If so, then the new proposed API
(where the objective function is a "MultivariateDifferentiableVectorFunction")
is indeed better than the old, except for the (perhaps major) caveat that
from the "optimization" package, it indeed looks over-engineered and quite
unfamiliar.


Regards,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] Old to new API ("MultivariateDifferentiable(Vector)Function")

Posted by Konstantin Berlin <kb...@gmail.com>.

Hi,

I think this is getting silly. What I am saying is not a matter of opinions but of textbook optimizations. This is not a matter of use cases, but of something that is already well established. I feel like this package is trying to reinvent the wheel, in a subject that is already well known, and very settled.

I am not going to present myself as some expert in optimizations, however I am familiar with different optimization methods, why they exists, how they work and relate to each other, and how constraints (be they equality and inequality, bounds, linear constraints, and nonlinear constrains) are added to optimization. Knowing what I know, I can't help but to feel it is important to have this perspective before trying to properly engineer an optimization package, even if you do not have most of these features implemented yet.

If you are curious about how to add constraints to optimization methods or optimizations in general, you can read some slides here
http://www.cs.umd.edu/users/oleary/a607/

I have issues with the suggestions that I am presented with, since IMHO they are missing the big picture. I feel like least-squares methods are presented here as though as they are an extension of optimization into multiple dimensions. That is incorrect, and even if you know that, you are misleading the user. Least-squares is a limited case of a general optimization problem. This expression will get you an F in a math class:
ValueAndGradient[] computeValueAndJacobian(double[] parameters);

The vector value and the matrix Jacobian should be separated, not combined together. They are not mathematically defined like this, nor are they used like this in optimizations, and would require extraction inside the optimization method. This expression prevents the use of sparse matrices, if that would be desired in the future.

> Chi2 values are only used (and subsequently made accessible to the caller)
> in the "AbstractLeastSquaresOptimizer" hierarchy.
> In the current design, it would not make any sense to just change the return
> value type and force every algorithm to fill in data they don't use.
> 
> Discussion is open but should be based on actual cases, or on patches that
> demonstrate how the change is applied to all the implementations which are
> supposed to be affected.

This is a perfect example of why inherence was created.

You guys are free to proceed how you want. I am trying to save you guys the trouble of having to redesign your optimization package in the near future.

I will be happy to comment on proposals, but I no longer have the time to argue about fundamentals of optimizations. I do not mean to sound like this, but I am really out of personal time. I hope you take a "users" view point seriously.

Thanks,
Konstantin
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] Old to new API ("MultivariateDifferentiable(Vector)Function")

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.

On Tue, Dec 04, 2012 at 05:47:09AM -0500, Konstantin Berlin wrote:
> Hi,
> 
> I think this is getting silly. What I am saying is not a matter of
> opinions but of textbook optimizations.  This is not a matter of use
> cases, but of something that is already well established.  I feel like
> this package is trying to reinvent the wheel, in a subject that is already
> well known, and very settled.
> 
> I am not going to present myself as some expert in optimizations, however
> I am familiar with different optimization methods, why they exists, how
> they work and relate to each other, and how constraints (be they equality
> and inequality, bounds, linear constraints, and nonlinear constrains) are
> added to optimization.  Knowing what I know, I can't help but to feel it
> is important to have this perspective before trying to properly engineer
> an optimization package, even if you do not have most of these features
> implemented yet.

What you seem to fail to understand is what Commons Math is.
Currently it is a repository of math-oriented algorithms which users can
choose from in order to solve whatever problems which they are able to adapt
to the input required by those algorithms.

Illustrating with the "optimization" package: users are not directed to one
specific algorithm as _the_ solution to their problem; they are allowed to
choose any algorithm whose expected input matches what the user is able to
provide.
If I want to solve a non-linear least squares problem with a derivative-free
method, I can. [Even if I very much appreciate your advice about
derivative-based methods being provably faster than direct methods, and I
was indeed about to test Levenberg-Marquardt and see how it compares on our
function (a simulator's output).]

In that respect Commons Math thus very much differ from "JOptimizer" that
indeed took the problem-oriented approach, for the perfectly valid reason
that it aims at solving specific optimization problems, each in the best
known way.

I proposed the package layout under "optim" based on what input must be
provided to the algorithms under the given package also because it is
similar to what is done in other packages, i.e. a subdivision based on the
algorithms' (or data structures') similarities rather than on what end they
are supposed to be put to use.

More importantly, as I've indicated before, due to lack of human resources,
we are unable to develop whole sets of tools for the sole purpose of
inclusion into Commons Math (to my knowledge, and however interesting that
would be, nobody is paid to develop CM!).

> If you are curious about how to add constraints to optimization methods or
> optimizations in general, you can read some slides here
> http://www.cs.umd.edu/users/oleary/a607/

Thanks for the pointer.

> I have issues with the suggestions that I am presented with, since IMHO
> they are missing the big picture.  I feel like least-squares methods are
> presented here as though as they are an extension of optimization into
> multiple dimensions.  That is incorrect, and even if you know that, you
> are misleading the user.  Least-squares is a limited case of a general
> optimization problem.  This expression will get you an F in a math class:
> ValueAndGradient[] computeValueAndJacobian(double[] parameters);
> 
> The vector value and the matrix Jacobian should be separated, not combined
> together.  They are not mathematically defined like this, nor are they
> used like this in optimizations, and would require extraction inside the
> optimization method.

How the optimization method will extract the data is an implementation
detail: Whether it accesses it as
  value4 = DiffValueLeastSquares.values[4];
  j40 = DiffValueLeastSquares.J[4][0];
or
  vAndG = ValueAndGradient[4];
  value4 = vAndG.value;
  j40 = vAndG.gradient[0];
is completely transparent.
I admit nevertheless than if the matrix is used as an "object" and the
subsequent operations involves matrix operations, it is indeed better to
handle it as such from the start.

I thus withdraw my proposal, and will stick to what Luc proposed e.g. the
"optimize" method in "AbstractLeastSquares" will be:
-----
  PointVectorValuePair optimize(int maxEval,
                                MultivariateVectorFunction value,
                                MultivariateMatrixFunction jacobian,
                                OptimizationData... optData)
-----

> This expression prevents the use of sparse matrices,
> if that would be desired in the future.

So does your "DiffValueLeastSquares"...

That may be a good point but without a patch, we cannot hope to get much
further.
For example, to allow for sparse matrices we could have a new interface:
-----
  public NewMultivariateMatrixFunction {
    RealMatrix value(double[] point);
  }
-----
[Whereas the existing "MultivariateMatrixFunction" returns a "double[][]".]

That's possible to implement now. Is it necessary? No because no one has a
use-case that demonstrates the advantage. As I said, when that data exists,
it will be taken into account.

> 
> > Chi2 values are only used (and subsequently made accessible to the caller)
> > in the "AbstractLeastSquaresOptimizer" hierarchy.
> > In the current design, it would not make any sense to just change the return
> > value type and force every algorithm to fill in data they don't use.
> > 
> > Discussion is open but should be based on actual cases, or on patches that
> > demonstrate how the change is applied to all the implementations which are
> > supposed to be affected.
> 
> This is a perfect example of why inherence was created.

That sentence does not clarify what you mean.

> You guys are free to proceed how you want. I am trying to save you guys
> the trouble of having to redesign your optimization package in the near
> future.

I'm sorry but that's not what we need because the redesign(s) will happen
nonetheless. And this is not to be construed as a failure, it's the way
for a programming project to evolve when new knowledge and new features are
integrated into the picture.

> I will be happy to comment on proposals, but I no longer have the time to
> argue about fundamentals of optimizations.  I do not mean to sound like
> this, but I am really out of personal time.  I hope you take a "users"
> view point seriously.

We do take users seriously. Having a look at the bug tracking system is
ample proof.

I'll sound much less politically correct than Luc, but unfortunately, you
seem to not take the developers' point-of-view (and time) very seriously:
You do not have the time to provide what we need (raising issues on the bug
tracking system, answering direct questions on this ML, providing use-cases,
benchmarks, patches), fine, but then do not expect that we are (always)
going to do it for you!

Regards,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] Old to new API ("MultivariateDifferentiable(Vector)Function")

Posted by Konstantin Berlin <kb...@gmail.com>.

Hi,

I think this is getting silly. What I am saying is not a matter of opinions but of textbook optimizations. This is not a matter of use cases, but of something that is already well established. I feel like this package is trying to reinvent the wheel, in a subject that is already well known, and very settled.

I am not going to present myself as some expert in optimizations, however I am familiar with different optimization methods, why they exists, how they work and relate to each other, and how constraints (be they equality and inequality, bounds, linear constraints, and nonlinear constrains) are added to optimization. Knowing what I know, I can't help but to feel it is important to have this perspective before trying to properly engineer an optimization package, even if you do not have most of these features implemented yet.

If you are curious about how to add constraints to optimization methods or optimizations in general, you can read some slides here
http://www.cs.umd.edu/users/oleary/a607/

I have issues with the suggestions that I am presented with, since IMHO they are missing the big picture. I feel like least-squares methods are presented here as though as they are an extension of optimization into multiple dimensions. That is incorrect, and even if you know that, you are misleading the user. Least-squares is a limited case of a general optimization problem. This expression will get you an F in a math class:
ValueAndGradient[] computeValueAndJacobian(double[] parameters);

The vector value and the matrix Jacobian should be separated, not combined together. They are not mathematically defined like this, nor are they used like this in optimizations, and would require extraction inside the optimization method. This expression prevents the use of sparse matrices, if that would be desired in the future.

> Chi2 values are only used (and subsequently made accessible to the caller)
> in the "AbstractLeastSquaresOptimizer" hierarchy.
> In the current design, it would not make any sense to just change the return
> value type and force every algorithm to fill in data they don't use.
> 
> Discussion is open but should be based on actual cases, or on patches that
> demonstrate how the change is applied to all the implementations which are
> supposed to be affected.

This is a perfect example of why inherence was created.

You guys are free to proceed how you want. I am trying to save you guys the trouble of having to redesign your optimization package in the near future.

I will be happy to comment on proposals, but I no longer have the time to argue about fundamentals of optimizations. I do not mean to sound like this, but I am really out of personal time. I hope you take a "users" view point seriously.

Thanks,
Konstantin
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] Old to new API ("MultivariateDifferentiable(Vector)Function")

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.

On Mon, Dec 03, 2012 at 11:22:21AM -0500, Konstantin Berlin wrote:
> Hi,
> 
> In my view there are two related but separate issues: (1) What should we
> do for 3.1 release, and (2) How do we want to design the optimization
> package for the future, such that it is easily extended to linear and
> non-linear constraints, sparse matrix operations, etc.
> 
> In regards to (1), I feel that designing a proper API for the future is a
> very important problem that would take a while, to not only implement and
> test, but to also come to an agreement.  This future API should not be
> limited by the backwards capability with 3.0, but should instead be
> careful designed from the ground up.  That is why I propose that we do not
> let these issues set back the much needed release of 3.1.  For example in
> 3.0 CMAES bound constraints are actually broken.  This should be patched
> ASAP.

???
Did you report this issue on the bug tracking system?

>  There are new features.  While my combined function issue is
> important, it is a new feature, and by no means critical.

It is not a new issue.
It is an improvement.
It is easy to implement.

>  In my view it
> is better to leave the Function classes as they are (or at least similar
> to what they are), so the user is comfortable with the interface.

Nothing will be removed as 3.1 must be backward-compatible.
It is however the right time to introduce changes as it gives users the
opportunity to switch to the new features.

Even if what I proposed is not the end of the story, it's fine to make the
change now.

> In regards to (2), Here are some things that we need to think about 1) How
> to properly design the function interfaces for the optimization packages,
> such that they address my combined value and gradient computation, but
> also have proper OO structure such that a function that is used in a
> "value+gradient" optimization can also be used for a no-gradient direct
> optimization?

We've just been thinking about that in this quite extensive thread.
Luc has changed the "...Function" interfaces (introducing
"DerivativeStructure"). You have convinced us to that this should not be
reused into the "optimization" package. That pretty much settles the issue
for now IMHO. Of course, that does not preclude further discussion and
further changes.

>  What wrapper classes should we provide for the user? 
> Should this functions be also comparable with analysis package (or some
> other package), or would optimization performance suffer?  How should
> these values be stored so they can be efficiently used by the optimizer
> without significant amount of memory copies.

These will be answered either if users provide concret use-cases for
testing. Up to now, most answers to similar questions came from regular
contributors having their user hat on. ;-)

> 2) Changing the return class
> for "optimization" function.  Like I mentioned before, I don't think the
> optimization class should keep state on the previous optimization.  All
> RMS, chi^2 values are part of the optimization result and not part of the
> optimizer.  An example of such use is where a user computed values from
> different initial points and then compares results.

Chi2 values are only used (and subsequently made accessible to the caller)
in the "AbstractLeastSquaresOptimizer" hierarchy.
In the current design, it would not make any sense to just change the return
value type and force every algorithm to fill in data they don't use.

Discussion is open but should be based on actual cases, or on patches that
demonstrate how the change is applied to all the implementations which are
supposed to be affected.

> 3) It might *not* be
> a good idea to have constraints be given as part of options into an
> "optimization" function.  First, different implementations have different
> limitations, so one method might take non-linear constraints, another
> doesn't.  This should be explicitly visible through the constructors of
> the class.  Second, linear constraints,and probably other constrains
> require pre-processing before use.  For example, if you have linear
> equality constraints in an optimization, you need to extract the null
> space using QR or SVD.  If you have a lot of constraints this process is
> expensive, and should be amortized for calls with same constrains but
> different initial points.  There are probably other examples of this.  We
> should think and discuss this, before jumping to a design.

Personally, I don't know what you are talking about: I can only assume that
you refer to the "linear" sub-package.
Only some of the other optimizers support constraints, and as of today, only
"simple bounds" (parameter value constrained into an interval).

We would be quite pleased to count an optimization expert among the
contributors, who would provide the feature you mention, but as I
indicated previously, Commons Math has a serious shortage of permanent
staff. :-D

> It is very unfortunate that there is a lack of man power to improve the
> algorithms.  I am very sad to say, but I have barely time to even reply to
> these emails.

In practice, much time is wasted in discussions, even on things that
regular contributors want to implement.
In practice, regular contributors introduce new code because they need it in
their applications, or because they identified some bug or weakness.
In practive, major design overhauls occur because new use-cases uncover a
previously a weakness (that did not get in the way of previous use-cases or
which users did not report).

> I look forward to hearing what other people think,

Yes that would be intersting. But maybe they don't have the time either, and
that should not prevent those who participated in this discussion to go
forward.

Regards,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] Old to new API ("MultivariateDifferentiable(Vector)Function")

Posted by Konstantin Berlin <kb...@gmail.com>.

Hi,

In my view there are two related but separate issues: (1) What should we do for 3.1 release, and (2) How do we want to design the optimization package for the future, such that it is easily extended to linear and non-linear constraints, sparse matrix operations, etc.

In regards to (1),
I feel that designing a proper API for the future is a very important problem that would take a while, to not only implement and test, but to also come to an agreement. This future API should not be limited by the backwards capability with 3.0, but should instead be careful designed from the ground up. That is why I propose that we do not let these issues set back the much needed release of 3.1. For example in 3.0 CMAES bound constraints are actually broken. This should be patched ASAP. There are new features. While my combined function issue is important, it is a new feature, and by no means critical. In my view it is better to leave the Function classes as they are (or at least similar to what they are), so the user is comfortable with the interface.

In regards to (2),
Here are some things that we need to think about
1) How to properly design the function interfaces for the optimization packages, such that they address my combined value and gradient computation, but also have proper OO structure such that a function that is used in a "value+gradient" optimization can also be used for a no-gradient direct optimization? What wrapper classes should we provide for the user? Should this functions be also comparable with analysis package (or some other package), or would optimization performance suffer? How should these values be stored so they can be efficiently used by the optimizer without significant amount of memory copies.
2) Changing the return class for "optimization" function. Like I mentioned before, I don't think the optimization class should keep state on the previous optimization. All RMS, chi^2 values are part of the optimization result and not part of the optimizer. An example of such use is where a user computed values from different initial points and then compares results.
3) It might *not* be a good idea to have constraints be given as part of options into an "optimization" function. First, different implementations have different limitations, so one method might take non-linear constraints, another doesn't. This should be explicitly visible through the constructors of the class. Second, linear constraints,and probably other constrains require pre-processing before use. For example, if you have linear equality constraints in an optimization, you need to extract the null space using QR or SVD. If you have a lot of constraints this process is expensive, and should be amortized for calls with same constrains but different initial points. There are probably other examples of this. We should think and discuss this, before jumping to a design.

It is very unfortunate that there is a lack of man power to improve the algorithms. I am very sad to say, but I have barely time to even reply to these emails.

I look forward to hearing what other people think,
Konstantin

On Dec 3, 2012, at 8:16 AM, Gilles Sadowski <gi...@harfang.homelinux.org> wrote:

> On Sat, Dec 01, 2012 at 04:58:30PM -0500, Konstantin Berlin wrote:
>>> 
>>> I would propose to simply revert my changes on the optimization package 
>>> and prepare for a reorganization for 4.0. I understand I focused only on
>>> the type of problems Gilles and myself routinely use, i .e. small size problems 
>>> where the cost of the evaluation is several orders of magnitude larger than the
>>> data copying. I forgot the dual case with  very large data sets. I apologize for that. 
>>> 
>>> When 3.1 will be out, we will have to solve this so both  cases are handled efficiently, 
>>> and this would probably be implemented in 4.0.
>>> 
>>> Does this seems reasonable? 
>>> 
>>> Best regards 
>>> Luc
>>> 
>> 
>> Don't blame yourself, its hard to think of all possible uses.
> 
> Indeed. And it is also hard to get a design right without being guided by
> actual use-cases.
> Thus... [see below]
> 
>> I think reverting back to original interface is the best choice for 3.1.
> 
> I don't understand (cf. my other post); in the "orignal" interface (i.e.
> "DifferentiableMultivariateVectorFunction"), you have exactly the same
> problem as with what Luc proposed earlier: 2 separate functions, one for
> the value and one for the Jacobian.
> Please indicate whether my proposal is better from this point-of-view: One
> function call will return an array of "value and gradient" objects, and
> references to those can be accessed directly inside the optimizer's code
> to get the Jacobian data, without copying.
> 
>> In the future it is worth finding and adding "standard" industry benchmarks to the tests.
> 
> ... this issue came up several times, but was not able to gather the
> necessary "human-power": see
>  https://issues.apache.org/jira/browse/MATH-678
> and
>  https://issues.apache.org/jira/browse/MATH-763
> 
>> 
>> Speaking of optimizers... Would it not be better to move all the getRMS, getChiSquared into
>> the return class, along with all other information, like how many iterations it took, function
>> evaluations, etc? That is how I find it more natural to use, so I end up creating a wrapper.
>> These properties are not properties of the optimizer but are properties of the specific
>> optimization evaluation of the function with the specific initial value.
> 
> As always, patches are welcome that practically demonstrate the improvement.
> 
> 
> Regards,
> Gilles
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] Old to new API ("MultivariateDifferentiable(Vector)Function")

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.

On Sat, Dec 01, 2012 at 04:58:30PM -0500, Konstantin Berlin wrote:
> > 
> > I would propose to simply revert my changes on the optimization package 
> > and prepare for a reorganization for 4.0. I understand I focused only on
> > the type of problems Gilles and myself routinely use, i .e. small size problems 
> > where the cost of the evaluation is several orders of magnitude larger than the
> > data copying. I forgot the dual case with  very large data sets. I apologize for that. 
> > 
> > When 3.1 will be out, we will have to solve this so both  cases are handled efficiently, 
> > and this would probably be implemented in 4.0.
> > 
> > Does this seems reasonable? 
> > 
> > Best regards 
> > Luc
> > 
> 
> Don't blame yourself, its hard to think of all possible uses.

Indeed. And it is also hard to get a design right without being guided by
actual use-cases.
Thus... [see below]

> I think reverting back to original interface is the best choice for 3.1.

I don't understand (cf. my other post); in the "orignal" interface (i.e.
"DifferentiableMultivariateVectorFunction"), you have exactly the same
problem as with what Luc proposed earlier: 2 separate functions, one for
the value and one for the Jacobian.
Please indicate whether my proposal is better from this point-of-view: One
function call will return an array of "value and gradient" objects, and
references to those can be accessed directly inside the optimizer's code
to get the Jacobian data, without copying.

> In the future it is worth finding and adding "standard" industry benchmarks to the tests.

... this issue came up several times, but was not able to gather the
necessary "human-power": see
  https://issues.apache.org/jira/browse/MATH-678
and
  https://issues.apache.org/jira/browse/MATH-763

> 
> Speaking of optimizers... Would it not be better to move all the getRMS, getChiSquared into
> the return class, along with all other information, like how many iterations it took, function
> evaluations, etc? That is how I find it more natural to use, so I end up creating a wrapper.
> These properties are not properties of the optimizer but are properties of the specific
> optimization evaluation of the function with the specific initial value.

As always, patches are welcome that practically demonstrate the improvement.

Regards,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] Old to new API ("MultivariateDifferentiable(Vector)Function")

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.

Hi Luc.

> 
> I would propose to simply revert my changes on the optimization package 
> and prepare for a reorganization for 4.0. I understand I focused only on
> the type of problems Gilles and myself routinely use, i .e. small size problems 
> where the cost of the evaluation is several orders of magnitude larger than the
> data copying. I forgot the dual case with  very large data sets. I apologize for that.

In the context of an FLOSS project, I don't think that you should apologize
for that. We all do our best (and sometimes beyond) and the lack of full
problem coverage should certainly not rest on the usual suspects (eh,
contributors, I mean. ;-)

> 
> When 3.1 will be out, we will have to solve this so both  cases are handled efficiently, 
> and this would probably be implemented in 4.0.
> 
> Does this seems reasonable? 

At this point, no!

I really want to understand what is wrong, and improve the API on all
accounts (the issues you had, the one I raised, and the feedback from
Konstantin, plus advice from others if possible).

As a user of this package with some deadlines, I don't want to be stuck in
the middle of the road, whereas just a little effort seems required to clear
this up.


Thanks,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] Old to new API ("MultivariateDifferentiable(Vector)Function")

Posted by Konstantin Berlin <kb...@gmail.com>.

> 
> I would propose to simply revert my changes on the optimization package 
> and prepare for a reorganization for 4.0. I understand I focused only on
> the type of problems Gilles and myself routinely use, i .e. small size problems 
> where the cost of the evaluation is several orders of magnitude larger than the
> data copying. I forgot the dual case with  very large data sets. I apologize for that. 
> 
> When 3.1 will be out, we will have to solve this so both  cases are handled efficiently, 
> and this would probably be implemented in 4.0.
> 
> Does this seems reasonable? 
> 
> Best regards 
> Luc
> 

Don't blame yourself, its hard to think of all possible uses. I think reverting back to original interface is the best choice for 3.1. In the future it is worth finding and adding "standard" industry benchmarks to the tests.

Speaking of optimizers... Would it not be better to move all the getRMS, getChiSquared into the return class, along with all other information, like how many iterations it took, function evaluations, etc? That is how I find it more natural to use, so I end up creating a wrapper. These properties are not properties of the optimizer but are properties of the specific optimization evaluation of the function with the specific initial value.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] Old to new API ("MultivariateDifferentiable(Vector)Function")

Posted by Luc Maisonobe <lu...@spaceroots.org>.



Konstantin Berlin <kb...@gmail.com> a écrit :

>Hi,
>
>
>
>> Hello.
>> 
>>> 
>>> Now that I have some time, let me try to make my case clearly. First
>I want to say that this is not some attack on the
>automatic-differentation package. I love automatic-differentation and
>symbolic packages. I personally cannot compute a derivative without a
>computer for the life of me. And I am also glad we are finally starting
>to agree on something :)
>> 
>> I don't think that we disagreed about the fact that there was a
>problem with
>> the interface to the user application. [_I_ started this thread
>hinting that
>> there was a problem (at least for me as a user, based on my
>use-case).]
>> 
>> [Then there were several misunderstandings about what was the problem
>and how
>> to solve it...]
>> 
>>> 
>>> This discussion is about figuring out how an incorrect framework and
>storage affects the performance of an optimization. That is why I am so
>worried.
>> 
>> I can understand that this is an important point for your use-case.
>> There are now two vantage points (at least) on two aspects to
>consider: You
>> seem to have experience where storage matter (while I don't); you
>worry
>> about "small" objects allocation (while I don't, anymore).
>> 
>> I'm not saying that your worries do not count; quite the contrary. CM
>is not
>> my "toolbox"; it's a library based on the knowledge of its developers
>> (obviously).
>> If you bring actual use-cases (i.e. unit tests or real application
>code
>> that can be benchmarked) that will show worrying behaviour, it will
>> certainly trigger swift corrective action. [This has happened
>recently.]
>> 
>> In this area (performance), the problem is that intuition (or
>educated
>> guess, however you want to name it) is not a substitute for actual
>runs,
>> sometimes by a large amount. [When I started to use CM, I raised the
>issue
>> that a 3D vector was immutable (so that I had to create a new one
>whenever
>> its coordinates were changing). Surely this was a performance hit!
>Actually
>> it wasn't.]
>
>This is not the same case. You take a function that a user normally
>expresses as a two dimensional array or a vector, you force them to
>allocate 10K+ new objects that you then have to unwrap back into the
>structure the user would have happily supplied you in the first place.
>The second issue you missed is one of scale. The difference between
>modifying a 3 variable array and creating a copy is not large. Try
>doing this with a 10k+ vector, where you don't actually need to modify
>any of the entries but are just doing copies for the hell of it. This
>is a known critical component of an optimization and should be
>optimized for performance itself.
>
>> 
>> Again, that does not mean that you are wrong in this case. It's just
>that we
>> cannot jump and change the code every time someone comes up with what
>> amounts to "conventional wisdom". If the person comes with a patch
>that
>> readily proves the point (at least marginally through
>microbenchmarking),
>> then we all benefit.
>> 
>
>This is not "conventional wisdom", this is 60 years of research, so you
>better provide a good reason for you increased level of abstraction. I
>asked you why you would indirectly wrap a Hessian or a Jacobian in a
>much heaver class that provides no useful features for a newton
>optimizer. I don't think you gave me an answer. I do not believe that
>good OO design is to have as many abstract layers as you can think of.
>Good OO design is just like any good engineering design, its about
>having things clean, simple, and not be dependent on components that
>are not required. In addition, if you are working on a code that is
>called thousands of times, you should think really careful about memory
>performance.
>
>>> 
>>> So lets start with the basics. A Newton method must compute a
>descent step by solving a linear equation
>>> 
>>> H*p = -g, (1)
>>> 
>>> where H is the Hessian, g is the gradient, and p is the desired
>descent step. What I am about to say also holds for non-linear least
>squares method, where Hessian is approximated using the Jacobian as
>>> 
>>> H \approx J^T*J+\lambda I.
>>> 
>>> Now, if you are not solving a simple optimization problem that you
>keep giving examples for, you can easily have a Hessian be a very large
>matrix,
>>> like 1000x1000 or larger. Now, you better hope that you are storing
>H using your linear algebra framework, otherwise eq. (1) computation is
>going to take a while.
>>> This is actually a very active area of research, and that is why
>having sparse linear algebra (aren't you removing this? ;) ) and
>iterative solvers is important to a lot of people.
>> 
>> Yes we are removing it because it is buggy and _nobody_ stepped up to
>say
>> that it was important for CM users, and to help solve the problems in
>a way
>> consistent with real-life usage of such a feature.
>> As Sébastien said, you are warmly welcome to help us find a way to
>keep the
>> feature.
>
>I personally do not use sparse linear algebra. I was just pointing out
>how important computation of eq. 1 is in general. I wish I had the time
>to help :(
>
>> 
>>> 
>>> What you are proposing as good OO style 
>> 
>> This discussion has really nothing to do with "OO style", merely code
>reuse;
>> and it was my big misunderstanding (partly because of my lack of
>knowledge,
>> partly because of the lack of documentation on the targetted usages
>of
>> "DerivativeStructure", which IIUC now, are outside CM) that I
>believed
>> that the new "MultivariateDifferentiableFunction" was the way to go.
>> 
>> [Also, Luc had been moving very fast on this. And I couldn't keep up
>> although I had wondered earlier why this had to impact usage in the
>> "optimization" package.]
>> 
>>> is that if someone has a function that they want to optimize, they
>need to take what is probably already a RealMatrix or [][], and create
>1000x1000 DerivativeStructure objects, that,
>> 
>> IIUC, that would be 1000 "DerivativeStructure" (not 1000x1000). If
>so, for
>> this example, using "DerivativeStructure" would lead to about a 4%
>increase
>> in storage memory.
>
>Ok 1000. I guess it's really hard to understand this
>DerivativeStructure without further documentation. You can change my
>problem to least-squares problem with a very large number of
>observables. Same problem will pop-up again.
>
>> 
>>> within the next 10 lines in the optimizer, are going to be converted
>back to a RealMatrix? Not only have you just fragmented your heap
>space, but your garbage collector
>>> is going crazy, and you are wasting an incredible amount of memory.
>> 
>> This is the kind of assertions that really needs support from actual
>code.
>> [Again, I don't claim to know better; I think that it would really be
>useful
>> to accumulate a list of "do and don't" for Java and CM, in the form
>of
>> reproducible user experience.]
>> 
>>> Now imagine if your Jacobian is actually very simple to compute but
>large, then this whole conversion back and forth actually takes much
>longer than the function evaluation.
>> 
>> We are willing to take this into account, but since I cannot see the
>> behaviour in my use-case (where the evaluation time overwhelms, by
>several
>> orders of magnitude, all such considerations), I do not have any
>incentive
>> to change what seems to be "efficient enough" (for the actually known
>> cases).
>> Again, your experience with other use-cases would be very valuable to
>> analyze the efficiency of the CM implementations and improve them, if
>> necessary.
>> 
>>> 
>>> So, why exactly are you insisting on taking this performance
>penalty?
>> 
>> It's the other way around. Until the performance penalty is proven,
>we've
>> decided that it's up to the developer willing to do the work to
>decide.
>> Again, if there is patch, it makes the matter much easier to decide
>on.
>> 
>> I admit that we decided to change the internal working of the
>optimizers, so
>> _we_ should have proved that it does not impact usability. Hence,
>again, the
>> usefulness of having a test suite of "real" applications which could
>be
>> benchmarked regularly in order to have performance regressions
>induced by
>> otherwise welcome changes.
>> [I raised that issue several times on the ML.]
>
>It's not about usability. When you tests your components you should not
>test on small problems, because any bad software will work on that.
>Create a case where you have 10K+ observations and 10K+ variable and
>see how your change effects the solution. I wish I would have time to
>do it myself, but I don't. As a user I can tell you that scalability is
>an important issue. 
>
>> 
>>> You claim that the optimizer can work better if it gets a
>DerivativeStructure, why?
>> 
>> This was (supposed to be) an improvement of the _internal_ design.
>[Several
>> weeks ago, Luc posted the problems he was encountering.]
>> 
>>> Where is the fact that you are holding a DerivativeStructure come
>into effect for a Newton method? Can you provide any literature in that
>regard? The classical optimization bottleneck, if not the actual
>function evaluation, is eq. (1). The optimization methodology is build
>on top of years of research in computational linear algebra concepts,
>and does not care how the matrices are actually computed. Efficient
>memory usage and speed is critical here because in Newton optimizations
>eq. (1) is evaluated thousands of times. The goal of the Jacobian is
>not to store derivatives it is to store a matrix of numbers that allows
>you to quadratically approximate your target function.
>> 
>> OK. Then Luc's proposal seems appropriate.
>> There would be new interfaces defining the "optimize" method
>appropriately.
>> For algorithms that need the gradient, it must be provided in an
>additional
>> argument, of type "MultivariateVectorFunction"; for those that need,
>the
>> Jacobian, the argument would be of type "MultivariateMatrixFunction".
>> Do you agree?
>> 
>
>I keep saying that there are cases when evaluating value gradient
>Hessian is faster together than as a separate function. So no, I do not
>agree. I do agree it is better than having DerivativeStructure, but I
>think it is worse than what you had in 3.0. The proposal is pretty much
>like before in 3.0, but now I need to create two classes every time I
>want to optimize a function. Why are you doing this? I don't understand
>why this change needs to happen. 

I would propose to simply revert my changes on the optimization package 
and prepare for a reorganization for 4.0. I understand I focused only on
the type of problems Gilles and myself routinely use, i .e. small size problems 
where the cost of the evaluation is several orders of magnitude larger than the
data copying. I forgot the dual case with  very large data sets. I apologize for that. 

When 3.1 will be out, we will have to solve this so both  cases are handled efficiently, 
and this would probably be implemented in 4.0.

Does this seems reasonable? 

Best regards 
Luc

>
>I don't see a problem at all. All that has to happen is that the
>function in gradient based methods overwrites the optimization
>function, such that it now takes a subclass of the function that is
>used in non-gradient method. That sub-class would require a gradient
>function to be implemented.
>
>>> 
>>> You guys are focusing on people using finite differences and trying
>to optimize a newton method to use finite differences. This is not what
>the main purpose of a newton method is. If you want a method that
>dynamically adjusts finite differences step size, you should switch to
>BOBYQA, which was designed for this case.
>> 
>> Another misunderstanding probably. [The "stepSize" discussion was
>sort of
>> a digression.]
>> 
>>> 
>>> Derivatives can be computed by a computer using a symbolic toolbox a
>priori (something that I was referring to when I accidentally said
>auto-differenation). They can also be efficiently simplified by that
>toolbox before being pasted back into you program. Auto-diff could also
>be an important tool for people if their problem is simple or they are
>not concerned with optimal efficiency . This can easily be handled by a
>wrapper and has nothing to do with Newton optimization.
>> 
>> Maybe we should also change the "NewtonSolver" (in package
>> "o.a.c.m.analysis.solvers"). [In the same way we'll do for the
>optimizer
>> with derivatives. Actually those two packages were improved in
>parallel so
>> that the interface would be very similar from a usage point of view.]
>> 
>
>Not sure what you mean.
>
>>> If you want to talk about proper OO design and internal
>simplification then you should focus on being able to pass a linear
>solver into your optimization method. This way you allow people to use
>iterative and sparse solvers when needed.
>> 
>> This is a new feature; contributions are welcome. [CM is primarily
>> developed by people who use (part of) it; if they don't need that
>feature,
>> or don't know how to implement it, or do not have the time to
>implement it,
>> it won't be implemented.]
>> 
>>> If you are concerned about people getting derivatives wrong, you can
>do what MATLAB does, which is to add an option to check the derivatives
>using finite differences when debugging. 
>> 
>> IMHO, the top-priority feature to help in debugging is to introduce
>logging
>> statements!
>
>I would go further than that. I think the user should be able to get
>back all the iterations steps back, and the associated rmsd values, etc
>if so desired. But that is sort of a wish.
>
>> 
>>> 
>>> This brings me to my second issue. There are important problems
>where computation
>>> of the derivatives combined with the actual function evaluation is
>*significantly*
>>> faster than computing each alone. I am not clear why I am hitting
>such resistance
>>> with this. [...]
>> 
>> [If you think that you were met with strong resistance, I'd suggest
>that you
>> look in the archives for the discussions about exceptions in CM...]
>> 
>> Unless I'm mistaken, CM does not prevent you to write a class that
>takes
>> advantage of combining the computations.
>
>How would I write that class? You optimizer first requests function
>value , then separately the Jacobian. It is not done at the same time.
>I could try to cache intermediately results, see if you are requesting
>a Jacobian for the same values you called my value function for, but
>that is very dirty.
>
>> 
>>> What I suggested as function interfaces was just an initial quick
>suggestion
>>> that I would be happy for anyone to change in a way that everyone
>feels positive
>>> about. I just don't want my second issue to be ignored like a
>non-issue.
>> 
>> I'm getting confused. Could you please restate what where those first
>and
>> second issue?
>
>First issue is you are creating extra layers of conversion for the user
>that is bad for performance, and there is not a single specific reason
>given for why its a good idea. Second issue is that computing value
>gradient Hessian might be faster for some functions together rather
>than in two unconnected calls.
>
>Thanks,
>Konstantin
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>For additional commands, e-mail: dev-help@commons.apache.org

-- 
Envoyé de mon téléphone Android avec K-9 Mail. Excusez la brièveté.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] Old to new API ("MultivariateDifferentiable(Vector)Function")

Posted by Konstantin Berlin <kb...@gmail.com>.

Hi,



> Hello.
> 
>> 
>> Now that I have some time, let me try to make my case clearly. First I want to say that this is not some attack on the automatic-differentation package. I love automatic-differentation and symbolic packages. I personally cannot compute a derivative without a computer for the life of me. And I am also glad we are finally starting to agree on something :)
> 
> I don't think that we disagreed about the fact that there was a problem with
> the interface to the user application. [_I_ started this thread hinting that
> there was a problem (at least for me as a user, based on my use-case).]
> 
> [Then there were several misunderstandings about what was the problem and how
> to solve it...]
> 
>> 
>> This discussion is about figuring out how an incorrect framework and storage affects the performance of an optimization. That is why I am so worried.
> 
> I can understand that this is an important point for your use-case.
> There are now two vantage points (at least) on two aspects to consider: You
> seem to have experience where storage matter (while I don't); you worry
> about "small" objects allocation (while I don't, anymore).
> 
> I'm not saying that your worries do not count; quite the contrary. CM is not
> my "toolbox"; it's a library based on the knowledge of its developers
> (obviously).
> If you bring actual use-cases (i.e. unit tests or real application code
> that can be benchmarked) that will show worrying behaviour, it will
> certainly trigger swift corrective action. [This has happened recently.]
> 
> In this area (performance), the problem is that intuition (or educated
> guess, however you want to name it) is not a substitute for actual runs,
> sometimes by a large amount. [When I started to use CM, I raised the issue
> that a 3D vector was immutable (so that I had to create a new one whenever
> its coordinates were changing). Surely this was a performance hit! Actually
> it wasn't.]

This is not the same case. You take a function that a user normally expresses as a two dimensional array or a vector, you force them to allocate 10K+ new objects that you then have to unwrap back into the structure the user would have happily supplied you in the first place. The second issue you missed is one of scale. The difference between modifying a 3 variable array and creating a copy is not large. Try doing this with a 10k+ vector, where you don't actually need to modify any of the entries but are just doing copies for the hell of it. This is a known critical component of an optimization and should be optimized for performance itself.

> 
> Again, that does not mean that you are wrong in this case. It's just that we
> cannot jump and change the code every time someone comes up with what
> amounts to "conventional wisdom". If the person comes with a patch that
> readily proves the point (at least marginally through microbenchmarking),
> then we all benefit.
> 

This is not "conventional wisdom", this is 60 years of research, so you better provide a good reason for you increased level of abstraction. I asked you why you would indirectly wrap a Hessian or a Jacobian in a much heaver class that provides no useful features for a newton optimizer. I don't think you gave me an answer. I do not believe that good OO design is to have as many abstract layers as you can think of. Good OO design is just like any good engineering design, its about having things clean, simple, and not be dependent on components that are not required. In addition, if you are working on a code that is called thousands of times, you should think really careful about memory performance.

>> 
>> So lets start with the basics. A Newton method must compute a descent step by solving a linear equation
>> 
>> H*p = -g, (1)
>> 
>> where H is the Hessian, g is the gradient, and p is the desired descent step. What I am about to say also holds for non-linear least squares method, where Hessian is approximated using the Jacobian as
>> 
>> H \approx J^T*J+\lambda I.
>> 
>> Now, if you are not solving a simple optimization problem that you keep giving examples for, you can easily have a Hessian be a very large matrix,
>> like 1000x1000 or larger. Now, you better hope that you are storing H using your linear algebra framework, otherwise eq. (1) computation is going to take a while.
>> This is actually a very active area of research, and that is why having sparse linear algebra (aren't you removing this? ;) ) and iterative solvers is important to a lot of people.
> 
> Yes we are removing it because it is buggy and _nobody_ stepped up to say
> that it was important for CM users, and to help solve the problems in a way
> consistent with real-life usage of such a feature.
> As Sébastien said, you are warmly welcome to help us find a way to keep the
> feature.

I personally do not use sparse linear algebra. I was just pointing out how important computation of eq. 1 is in general. I wish I had the time to help :(

> 
>> 
>> What you are proposing as good OO style 
> 
> This discussion has really nothing to do with "OO style", merely code reuse;
> and it was my big misunderstanding (partly because of my lack of knowledge,
> partly because of the lack of documentation on the targetted usages of
> "DerivativeStructure", which IIUC now, are outside CM) that I believed
> that the new "MultivariateDifferentiableFunction" was the way to go.
> 
> [Also, Luc had been moving very fast on this. And I couldn't keep up
> although I had wondered earlier why this had to impact usage in the
> "optimization" package.]
> 
>> is that if someone has a function that they want to optimize, they need to take what is probably already a RealMatrix or [][], and create 1000x1000 DerivativeStructure objects, that,
> 
> IIUC, that would be 1000 "DerivativeStructure" (not 1000x1000). If so, for
> this example, using "DerivativeStructure" would lead to about a 4% increase
> in storage memory.

Ok 1000. I guess it's really hard to understand this DerivativeStructure without further documentation. You can change my problem to least-squares problem with a very large number of observables. Same problem will pop-up again.

> 
>> within the next 10 lines in the optimizer, are going to be converted back to a RealMatrix? Not only have you just fragmented your heap space, but your garbage collector
>> is going crazy, and you are wasting an incredible amount of memory.
> 
> This is the kind of assertions that really needs support from actual code.
> [Again, I don't claim to know better; I think that it would really be useful
> to accumulate a list of "do and don't" for Java and CM, in the form of
> reproducible user experience.]
> 
>> Now imagine if your Jacobian is actually very simple to compute but large, then this whole conversion back and forth actually takes much longer than the function evaluation.
> 
> We are willing to take this into account, but since I cannot see the
> behaviour in my use-case (where the evaluation time overwhelms, by several
> orders of magnitude, all such considerations), I do not have any incentive
> to change what seems to be "efficient enough" (for the actually known
> cases).
> Again, your experience with other use-cases would be very valuable to
> analyze the efficiency of the CM implementations and improve them, if
> necessary.
> 
>> 
>> So, why exactly are you insisting on taking this performance penalty?
> 
> It's the other way around. Until the performance penalty is proven, we've
> decided that it's up to the developer willing to do the work to decide.
> Again, if there is patch, it makes the matter much easier to decide on.
> 
> I admit that we decided to change the internal working of the optimizers, so
> _we_ should have proved that it does not impact usability. Hence, again, the
> usefulness of having a test suite of "real" applications which could be
> benchmarked regularly in order to have performance regressions induced by
> otherwise welcome changes.
> [I raised that issue several times on the ML.]

It's not about usability. When you tests your components you should not test on small problems, because any bad software will work on that. Create a case where you have 10K+ observations and 10K+ variable and see how your change effects the solution. I wish I would have time to do it myself, but I don't. As a user I can tell you that scalability is an important issue. 

> 
>> You claim that the optimizer can work better if it gets a DerivativeStructure, why?
> 
> This was (supposed to be) an improvement of the _internal_ design. [Several
> weeks ago, Luc posted the problems he was encountering.]
> 
>> Where is the fact that you are holding a DerivativeStructure come into effect for a Newton method? Can you provide any literature in that regard? The classical optimization bottleneck, if not the actual function evaluation, is eq. (1). The optimization methodology is build on top of years of research in computational linear algebra concepts, and does not care how the matrices are actually computed. Efficient memory usage and speed is critical here because in Newton optimizations eq. (1) is evaluated thousands of times. The goal of the Jacobian is not to store derivatives it is to store a matrix of numbers that allows you to quadratically approximate your target function.
> 
> OK. Then Luc's proposal seems appropriate.
> There would be new interfaces defining the "optimize" method appropriately.
> For algorithms that need the gradient, it must be provided in an additional
> argument, of type "MultivariateVectorFunction"; for those that need, the
> Jacobian, the argument would be of type "MultivariateMatrixFunction".
> Do you agree?
> 

I keep saying that there are cases when evaluating value gradient Hessian is faster together than as a separate function. So no, I do not agree. I do agree it is better than having DerivativeStructure, but I think it is worse than what you had in 3.0. The proposal is pretty much like before in 3.0, but now I need to create two classes every time I want to optimize a function. Why are you doing this? I don't understand why this change needs to happen. 

I don't see a problem at all. All that has to happen is that the function in gradient based methods overwrites the optimization function, such that it now takes a subclass of the function that is used in non-gradient method. That sub-class would require a gradient function to be implemented.

>> 
>> You guys are focusing on people using finite differences and trying to optimize a newton method to use finite differences. This is not what the main purpose of a newton method is. If you want a method that dynamically adjusts finite differences step size, you should switch to BOBYQA, which was designed for this case.
> 
> Another misunderstanding probably. [The "stepSize" discussion was sort of
> a digression.]
> 
>> 
>> Derivatives can be computed by a computer using a symbolic toolbox a priori (something that I was referring to when I accidentally said auto-differenation). They can also be efficiently simplified by that toolbox before being pasted back into you program. Auto-diff could also be an important tool for people if their problem is simple or they are not concerned with optimal efficiency . This can easily be handled by a wrapper and has nothing to do with Newton optimization.
> 
> Maybe we should also change the "NewtonSolver" (in package
> "o.a.c.m.analysis.solvers"). [In the same way we'll do for the optimizer
> with derivatives. Actually those two packages were improved in parallel so
> that the interface would be very similar from a usage point of view.]
> 

Not sure what you mean.

>> If you want to talk about proper OO design and internal simplification then you should focus on being able to pass a linear solver into your optimization method. This way you allow people to use iterative and sparse solvers when needed.
> 
> This is a new feature; contributions are welcome. [CM is primarily
> developed by people who use (part of) it; if they don't need that feature,
> or don't know how to implement it, or do not have the time to implement it,
> it won't be implemented.]
> 
>> If you are concerned about people getting derivatives wrong, you can do what MATLAB does, which is to add an option to check the derivatives using finite differences when debugging. 
> 
> IMHO, the top-priority feature to help in debugging is to introduce logging
> statements!

I would go further than that. I think the user should be able to get back all the iterations steps back, and the associated rmsd values, etc if so desired. But that is sort of a wish.

> 
>> 
>> This brings me to my second issue. There are important problems where computation
>> of the derivatives combined with the actual function evaluation is *significantly*
>> faster than computing each alone. I am not clear why I am hitting such resistance
>> with this. [...]
> 
> [If you think that you were met with strong resistance, I'd suggest that you
> look in the archives for the discussions about exceptions in CM...]
> 
> Unless I'm mistaken, CM does not prevent you to write a class that takes
> advantage of combining the computations.

How would I write that class? You optimizer first requests function value , then separately the Jacobian. It is not done at the same time. I could try to cache intermediately results, see if you are requesting a Jacobian for the same values you called my value function for, but that is very dirty.

> 
>> What I suggested as function interfaces was just an initial quick suggestion
>> that I would be happy for anyone to change in a way that everyone feels positive
>> about. I just don't want my second issue to be ignored like a non-issue.
> 
> I'm getting confused. Could you please restate what where those first and
> second issue?

First issue is you are creating extra layers of conversion for the user that is bad for performance, and there is not a single specific reason given for why its a good idea. Second issue is that computing value gradient Hessian might be faster for some functions together rather than in two unconnected calls.

Thanks,
Konstantin


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] Old to new API ("MultivariateDifferentiable(Vector)Function")

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.

Hello.

> 
> Now that I have some time, let me try to make my case clearly. First I want to say that this is not some attack on the automatic-differentation package. I love automatic-differentation and symbolic packages. I personally cannot compute a derivative without a computer for the life of me. And I am also glad we are finally starting to agree on something :)

I don't think that we disagreed about the fact that there was a problem with
the interface to the user application. [_I_ started this thread hinting that
there was a problem (at least for me as a user, based on my use-case).]

[Then there were several misunderstandings about what was the problem and how
to solve it...]

> 
> This discussion is about figuring out how an incorrect framework and storage affects the performance of an optimization. That is why I am so worried.

I can understand that this is an important point for your use-case.
There are now two vantage points (at least) on two aspects to consider: You
seem to have experience where storage matter (while I don't); you worry
about "small" objects allocation (while I don't, anymore).

I'm not saying that your worries do not count; quite the contrary. CM is not
my "toolbox"; it's a library based on the knowledge of its developers
(obviously).
If you bring actual use-cases (i.e. unit tests or real application code
that can be benchmarked) that will show worrying behaviour, it will
certainly trigger swift corrective action. [This has happened recently.]

In this area (performance), the problem is that intuition (or educated
guess, however you want to name it) is not a substitute for actual runs,
sometimes by a large amount. [When I started to use CM, I raised the issue
that a 3D vector was immutable (so that I had to create a new one whenever
its coordinates were changing). Surely this was a performance hit! Actually
it wasn't.]

Again, that does not mean that you are wrong in this case. It's just that we
cannot jump and change the code every time someone comes up with what
amounts to "conventional wisdom". If the person comes with a patch that
readily proves the point (at least marginally through microbenchmarking),
then we all benefit.

> 
> So lets start with the basics. A Newton method must compute a descent step by solving a linear equation
> 
> H*p = -g, (1)
> 
> where H is the Hessian, g is the gradient, and p is the desired descent step. What I am about to say also holds for non-linear least squares method, where Hessian is approximated using the Jacobian as
> 
> H \approx J^T*J+\lambda I.
> 
> Now, if you are not solving a simple optimization problem that you keep giving examples for, you can easily have a Hessian be a very large matrix,
> like 1000x1000 or larger. Now, you better hope that you are storing H using your linear algebra framework, otherwise eq. (1) computation is going to take a while.
> This is actually a very active area of research, and that is why having sparse linear algebra (aren't you removing this? ;) ) and iterative solvers is important to a lot of people.

Yes we are removing it because it is buggy and _nobody_ stepped up to say
that it was important for CM users, and to help solve the problems in a way
consistent with real-life usage of such a feature.
As Sébastien said, you are warmly welcome to help us find a way to keep the
feature.

> 
> What you are proposing as good OO style 

This discussion has really nothing to do with "OO style", merely code reuse;
and it was my big misunderstanding (partly because of my lack of knowledge,
partly because of the lack of documentation on the targetted usages of
"DerivativeStructure", which IIUC now, are outside CM) that I believed
that the new "MultivariateDifferentiableFunction" was the way to go.

[Also, Luc had been moving very fast on this. And I couldn't keep up
although I had wondered earlier why this had to impact usage in the
"optimization" package.]

> is that if someone has a function that they want to optimize, they need to take what is probably already a RealMatrix or [][], and create 1000x1000 DerivativeStructure objects, that,

IIUC, that would be 1000 "DerivativeStructure" (not 1000x1000). If so, for
this example, using "DerivativeStructure" would lead to about a 4% increase
in storage memory.

> within the next 10 lines in the optimizer, are going to be converted back to a RealMatrix? Not only have you just fragmented your heap space, but your garbage collector
> is going crazy, and you are wasting an incredible amount of memory.

This is the kind of assertions that really needs support from actual code.
[Again, I don't claim to know better; I think that it would really be useful
to accumulate a list of "do and don't" for Java and CM, in the form of
reproducible user experience.]

> Now imagine if your Jacobian is actually very simple to compute but large, then this whole conversion back and forth actually takes much longer than the function evaluation.

We are willing to take this into account, but since I cannot see the
behaviour in my use-case (where the evaluation time overwhelms, by several
orders of magnitude, all such considerations), I do not have any incentive
to change what seems to be "efficient enough" (for the actually known
cases).
Again, your experience with other use-cases would be very valuable to
analyze the efficiency of the CM implementations and improve them, if
necessary.

> 
> So, why exactly are you insisting on taking this performance penalty?

It's the other way around. Until the performance penalty is proven, we've
decided that it's up to the developer willing to do the work to decide.
Again, if there is patch, it makes the matter much easier to decide on.

I admit that we decided to change the internal working of the optimizers, so
_we_ should have proved that it does not impact usability. Hence, again, the
usefulness of having a test suite of "real" applications which could be
benchmarked regularly in order to have performance regressions induced by
otherwise welcome changes.
[I raised that issue several times on the ML.]

> You claim that the optimizer can work better if it gets a DerivativeStructure, why?

This was (supposed to be) an improvement of the _internal_ design. [Several
weeks ago, Luc posted the problems he was encountering.]

> Where is the fact that you are holding a DerivativeStructure come into effect for a Newton method? Can you provide any literature in that regard? The classical optimization bottleneck, if not the actual function evaluation, is eq. (1). The optimization methodology is build on top of years of research in computational linear algebra concepts, and does not care how the matrices are actually computed. Efficient memory usage and speed is critical here because in Newton optimizations eq. (1) is evaluated thousands of times. The goal of the Jacobian is not to store derivatives it is to store a matrix of numbers that allows you to quadratically approximate your target function.

OK. Then Luc's proposal seems appropriate.
There would be new interfaces defining the "optimize" method appropriately.
For algorithms that need the gradient, it must be provided in an additional
argument, of type "MultivariateVectorFunction"; for those that need, the
Jacobian, the argument would be of type "MultivariateMatrixFunction".
Do you agree?

> 
> You guys are focusing on people using finite differences and trying to optimize a newton method to use finite differences. This is not what the main purpose of a newton method is. If you want a method that dynamically adjusts finite differences step size, you should switch to BOBYQA, which was designed for this case.

Another misunderstanding probably. [The "stepSize" discussion was sort of
a digression.]

> 
> Derivatives can be computed by a computer using a symbolic toolbox a priori (something that I was referring to when I accidentally said auto-differenation). They can also be efficiently simplified by that toolbox before being pasted back into you program. Auto-diff could also be an important tool for people if their problem is simple or they are not concerned with optimal efficiency . This can easily be handled by a wrapper and has nothing to do with Newton optimization.

Maybe we should also change the "NewtonSolver" (in package
"o.a.c.m.analysis.solvers"). [In the same way we'll do for the optimizer
with derivatives. Actually those two packages were improved in parallel so
that the interface would be very similar from a usage point of view.]

> If you want to talk about proper OO design and internal simplification then you should focus on being able to pass a linear solver into your optimization method. This way you allow people to use iterative and sparse solvers when needed.

This is a new feature; contributions are welcome. [CM is primarily
developed by people who use (part of) it; if they don't need that feature,
or don't know how to implement it, or do not have the time to implement it,
it won't be implemented.]

> If you are concerned about people getting derivatives wrong, you can do what MATLAB does, which is to add an option to check the derivatives using finite differences when debugging. 

IMHO, the top-priority feature to help in debugging is to introduce logging
statments!

> 
> This brings me to my second issue. There are important problems where computation
> of the derivatives combined with the actual function evaluation is *significantly*
> faster than computing each alone. I am not clear why I am hitting such resistance
> with this. [...]

[If you think that you were met with strong resistance, I'd suggest that you
look in the archives for the discussions about exceptions in CM...]

Unless I'm mistaken, CM does not prevent you to write a class that takes
advantage of combining the computations.

> What I suggested as function interfaces was just an initial quick suggestion
> that I would be happy for anyone to change in a way that everyone feels positive
> about. I just don't want my second issue to be ignored like a non-issue.

I'm getting confused. Could you please restate what where those first and
second issue?

Thanks,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] Old to new API ("MultivariateDifferentiable(Vector)Function")

Posted by Thomas Neidhart <th...@gmail.com>.

On 12/01/2012 01:42 AM, Konstantin Berlin wrote:
> Hi,
> 
> Now that I have some time, let me try to make my case clearly. First I want to say that this is not some attack on the automatic-differentation package. I love automatic-differentation and symbolic packages. I personally cannot compute a derivative without a computer for the life of me. And I am also glad we are finally starting to agree on something :)
> 
> This discussion is about figuring out how an incorrect framework and storage affects the performance of an optimization. That is why I am so worried.
> 
> So lets start with the basics. A Newton method must compute a descent step by solving a linear equation
> 
> H*p = -g, (1)
> 
> where H is the Hessian, g is the gradient, and p is the desired descent step. What I am about to say also holds for non-linear least squares method, where Hessian is approximated using the Jacobian as
> 
> H \approx J^T*J+\lambda I.
> 
> Now, if you are not solving a simple optimization problem that you keep giving examples for, you can easily have a Hessian be a very large matrix,
> like 1000x1000 or larger. Now, you better hope that you are storing H using your linear algebra framework, otherwise eq. (1) computation is going to take a while.
> This is actually a very active area of research, and that is why having sparse linear algebra (aren't you removing this? ;) ) and iterative solvers is important to a lot of people.
> 
> What you are proposing as good OO style is that if someone has a function that they want to optimize, they need to take what is probably already a RealMatrix or [][], and create 1000x1000 DerivativeStructure objects, that,
> within the next 10 lines in the optimizer, are going to be converted back to a RealMatrix? Not only have you just fragmented your heap space, but your garbage collector
> is going crazy, and you are wasting an incredible amount of memory. Now imagine if your Jacobian is actually very simple to compute but large, then this whole conversion back and forth actually takes much longer than the function evaluation.
> 
> So, why exactly are you insisting on taking this performance penalty? You claim that the optimizer can work better if it gets a DerivativeStructure, why? Where is the fact that you are holding a DerivativeStructure come into effect for a Newton method? Can you provide any literature in that regard? The classical optimization bottleneck, if not the actual function evaluation, is eq. (1). The optimization methodology is build on top of years of research in computational linear algebra concepts, and does not care how the matrices are actually computed. Efficient memory usage and speed is critical here because in Newton optimizations eq. (1) is evaluated thousands of times. The goal of the Jacobian is not to store derivatives it is to store a matrix of numbers that allows you to quadratically approximate your target function.
> 
> You guys are focusing on people using finite differences and trying to optimize a newton method to use finite differences. This is not what the main purpose of a newton method is. If you want a method that dynamically adjusts finite differences step size, you should switch to BOBYQA, which was designed for this case.
> 
> Derivatives can be computed by a computer using a symbolic toolbox a priori (something that I was referring to when I accidentally said auto-differenation). They can also be efficiently simplified by that toolbox before being pasted back into you program. Auto-diff could also be an important tool for people if their problem is simple or they are not concerned with optimal efficiency . This can easily be handled by a wrapper and has nothing to do with Newton optimization. If you want to talk about proper OO design and internal simplification then you should focus on being able to pass a linear solver into your optimization method. This way you allow people to use iterative and sparse solvers when needed. If you are concerned about people getting derivatives wrong, you can do what MATLAB does, which is to add an option to check the derivatives using finite differences when debugging. 
> 
> This brings me to my second issue. There are important problems where computation of the derivatives combined with the actual function evaluation is *significantly* faster than computing each alone. I am not clear why I am hitting such resistance with this. For example, you are doing some sort of a simulation, which can be trivially adjusted in the end to give a derivative or a function value. A very real example of this is a Fast Multipole Method, which takes time to compute a series expansion of the function, but the resulting series expansion can be analytically differentiated.
> 
> What I suggested as function interfaces was just an initial quick suggestion that I would be happy for anyone to change in a way that everyone feels positive about. I just don't want my second issue to be ignored like a non-issue.

I think the discussion on these things just started, and imho the guys
here are really open to discuss changes and even admit mistakes or bad
design, it just takes time to digest feedback.

Also the current design may be based on wrong assumptions (e.g. for too
specific use-cases), whereas the general case is handled in a
sub-optimal way, so we have to adapt.

just my 2 cents.

Thomas

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org