Posted to dev@commons.apache.org by Phil Steitz <ph...@gmail.com> on 2012/12/31 18:30:05 UTC

[math] [linear] immutability

If we stick to

0) algebraic objects are immutable
1) algorithms defined using algebraic concepts should be implemented
using algebraic objects

we end up having to create lots of algebraic objects and copy lots
of data, which creates memory and performance problems.  Adding
support for sparse objects helps here; but the problem does not go
away.  To get high performance and efficient memory utilization, you
need to relax one or both of these constraints.  Within the linear
package itself, some of our implementations basically throw out 1),
grabbing data and operating on double arrays rather than using
algebraic operations.  I think we need to seriously consider ways to
relax 0) such as Konstantin's idea of the InPlace interface,
Sebastien's suggestion to use visitors or Mahout's visitors / view
approaches.  Do others agree?  Please weigh in on the options below:

0)  Start, with Konstantin's help, by fleshing out the InPlace
matrix / vector interface
1)  Integrate Mahout code as part of a wholesale refactoring of the
linear package
2)  Extend use of the visitor pattern to perform mutations
"in-place" (similar to 0) in effect)

Or provide other / better options.  

Phil



Re: [math] [linear] immutability

Posted by Sébastien Brisard <se...@m4x.org>.
Hello,

2012/12/31 Phil Steitz <ph...@gmail.com>

> If we stick to
>
> 0) algebraic objects are immutable
> 1) algorithms defined using algebraic concepts should be implemented
> using algebraic objects
>
> we end up having to create lots of algebraic objects and copy lots
> of data, which creates memory and performance problems.  Adding
> support for sparse objects helps here; but the problem does not go
> away.  To get high performance and efficient memory utilization, you
> need to relax one or both of these constraints.  Within the linear
> package itself, some of our implementations basically throw out 1),
> grabbing data and operating on double arrays rather than using
> algebraic operations.  I think we need to seriously consider ways to
> relax 0) such as Konstantin's idea of the InPlace interface,
> Sebastien's suggestion to use visitors or Mahout's visitors / view
> approaches.  Do others agree?  Please weigh in on the options below:
>
> 0)  Start, with Konstantin's help, by fleshing out the InPlace
> matrix / vector interface
>

I have already said that I totally disagree with this solution. It would
indeed be very unsafe: we would even run the risk of falsifying our own
algorithms. Simple statements such as x.add(y) would no longer guarantee
that x is unchanged, depending on which implementation of RealVector
(InPlace or not) is passed to the algorithm.

1)  Integrate Mahout code as part of a wholesale refactoring of the
> linear package
> 2)  Extend use of the visitor pattern to perform mutations
> "in-place" (similar to 0) in effect)
>
> Or provide other / better options.
>
>

I don't like the clutter it would add to the matrix/vector interfaces,
but I'd much (much, much) rather we implement separate, explicitly named
in-place operations, such as addToSelf and the like.
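
For illustration, a minimal sketch of the contrast I have in mind (a toy
class, not the real ArrayRealVector; addToSelf is only the proposed name, it
does not exist in CM today):

    import java.util.Arrays;

    /** Toy vector, for discussion only -- not CM's ArrayRealVector. */
    final class ToyVector {
        private final double[] data;

        ToyVector(double[] values) {
            this.data = values.clone();
        }

        /** Copying, immutable-style addition: "this" is left untouched. */
        ToyVector add(ToyVector v) {
            double[] result = data.clone();
            for (int i = 0; i < result.length; i++) {
                result[i] += v.data[i];
            }
            return new ToyVector(result);
        }

        /** Explicit in-place addition: the mutation is advertised by the name. */
        ToyVector addToSelf(ToyVector v) {
            for (int i = 0; i < data.length; i++) {
                data[i] += v.data[i];
            }
            return this;
        }

        @Override
        public String toString() {
            return Arrays.toString(data);
        }
    }

The point is that x.add(y) keeps its current contract everywhere, and a
caller who wants to avoid the copy has to ask for it explicitly.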

Best regards,

Sébastien


> Phil
>

Re: [math] [linear] immutability

Posted by Ted Dunning <te...@gmail.com>.
On Tue, Jan 1, 2013 at 11:17 AM, Sébastien Brisard <
sebastien.brisard@m4x.org> wrote:

> Please note that when I first mentioned in-place operations, I didn't
> have speed in mind, but really memory.
>


> I think we would not gain much speed-wise, as Java has become very good at
> allocating objects (this is true for large problems, where typically a few
> big objects are allocated at each iteration; the conclusion would probably
> be different if many small objects had to be allocated at each iteration).
>

Allocation is not the problem.  The problem is memory bandwidth due to the
copies that are a side effect of the allocation.

Re: [math] [linear] immutability

Posted by Sébastien Brisard <se...@m4x.org>.
Hi Gilles,


2013/1/1 Gilles Sadowski <gi...@harfang.homelinux.org>

> Hi.
>
> > > If we stick to
> > >
> > > 0) algebraic objects are immutable
> > > 1) algorithms defined using algebraic concepts should be implemented
> > > using algebraic objects
> > >
> > > ...
> > > 0)  Start, with Konstantin's help, by fleshing out the InPlace
> > > matrix / vector interface
> > > 1)  Integrate Mahout code as part of a wholesale refactoring of the
> > > linear package
>
> What do you mean by this?
> Copy/paste or create a dependency? Something else?
>
> > > 2)  Extend use of the visitor pattern to perform mutations
> > > "in-place" (similar to 0) in effect)
> > >
>
> As suggested in a previous post:
>
> 3) a) Define a new "minimal matrix" interface, and create immutable
>       implementations.
>    b) Benchmark critical methods (entry access, iteration, add, multiply,
>       etc.)
>    c) Quantify the efficiency gain of in-place operations and only when
> this
>       information is available decide whether the gain is worth the price.
>       [Even if in-place operations are faster in a single thread context,
> it
>       is not sure that immutability would not change that in a multi-thread
>       implementation. Trying to outperform multi-threaded code with
> in-place
>       operations is a dead end.]
>
Please note that when I first mentioned in-place operations, I didn't
have speed in mind, but really memory.
I think we would not gain much speed-wise, as Java has become very good at
allocating objects (this is true for large problems, where typically a few
big objects are allocated at each iteration; the conclusion would probably
be different if many small objects had to be allocated at each iteration).


> Before embarking on any of this, please identify the rationale: Is there
> _one_ identified problem that would require urgent action? This discussion
> about clean-up/improvement/simplification of the CM matrix implementations
> has been going on for months, and we should not start a "new" discussion
> without referring to what has been recorded by Sébastien on JIRA.
>
I agree with you, of course ;-)
As for use cases: I'm simulating mechanical experiments on microstructures
which are represented as 3D images. The images I'm dealing with are
typically 128x128x128, with 6 dofs per voxel, but my aim is 1024x1024x1024,
or even 2048x2048x2048. For this kind of problem, the main issue is memory
(_followed_ by speed).
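(Rough arithmetic for scale, assuming dense double-precision storage: a
single 1024x1024x1024 field with 6 dofs per voxel holds 1024^3 * 6 ~= 6.4e9
doubles, i.e. 48 GiB, so even one extra temporary copy per iteration would
be out of the question; at 128x128x128 the same vector is already 96 MiB.)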

>
> Regards,
> Gilles
>
Best regards,
Sébastien



Re: [math] [linear] immutability

Posted by Ted Dunning <te...@gmail.com>.
On Tue, Jan 1, 2013 at 10:25 AM, Phil Steitz <ph...@gmail.com> wrote:

> Agreed we should keep the discussion concrete.  Sebastien and Luc
> have both mentioned specific examples where the overhead of matrix
> data copy and storage creates practical problems.  Konstantin
> mentioned another (Gaussian elimination) which is sort of humorous
> because we have in fact effectively implemented that already,
> embedded in the LU decomp class - but to do it, we took the approach
> that I mentioned above, which is to abandon the linear algebraic
> objects and operate directly on double arrays.
>

And frankly, that can be disastrous for performance as well.  As Konstantin
mentioned, it is critical to have BLAS operations exposed to get good
performance.  Level 2 and 3 operations are particularly important.

Phil's allusion to collections is particularly apt.  There are immutable
collections (see guava) and they are very handy.  And there are mutable
collections and they are very handy.  And there are multi-thread
performance friendly mutable collections (see ConcurrentHashMap).  They all
share a fairly simple API and they have some very simple abstract class
implementation helpers.
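
To make that concrete (Guava's ImmutableList and the JDK's ConcurrentHashMap
are real classes; the snippet is only a sketch of the point, nothing
CM-specific):

    import com.google.common.collect.ImmutableList;

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class CollectionsAnalogy {
        public static void main(String[] args) {
            // Immutable: mutators throw UnsupportedOperationException,
            // reads are trivially safe to share.
            List<Double> frozen = ImmutableList.of(1.0, 2.0, 3.0);

            // Mutable: cheap in-place updates, caller handles synchronization.
            List<Double> scratch = new ArrayList<>(frozen);
            scratch.set(0, 42.0);

            // Concurrent: mutable *and* safe to share across threads.
            Map<String, Double> shared = new ConcurrentHashMap<>();
            shared.put("norm", 7.0);

            // All three are consumed through the same small interfaces.
            System.out.println(frozen + " " + scratch + " " + shared);
        }
    }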

Re: [math] [linear] immutability

Posted by Phil Steitz <ph...@gmail.com>.
On 12/31/12 5:07 PM, Gilles Sadowski wrote:
> Hi.
>
>>> If we stick to
>>>
>>> 0) algebraic objects are immutable
>>> 1) algorithms defined using algebraic concepts should be implemented
>>> using algebraic objects
>>>
>>> ...
>>> 0)  Start, with Konstantin's help, by fleshing out the InPlace
>>> matrix / vector interface
>>> 1)  Integrate Mahout code as part of a wholesale refactoring of the
>>> linear package
> What do you mean by this?
> Copy/paste or create a dependency? Something else?

I was thinking along the lines of copy / adapt; but Ted's response
and some perusal of the code have disabused me of the idea that this
would be feasible.  I think realistically all we could do would be
to borrow some ideas.
>
>>> 2)  Extend use of the visitor pattern to perform mutations
>>> "in-place" (similar to 0) in effect)
>>>
> As suggested in a previous post:
>
> 3) a) Define a new "minimal matrix" interface, and create immutable
>       implementations.
>    b) Benchmark critical methods (entry access, iteration, add, multiply,
>       etc.)
>    c) Quantify the efficiency gain of in-place operations and only when this
>       information is available decide whether the gain is worth the price.
>       [Even if in-place operations are faster in a single thread context, it
>       is not sure that immutability would not change that in a multi-thread
>       implementation. Trying to outperform multi-threaded code with in-place
>       operations is a dead end.]
>
> Before embarking on any of this, please identify the rationale: Is there
> _one_ identified problem that would require urgent action? This discussion
> about clean-up/improvement/simplification of the CM matrix implementations
> has been going on for months, and we should not start a "new" discussion
> without referring to what has been recorded by Sébastien on JIRA.

Agreed we should keep the discussion concrete.  Sebastien and Luc
have both mentioned specific examples where the overhead of matrix
data copy and storage creates practical problems.  Konstantin
mentioned another (Gaussian elimination) which is sort of humorous
because we have in fact effectively implemented that already,
embedded in the LU decomp class - but to do it, we took the approach
that I mentioned above, which is to abandon the linear algebraic
objects and operate directly on double arrays.   You can see this
throughout the linear package itself.  This is partly an artifact of
the fact that most of the impls there are translated from fortran or
taken from JAMA, but partly it is just following 50+ yrs of
numerical linear algebra experience, which favors in-place
operations in iterative algorithms.  An interesting exercise
illustrating why you really need in-place operations for performance
in these algorithms would be to see how far you could get
implementing, say LU Decomp, using algebraic operations on immutable
vectors and matrices only.  The code might be nice to read; but I
bet dollars to doughnuts performance would be terrible and it would
require a lot more memory to handle the same size problems.
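
To make "operate directly on double arrays" concrete, here is roughly the
shape of the in-place kernel -- a deliberately naive forward elimination
with no pivoting, for illustration only (it is not the actual code embedded
in the LU decomp class):

    /** Naive in-place forward elimination (no pivoting), for illustration only. */
    public final class InPlaceElimination {

        /** Overwrites a with its (unpivoted) upper-triangular reduction. */
        public static void eliminate(double[][] a) {
            final int n = a.length;
            for (int k = 0; k < n - 1; k++) {
                for (int i = k + 1; i < n; i++) {
                    final double factor = a[i][k] / a[k][k];
                    // The row update writes into the same array: O(n) work, no allocation.
                    for (int j = k; j < n; j++) {
                        a[i][j] -= factor * a[k][j];
                    }
                }
            }
            // With an immutable matrix API, each of the O(n^2) row updates above
            // would have to return a fresh n x n matrix, i.e. O(n^2) copying per
            // update -- which is exactly the overhead being discussed here.
        }

        public static void main(String[] args) {
            double[][] a = {{2, 1, 1}, {4, 3, 3}, {8, 7, 9}};
            eliminate(a);
            System.out.println(java.util.Arrays.deepToString(a));
        }
    }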

I agree with you we need to keep things concrete and I also agree we
should think about stripping down the RealMatrix interface; but I
think we should pay attention to the practical concerns and
approaches suggested by Ted, Konstantin and others.  I would like to
get some more concrete proposals on the table for either the InPlace
interface or the use of views and visitors.  Examples from other
projects / languages / algorithms would be helpful here.  The
alternative is to agree to follow what is the de facto standard
internally in the linear package now, which is essentially to use
the algebraic objects just as value / data-transfer objects and work
with (mutable) arrays when implementing iterative algorithms.  I am
actually OK with that approach as well; but to maximize code reuse,
some algebraic operations will in this case have to be exposed as
operations on primitive or FieldElement arrays.

If you want to experiment with approaches for parallelizing some of
our algorithms, I would say go for it.  I doubt that mutability will
really end up playing a significant role there.  While it is in
general true that immutable objects make thread-safety a little
easier, it is not obvious to me that the real problems in
parallelizing the algorithms are made any easier by making the data
containers immutable (in other words, I would not expect protecting
shards from cross-thread access to be a real issue).  And getting
maximum performance from each worker thread ends up facing the same
problems.

Phil


>
>
> Regards,
> Gilles
>




Re: [math] [linear] immutability

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.
Hi.

> [...]
> The virtue here is that we didn't have to discuss these matters much.  We
> could just say "Sounds like a great idea, can you build a prototype to
> demonstrate your point?" to any bright spark who came along.

Special thanks for also mentioning this point.
People who think that they have the solution should propose actual code!


Regards,
Gilles



Re: [math] [linear] immutability

Posted by Ted Dunning <te...@gmail.com>.
My apologies, but I have totally lost track of who said what because too
many comments have enormous lines immediately adjacent to responses.

On Tue, Jan 1, 2013 at 11:39 AM, Somebody <so...@body.org> wrote:

> I thought that maybe it was due to the underlying
> (dynamic) data structure for sparse vectors/matrices
> (OpenIntToDoubleHashMap), while typical storage schemes (compressed row,
> skyline) might be more efficient (since only arrays are used), but do not
> lend themselves to changes of the value of initially zero coefficients.
> That's why I was suggesting immutable matrices as a start, but what I
> really meant was matrices where the zero coefficients are constant.
>
> Please note that your objection does not hold with iterative linear solvers
> (conjugate gradient, ...), so immutable matrices might still be
> interesting.
>


One of the benefits of making it easy to extend matrices can be found in
Mahout (which I should emphasize is probably only a source of ideas, not a
perfect paragon of perfection by any means).

As a result of easy extensibility, we have in mahout two kinds of sparse
vectors and many kinds of sparse matrices.

One kind of sparse vector uses an open hash table (RandomAccessHashTable).
 It works well for mutable situations, but is a bit more memory hungry than
might be liked.

The other kind is implemented using an array of indexes and an array of
doubles.  It can be read randomly with log n worst case performance or log
log n performance if the indices are well distributed.  It is nearly
immutable in that after a sequence of mutations, it requires substantial
time and possibly memory to merge itself back together for more read
operations.  What it does phenomenally well, however, is support iteration
with little memory overhead.
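
A stripped-down sketch of that second layout, in case it makes the trade-off
clearer (hypothetical class, not the actual Mahout implementation):

    import java.util.Arrays;

    /** Hypothetical read-mostly sparse vector: parallel sorted index/value arrays. */
    final class SequentialSparseVector {
        private final int size;
        private final int[] indices;   // strictly increasing positions of non-zeros
        private final double[] values; // values[k] lives at position indices[k]

        SequentialSparseVector(int size, int[] indices, double[] values) {
            this.size = size;
            this.indices = indices.clone();
            this.values = values.clone();
        }

        /** Random read: O(log nnz) binary search; unstored entries read as zero. */
        double get(int index) {
            final int k = Arrays.binarySearch(indices, index);
            return k >= 0 ? values[k] : 0.0;
        }

        /** Ordered iteration over non-zeros: the cheap, low-overhead access path. */
        double dot(double[] dense) {
            double sum = 0.0;
            for (int k = 0; k < indices.length; k++) {
                sum += values[k] * dense[indices[k]];
            }
            return sum;
        }

        int size() {
            return size;
        }
    }

Mutating such a structure means merging new (index, value) pairs back into
those arrays, which is exactly the expensive step described above.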

As far as matrices are concerned, we have a dense matrix backed by a single
indexed double[].  We have a row sparse matrix that has rows that are
either kind of sparse vector.  We have block sparse matrices that have row
patches that are matrices which need not exist, but are created lazily if
written to.  We have memory mapped dense matrices.  We have a memory mapped
dense matrix that maps several regions of large files together to form a
single matrix (since mmap only supports 2GB files).

Some of these storage forms are in Mahout math.  Some are in applications.

The virtue here is that we didn't have to discuss these matters much.  We
could just say "Sounds like a great idea, can you build a prototype to
demonstrate your point?" to any bright spark who came along.  Many
prototypes were built and several were incorporated.

So the impact of a simple core API (with linear algebra, mutable operations
and OK-but-not-great visitor patterns) and associated abstract classes and
abstract tests was as much social as technical.  The social virtues have,
in turn, led to technical virtues.

Re: [math] [linear] immutability

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.
> [...]
> 
> > P.S. I would also be interested if people think Vector should be made into
> > a special case of a Matrix.
> 
> 
> That's an interesting idea.

It's something I've been wondering about for a long time too.
CM's "RealVector" is a mix of
 * list of "double"s
 * Cartesian vector
 * single row/column matrix

The last one implies additional methods ("preMultiply", "operate") which we
could drop.
The second should be defined in the "geometry" package and clearly represent
the notion of Cartesian vector.
The first use could be supported in "MathArrays". And there could be
converters between those classes where they make sense.

> [...]

Gilles



Re: [math] [linear] immutability

Posted by Sébastien Brisard <se...@m4x.org>.
Hi Konstantin,


2013/1/1 Konstantin Berlin <kb...@gmail.com>

> Here are my 2 cents:
>
> The rationale for a redesign of the RealMatrix hierarchy is that a large
> number of numerical packages depend (or should depend) on a linear algebra
> package for efficient implementation. However, as it currently stands, it
> is not possible (or unnecessarily burdensome) to create a user-defined
> class that would have only the necessary and sufficient properties to
> perform such a task.
>
That is actually a good point. I agree that CM should make it easy to
interface with well-established native libraries, even if the default
implementations provided in the core CM library should be pure Java, as
stated in the "Guiding principles". There are many places where this should
be kept in mind, besides linear algebra. FFT is another example (I've been
writing a JNI wrapper to FFTW, but it does not blend well with CM).


> In my view immutability cannot be a property of the base matrix class, for
> the same reason as Collection classes cannot be immutable. Since a large
> number of matrix algorithms are iterative in nature, this would make the
> vast majority of algorithms computationally intractable. Just look at basic
> Gaussian elimination. Assuming you provide a vector operation for the
> inner-most loop in the API, you will still force a copy during each
> iteration of the two outermost loops. In total, this would force O(N^4)
> copies, while the actual algorithm is O(N^3). You could add Gaussian
> elimination itself to the interface, which is a bad idea, because you will
> never be able to implement every algorithm in existence. In addition, you
> have now forced your Gaussian elimination class to assume the most general
> case, and hence you force an O(N^4) implementation of the algorithm, since
> you cannot assume that the input matrix class has in-place operations.
>
> Immutability is a very nice property, but only works if your class is
> described by a small number of member classes. This is not the case here.
> If immutability is desired for some applications, a defensive copy can be
> made. I completely disagree with your statement about multi-threaded
> code. Please provide an example. As a note, the Java Collections package is
> a good example of a similar problem.
>
My understanding is that we try to enforce immutability in CM as much as
possible, when it does not hurt performance (memory- or speed-wise).
As you point out, immutability is probably not a good idea (!) in linear
algebra packages. I think I was the one who raised this issue of immutable
matrices in a previous thread. What I had in mind was sparse matrices.
Indeed, besides the bugs we have discovered in the last twelve months
(which led to their deprecation), Gilles mentioned the poor performance of
our implementations. I thought that maybe it was due to the underlying
(dynamic) data structure for sparse vectors/matrices
(OpenIntToDoubleHashMap), while typical storage schemes (compressed row,
skyline) might be more efficient (since only arrays are used), but do not
lend themselves to changes of the value of initially zero coefficients.
That's why I was suggesting immutable matrices as a start, but what I
really meant was matrices where the zero coefficients are constant.
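
For reference, a bare-bones sketch of the compressed-row idea (illustration
only, not a proposed CM class): the sparsity pattern is frozen in the
rowPtr/colIdx arrays, so stored values can still be updated in place, but an
initially zero coefficient can never become non-zero without rebuilding the
arrays.

    /** Minimal CSR (compressed sparse row) matrix, for illustration only. */
    final class CsrMatrix {
        private final int cols;
        private final int[] rowPtr;     // entries of row i are rowPtr[i] .. rowPtr[i+1]-1
        private final int[] colIdx;     // column index of each stored entry
        private final double[] values;  // value of each stored entry (updatable in place)

        CsrMatrix(int cols, int[] rowPtr, int[] colIdx, double[] values) {
            this.cols = cols;
            this.rowPtr = rowPtr;
            this.colIdx = colIdx;
            this.values = values;
        }

        /** y = A x, visiting only the stored entries. */
        double[] operate(double[] x) {
            if (x.length != cols) {
                throw new IllegalArgumentException("dimension mismatch");
            }
            final double[] y = new double[rowPtr.length - 1];
            for (int i = 0; i < y.length; i++) {
                double sum = 0.0;
                for (int k = rowPtr[i]; k < rowPtr[i + 1]; k++) {
                    sum += values[k] * x[colIdx[k]];
                }
                y[i] = sum;
            }
            return y;
        }
    }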

Please note that your objection does not hold with iterative linear solvers
(conjugate gradient, ...), so immutable matrices might still be interesting.

> I see issue MATH-765, however I don't think it's possible to do anything
> until there is a specific proposal for an interface, where we can add
> comments.
>
I didn't understand that we were going for a full redesign of the linear
package. One feature I would really love to implement is views. Also,
please note that Ted mentioned a very important point: the need for
standardized tests. I've done that for RealVectors (all implementations are
unit tested using the same abstract test suite). It should also be done
with matrices... but it's a long, tedious refactoring work!!!


> P.S. I would also be interested if people think Vector should be made into
> a special case of a Matrix.


That's an interesting idea.


> Also, I think the RealMatrix interface should not be the starting
> interface, but rather a midpoint in the hierarchy. A merging point, where
> all the interfaces needed for the optimization package, and other packages,
> come together to form something more general. While a lot of functionality
> in the matrix class, like visitors etc., is needed for implementing norms
> and other things, it is not needed by the iterative optimizers.
>

I agree. It's also true of iterative linear solvers (which should not be
left out of this thread!!!), which only require a RealLinearOperator.

Best regards,
Sébastien


>
> On Dec 31, 2012, at 8:07 PM, Gilles Sadowski <gi...@harfang.homelinux.org>
> wrote:
>
> > Hi.
> >
> >>> If we stick to
> >>>
> >>> 0) algebraic objects are immutable
> >>> 1) algorithms defined using algebraic concepts should be implemented
> >>> using algebraic objects
> >>>
> >>> ...
> >>> 0)  Start, with Konstantin's help, by fleshing out the InPlace
> >>> matrix / vector interface
> >>> 1)  Integrate Mahout code as part of a wholesale refactoring of the
> >>> linear package
> >
> > What do you mean by this?
> > Copy/paste or create a dependency? Something else?
> >
> >>> 2)  Extend use of the visitor pattern to perform mutations
> >>> "in-place" (similar to 0) in effect)
> >>>
> >
> > As suggested in a previous post:
> >
> > 3) a) Define a new "minimal matrix" interface, and create immutable
> >      implementations.
> >   b) Benchmark critical methods (entry access, iteration, add, multiply,
> >      etc.)
> >   c) Quantify the efficiency gain of in-place operations and only when
> this
> >      information is available decide whether the gain is worth the price.
> >      [Even if in-place operations are faster in a single thread context,
> it
> >      is not sure that immutability would not change that in a
> multi-thread
> >      implementation. Trying to outperform multi-threaded code with
> in-place
> >      operations is a dead end.]
> >
> > Before embarking on any of this, please identify the rationale: Is there
> > _one_ identified problem that would require urgent action? This
> discussion
> > about clean-up/improvement/simplification of the CM matrix
> implementations
> > has been going on for months, and we should not start a "new" discussion
> > without referring to what has been recorded by Sébastien on JIRA.
> >
> >
> > Regards,
> > Gilles
> >

Re: [math] [linear] immutability

Posted by Konstantin Berlin <kb...@gmail.com>.
Here are my 2 cents:

The rationale for a redesign of the RealMatrix hierarchy is that a large number of numerical packages depend (or should depend) on a linear algebra package for efficient implementation. However, as it currently stands, it is not possible (or unnecessarily burdensome) to create a user-defined class that would have only the necessary and sufficient properties to perform such a task.

In my view immutability cannot be a property of the base matrix class, for the same reason as Collection classes cannot be immutable. Since a large number of matrix algorithms are iterative in nature, this would make the vast majority of algorithms computationally intractable. Just look at basic Gaussian elimination. Assuming you provide a vector operation for the inner-most loop in the API, you will still force a copy during each iteration of the two outermost loops. In total, this would force O(N^4) copies, while the actual algorithm is O(N^3). You could add Gaussian elimination itself to the interface, which is a bad idea, because you will never be able to implement every algorithm in existence. In addition, you have now forced your Gaussian elimination class to assume the most general case, and hence you force an O(N^4) implementation of the algorithm, since you cannot assume that the input matrix class has in-place operations.

Immutability is a very nice property, but only works if your class is described by a small number of member classes. This is not the case here.
If immutability is desired for some applications, a defensive copy can be made. I completely disagree with your statement about multi-threaded code. Please provide an example. As a note, the Java Collections package is a good example of a similar problem.

I see issue MATH-765, however I don't think it's possible to do anything until there is a specific proposal for an interface, where we can add comments.

P.S. I would also be interested if people think Vector should be made into a special case of a Matrix. Also, I think the RealMatrix interface should not be the starting interface, but rather a midpoint in the hierarchy. A merging point, where all the interfaces needed for the optimization package, and other packages, come together to form something more general. While a lot of functionality in the matrix class, like visitors etc., is needed for implementing norms and other things, it is not needed by the iterative optimizers.

On Dec 31, 2012, at 8:07 PM, Gilles Sadowski <gi...@harfang.homelinux.org> wrote:

> Hi.
> 
>>> If we stick to
>>> 
>>> 0) algebraic objects are immutable
>>> 1) algorithms defined using algebraic concepts should be implemented
>>> using algebraic objects
>>> 
>>> ...
>>> 0)  Start, with Konstantin's help, by fleshing out the InPlace
>>> matrix / vector interface
>>> 1)  Integrate Mahout code as part of a wholesale refactoring of the
>>> linear package
> 
> What do you mean by this?
> Copy/paste or create a dependency? Something else?
> 
>>> 2)  Extend use of the visitor pattern to perform mutations
>>> "in-place" (similar to 0) in effect)
>>> 
> 
> As suggested in a previous post:
> 
> 3) a) Define a new "minimal matrix" interface, and create immutable
>      implementations.
>   b) Benchmark critical methods (entry access, iteration, add, multiply,
>      etc.)
>   c) Quantify the efficiency gain of in-place operations and only when this
>      information is available decide whether the gain is worth the price.
>      [Even if in-place operations are faster in a single thread context, it
>      is not sure that immutability would not change that in a multi-thread
>      implementation. Trying to outperform multi-threaded code with in-place
>      operations is a dead end.]
> 
> Before embarking on any of this, please identify the rationale: Is there
> _one_ identified problem that would require urgent action? This discussion
> about clean-up/improvement/simplification of the CM matrix implementations
> has been going on for months, and we should not start a "new" discussion
> without referring to what has been recorded by Sébastien on JIRA.
> 
> 
> Regards,
> Gilles
> 


Re: [math] [linear] immutability

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.
Hi.

> > If we stick to
> >
> > 0) algebraic objects are immutable
> > 1) algorithms defined using algebraic concepts should be implemented
> > using algebraic objects
> >
> > ...
> > 0)  Start, with Konstantin's help, by fleshing out the InPlace
> > matrix / vector interface
> > 1)  Integrate Mahout code as part of a wholesale refactoring of the
> > linear package

What do you mean by this?
Copy/paste or create a dependency? Something else?

> > 2)  Extend use of the visitor pattern to perform mutations
> > "in-place" (similar to 0) in effect)
> >

As suggested in a previous post:

3) a) Define a new "minimal matrix" interface, and create immutable
      implementations.
   b) Benchmark critical methods (entry access, iteration, add, multiply,
      etc.)
   c) Quantify the efficiency gain of in-place operations and only when this
      information is available decide whether the gain is worth the price.
      [Even if in-place operations are faster in a single thread context, it
      is not sure that immutability would not change that in a multi-thread
      implementation. Trying to outperform multi-threaded code with in-place
      operations is a dead end.]

Before embarking on any of this, please identify the rationale: Is there
_one_ identified problem that would require urgent action? This discussion
about clean-up/improvement/simplification of the CM matrix implementations
has been going on for months, and we should not start a "new" discussion
without referring to what has been recorded by Sébastien on JIRA.


Regards,
Gilles



Re: [math] [linear] immutability

Posted by Ted Dunning <te...@gmail.com>.
On Mon, Dec 31, 2012 at 9:30 AM, Phil Steitz <ph...@gmail.com> wrote:

> If we stick to
>
> 0) algebraic objects are immutable
> 1) algorithms defined using algebraic concepts should be implemented
> using algebraic objects
>
> ...
> 0)  Start, with Konstantin's help, by fleshing out the InPlace
> matrix / vector interface
> 1)  Integrate Mahout code as part of a wholesale refactoring of the
> linear package
> 2)  Extend use of the visitor pattern to perform mutations
> "in-place" (similar to 0) in effect)
>

Speaking as one of the main authors of the Mahout code and very occasional
contributor to CM, I doubt that integrating it directly will suit CM
needs/prejudices.

For instance, the whole sparse matrix problem where 0 x Inf => 0 instead of
NaN is probably not satisfactory for CM, but speed was considered a more
important requirement for Mahout.  Similarly, Mahout math depends on a
primitive collection implementation that generates over 200 classes from
templates.  That makes some of the sparse codes very fast, but it might
lead to some indigestion for CM.
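
To spell out the 0 x Inf point (plain IEEE 754 arithmetic, easy to verify):
a sparse dot product that only visits stored entries treats unstored zeros
as contributing nothing, whereas strict element-wise arithmetic would yield
NaN whenever the dense operand holds an infinity at one of those positions.

    public class SparseZeroTimesInfinity {
        public static void main(String[] args) {
            // Strict IEEE arithmetic:
            System.out.println(0.0 * Double.POSITIVE_INFINITY);   // NaN

            // A sparse dot product that skips unstored (zero) entries never even
            // sees the infinity, so the "0 * Inf" term contributes nothing:
            int[] indices = {0, 2};                 // stored positions of a sparse vector
            double[] values = {1.0, 3.0};
            double[] dense = {2.0, Double.POSITIVE_INFINITY, 4.0};

            double dot = 0.0;
            for (int k = 0; k < indices.length; k++) {
                dot += values[k] * dense[indices[k]];
            }
            System.out.println(dot);                // 14.0, not NaN
        }
    }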