You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@commons.apache.org by Gilles Sadowski <gi...@harfang.homelinux.org> on 2011/08/15 00:52:11 UTC

[Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Hi.

I'm rather confused by the appearance of sparseness handling at the level
of a "general" vector (i.e. "RealVector", "AbstractRealVector") as well as
at the level of a non-sparse data structure ("ArrayRealVector").

This makes for a lot of convoluted code containing "instanceof" operators...

It was also pointed out by Arne Plöse in
  https://issues.apache.org/jira/browse/MATH-626
that it could lead to inefficient code.

Following the suggestion there, I wonder whether we should perform some
cleanup of the "RealVector" hierarchy, such as moving methods that are
sparseness-related over to the "SparseRealVector" interface, and removing
anything sparseness-related from "AbstractRealVector" and "ArrayRealVector".

However I don't have any idea of the implications of this refactoring. The
documentation is not very explicit about why the sparseness was introduced
in "RealVector" and the Javadoc for "sparseIterator()" adds to the
confusion:
---CUT---
    /**
     * Specialized implementations may choose to not iterate over all
     * dimensions, either because those values are unset, or are equal
     * to defaultValue(), or are small enough to be ignored for the
     * purposes of iteration.
     * No guarantees are made about order of iteration.
     * In dense implementations, this method will often delegate to
     * {@link #iterator()}.
     *
     * @return a sparse iterator
     */
    Iterator<Entry> sparseIterator();
---CUT---

All dimensions (plural)? "defaultValue()"? Order of iteration?

Also, I would expect that an iterator in "RealVector" would by default
iterate over all indices and return the sequence of _values_ (double), not
of "Entry" objects.
I assume that those "Entry" objects are necessary only for implementations
of sparse vectors to avoid returning the many entries set to the default
(non-stored) value...
In fact, don't you think that the "RealVector" interface should extend
"Iterable<Double>" while "SparseRealVector" would extend "Iterable<Entry>"?


Best regards,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Posted by Arne Ploese <ap...@gmx.de>.

Ted,

please just take your code and move Iterator<Entry> sparseIterator();
from RealVector to SparseRealvector. Look then at the compiler errors,
that are in most cases misuses of sparseIterator().

By the way most(all?) functions in CM return an array (dense)
impementation as return type.   

You mention the method isSparse in your posts - there is currently no
such method, if you want to detect, if you have a sparse implementation
you will have to check for SparseRealvector. So there is no big deal in
a following cast to SparseRealVector to access sparseIterator(). 

As I am no math guru, I cant follow your example, some real code
probaply will do the trick.

Arne

Am Dienstag, den 16.08.2011, 14:09 -0700 schrieb Ted Dunning: 
> Here is an example from the perspective of somebody adding a new kind of
> matrix.
> 
> Take the two kinds of matrix as RandomTrinaryMatrix(rows, columns, p) that
> has elements that are -1, 0 or 1.  1 and -1 have equal probabilities of p/2.
>  The value of p should be in [0,1].
> 
> It would be very nice if the implementor of this matrix could extend an
> abstract matrix and over-ride get() to generate a value and set() to throw
> an unsupported operation exception.  If p < 0.1, then the matrix should be
> marked as sparse, else as dense.
> 
> All operations against other matrices, sparse or dense should work well
> without any special handling by the implementor of this matrix.
> 
> This works in Mahout for instance by having the default operations in
> AbstractMatrix test for sparseness of left or right operands and do the
> right thing.  Obviously, a type test will not tell you whether this matrix
> is sparse or not.
> 
> This matrix and siblings is very important in compressed sensing and
> stochastic projection algorithms.
> 
> On Tue, Aug 16, 2011 at 1:55 PM, Phil Steitz <ph...@gmail.com> wrote:
> 
> > On 8/16/11 4:46 AM, Gilles Sadowski wrote:
> > > Hi.
> > >
> > >> I understood what he was suggesting.  I still disagree.  Dynamic
> > dispatch
> > >> and non-lattice typing structure is still required to make this all
> > work.
> > >>  Java doesn't really do that.  Pretending that what Java does is
> > sufficient
> > >> is hammer-looking-for-a-nail, not solving the problems at hand.
> > > Maybe that *I* don't understand what you are hinting at. Sorry for being
> > > dense. [Although that seems appropriate in this discussion :-).]
> > >
> > > Polymorphism provides dynamic dispatch, overloading does not; that's why
> > my
> > > proposition is that when you manipulate "unknown" types, those should
> > come
> > > as "this", not as the argument of the method.
> > >
> > > What's wrong with that?
> > >
> > > As for "hammer-looking-for-a-nail", I also don't see what you mean: What
> > is
> > > the problem? I guess that there are lots of applications who never need
> > to
> > > know about sparse vectors/matrices. In those cases, the added complexity
> > is
> > > not a "feature". The issue reported contends that the current design in
> > CM
> > > can cause problems for dense implementations. I'm not even sure that the
> > > current design is usable for the type of applications that make heavy use
> > of
> > > sparseness. Those are problems, IMHO.
> >
> > I have been out of pocket the last couple of days and may not have
> > time to dig into this until late tonight, but I agree with Gilles
> > that we need to get the conversation here more concrete.  I know we
> > discussed this before and Ted and others had good examples
> > justifying the current setup.  Can we revisit these, please?   What
> > would be great would be some examples both from the perspective of
> > the [math] developer looking to add a new or specialized class and
> > [math] users writing code that leverages the setup.
> >
> > Phil
> > >
> > >
> > > Gilles
> > >
> > >> On Mon, Aug 15, 2011 at 6:52 PM, Greg Sterijevski <
> > gsterijevski@gmail.com>wrote:
> > >>
> > >>> Forgive me for pushing my nose under the tent... I couldn't resist.
> > >>>
> > >>> I think Gilles is saying that each specialization of the matrix/vector
> > >>> objects would need to support pre (and post) multiplication with a
> > dense.
> > >>> So
> > >>> the type issue would not be problematic.
> > >>>
> > >>> On Mon, Aug 15, 2011 at 6:34 PM, Ted Dunning <te...@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> No.
> > >>>>
> > >>>> You can't.  This is because the type is lost as you enter the generic
> > >>>> library.
> > >>>>
> > >>>> On Mon, Aug 15, 2011 at 4:28 PM, Gilles Sadowski <
> > >>>> gilles@harfang.homelinux.org> wrote:
> > >>>>
> > >>>>>> They know that their own object is dense, but they don't know what
> > >>> kind
> > >>>>> of
> > >>>>>> input they were given.  They should still run fast if the input is
> > >>>>> sparse.
> > >>>>>
> > >>>>> Couldn't we still rely on polymorphism by implementing "preTimes":
> > >>>>>   unknown.preTimes(dense)
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > For additional commands, e-mail: dev-help@commons.apache.org
> > >
> > >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > For additional commands, e-mail: dev-help@commons.apache.org
> >
> >



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Posted by Arne Ploese <ap...@gmx.de>.

I could think of the following extension to RealVector:

  boolean isSparse();
  void setSparse(boolean sparse);
  double calcSparse(double sparseThresholdInPercent);

Explanation:

Default value from isSparse on ArrayRealvector is false and on
OpenMapRealvector is true (set in constructor).

setSparse allows the developer to set a vector/matrix in a known state
(dense|sparse).

calcSparse iterates internally over all entries(ArrayRealVector) or just
reads the value (OpenMapRealVector). If the fillstate is bigger than
sparseThresholdInPercent the field sparse will be set to false otherwise
to true. The actual fillstate will be returned.  

SparseIterator of ArrayRealvector returns only entries != 0. 

If anyone is interested in a draft implementation I can do the work and
add a patch the JIRA MATH-628 (next week). 

Am Mittwoch, den 17.08.2011, 16:39 -0700 schrieb Ted Dunning: 
> On Wed, Aug 17, 2011 at 4:24 PM, Greg Sterijevski <gs...@gmail.com>wrote:
> 
> > On symmetrics, diagonal, banded and so on, I disagree-as I have made clear
> > in the past. In the case of White standard errors or panel regressions, you
> > typically have long strings of multiplication by diagonals and symmetrics,
> > sandwich products and so forth. There are enough of these types of
> > operations that a math/stat library should support these forms of matrices.
> > isSparse() will not cut it. A symmetric matrix is typically NOT sparse, it
> > is typically dense in these cases. More importantly, the number of
> > operations you save by explicitly recognizing the special structure is not
> > insignificant.
> >
> 
> I agree pretty much for the symmetric case.  The savings for banded matrices
> is surprisingly small and you can have all of those savings as a left
> operand.  It is the right operand where simple sparsity accounts for almost
> everything.
> 
> >
> > > For banded arrays, the economies available beyond simple sparse
> > algorithms
> > > are even more limited.
> > >
> > >
> > I am confused, Ted, since when I suggested that some of the multiplication
> > issues could solved by method overloading, you thought it would not work.
> > Probably a mis-communication on my part.
> >
> 
> No.  What I mean is that when a sparse matrix is the right operand and not
> subject overloading due to dynamic typing, you get most of the advantages of
> the fancy optimizations without any need for anything more than isSparse and
> a sparse iterator.  You don't get all of the benefit, but you do get most of
> it and you get it for a wide variety of left operands.
> 
> 
> > > Symmetric and triangular matrices also have special properties but it is
> > > hard to decide what is really important there.  Many of the special
> > > operations for these kinds of matrix are subject to solving by overloads
> > > instead of indicators since we aren't dealing with binary operations.
> >  For
> > > example, left and right inverse multiplication with triangular matrices
> > is
> > > handled by normal single dispatch and qualifying an argument for real
> > > Cholesky decomposition is specific to the Cholesky decomposition itself.
> > >
> > >
> > While on the discussion of extending RealMatrix in any direction, I would
> > humbly offer that the objects are too complex. Pardon this foolish
> > question,
> > but what are the uses for the methods " double
> > walkInRowOrder(RealMatrixChangingVisitor visitor)" When I think of having
> > to
> > fill in all those methods for any extension, my head spins.
> >
> 
> Exactly.  I think that you shouldn't need that.  The abstract super class
> should have a moderately fancy operation that is limited to handling
> generically sparse and dense matrices reasonably well and dense x dense
> cases really well.  The writer of a new matrix type should not need to know
> about that at all.
> 
> What I was suggesting is that if you have a leftInverseMultiply method that
> solves for x in the system Ax = b, then that method knows the type of A
> because the natural method call is A.leftInverseMultiply(b).  Thus, if A has
> special structure, you get what you need from normal method dispatch.
> 
> It is the cases like A.times(b) or A.plus(b) that will be best supported by
> generic sparsity support in the abstract class.



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Posted by Ted Dunning <te...@gmail.com>.

On Wed, Aug 17, 2011 at 4:24 PM, Greg Sterijevski <gs...@gmail.com>wrote:

> On symmetrics, diagonal, banded and so on, I disagree-as I have made clear
> in the past. In the case of White standard errors or panel regressions, you
> typically have long strings of multiplication by diagonals and symmetrics,
> sandwich products and so forth. There are enough of these types of
> operations that a math/stat library should support these forms of matrices.
> isSparse() will not cut it. A symmetric matrix is typically NOT sparse, it
> is typically dense in these cases. More importantly, the number of
> operations you save by explicitly recognizing the special structure is not
> insignificant.
>

I agree pretty much for the symmetric case.  The savings for banded matrices
is surprisingly small and you can have all of those savings as a left
operand.  It is the right operand where simple sparsity accounts for almost
everything.

>
> > For banded arrays, the economies available beyond simple sparse
> algorithms
> > are even more limited.
> >
> >
> I am confused, Ted, since when I suggested that some of the multiplication
> issues could solved by method overloading, you thought it would not work.
> Probably a mis-communication on my part.
>

No.  What I mean is that when a sparse matrix is the right operand and not
subject overloading due to dynamic typing, you get most of the advantages of
the fancy optimizations without any need for anything more than isSparse and
a sparse iterator.  You don't get all of the benefit, but you do get most of
it and you get it for a wide variety of left operands.

> > Symmetric and triangular matrices also have special properties but it is
> > hard to decide what is really important there.  Many of the special
> > operations for these kinds of matrix are subject to solving by overloads
> > instead of indicators since we aren't dealing with binary operations.
>  For
> > example, left and right inverse multiplication with triangular matrices
> is
> > handled by normal single dispatch and qualifying an argument for real
> > Cholesky decomposition is specific to the Cholesky decomposition itself.
> >
> >
> While on the discussion of extending RealMatrix in any direction, I would
> humbly offer that the objects are too complex. Pardon this foolish
> question,
> but what are the uses for the methods " double
> walkInRowOrder(RealMatrixChangingVisitor visitor)" When I think of having
> to
> fill in all those methods for any extension, my head spins.
>

Exactly.  I think that you shouldn't need that.  The abstract super class
should have a moderately fancy operation that is limited to handling
generically sparse and dense matrices reasonably well and dense x dense
cases really well.  The writer of a new matrix type should not need to know
about that at all.

What I was suggesting is that if you have a leftInverseMultiply method that
solves for x in the system Ax = b, then that method knows the type of A
because the natural method call is A.leftInverseMultiply(b).  Thus, if A has
special structure, you get what you need from normal method dispatch.

It is the cases like A.times(b) or A.plus(b) that will be best supported by
generic sparsity support in the abstract class.

Re: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Posted by Greg Sterijevski <gs...@gmail.com>.

On symmetrics, diagonal, banded and so on, I disagree-as I have made clear
in the past. In the case of White standard errors or panel regressions, you
typically have long strings of multiplication by diagonals and symmetrics,
sandwich products and so forth. There are enough of these types of
operations that a math/stat library should support these forms of matrices.
isSparse() will not cut it. A symmetric matrix is typically NOT sparse, it
is typically dense in these cases. More importantly, the number of
operations you save by explicitly recognizing the special structure is not
insignificant.


>
> For banded arrays, the economies available beyond simple sparse algorithms
> are even more limited.
>
>
I am confused, Ted, since when I suggested that some of the multiplication
issues could solved by method overloading, you thought it would not work.
Probably a mis-communication on my part.


> Symmetric and triangular matrices also have special properties but it is
> hard to decide what is really important there.  Many of the special
> operations for these kinds of matrix are subject to solving by overloads
> instead of indicators since we aren't dealing with binary operations.  For
> example, left and right inverse multiplication with triangular matrices is
> handled by normal single dispatch and qualifying an argument for real
> Cholesky decomposition is specific to the Cholesky decomposition itself.
>
>
While on the discussion of extending RealMatrix in any direction, I would
humbly offer that the objects are too complex. Pardon this foolish question,
but what are the uses for the methods " double
walkInRowOrder(RealMatrixChangingVisitor visitor)" When I think of having to
fill in all those methods for any extension, my head spins.

-Greg

Re: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Posted by Ted Dunning <te...@gmail.com>.

I would think neither.

I think it should mean that the sparseIterator will be enough faster than
the dense iterator to make it worth using.  The indication doesn't have to
be be crisp or even always quite correct.  Thus, a densely filled
OpenMapReal might just return true because that usage is out of the
ordinary.

Regarding what the sparseIterator returns, skipping zeroes is a fine idea,
but the contract should really be that all non-zeros will be returned rather
than no zeros will be returned.  There should be flexibility to return zero
values if that is much more convenient, but there should be no case where a
non-zero is missed.

Regarding the additional indicators, these are occasionally useful in
particular cases, but many of those special cases actually work out just as
well if you simply treat them as sparse.  As an example, multiplication by a
diagonal matrix can be special cased, but simply iterating over the
non-zeros (i.e. the diagonal) is just as good.  Matrix power is a
counter-example where the computation can be reduced to element-wise power
for diagonal, but not for sparse.  Even so, the sparse implementation is
surprisingly competitive even there especially for small powers if you have
a special sparse-sparse multiply.  Mahout has an additional indicator
function that tells whether sparseIterator returns items in a sequential
order which can reduce many of these cases to a merge.

For banded arrays, the economies available beyond simple sparse algorithms
are even more limited.

Symmetric and triangular matrices also have special properties but it is
hard to decide what is really important there.  Many of the special
operations for these kinds of matrix are subject to solving by overloads
instead of indicators since we aren't dealing with binary operations.  For
example, left and right inverse multiplication with triangular matrices is
handled by normal single dispatch and qualifying an argument for real
Cholesky decomposition is specific to the Cholesky decomposition itself.

On Wed, Aug 17, 2011 at 11:28 AM, Arne Ploese <ap...@gmx.de> wrote:

> So what exactly is the meaning of isSparse?
>
> * the vector at hand implements some kind of sparse storage? I.E:
> OpenMapRealVector
>
> or
>
> * the vector is actually sparsely filled (then what is sparse anyway
> 10%, 50%, 90%?) in this case a SparserelVector interface is useless for
> the compiler.
>
> My suggestion is: ArrayRealVector implements a sparse iterator which
> returns only values != 0?
>
> Having the backing storage implementation open, one could think of:
>
> * isArray
> * isBanded
> * isSymetrical
> * isDiagonal
>
> as additional markers or put the whole thing in an enum or EnumSet?
>
> Am Mittwoch, den 17.08.2011, 08:51 -0700 schrieb Ted Dunning:
> > I think that all vectors and matrices should be able to answer the
> question
> > about whether they are sparse and they should support sparse iterators,
> > defaulting to the normal iterators in the general case.  So yes to the
> first
> > question.  This allows the application programmer to be much less
> concerned
> > about whether they are using a sparse or dense matrix without losing much
> > performance, if any.
> >
> > The second question I think is much less clear.  It might well be a good
> > idea to keep the interface to help the compiler in cases where the
> > programmer knows which general class of matrix they have.
> >
> > On Wed, Aug 17, 2011 at 8:29 AM, Arne Ploese <ap...@gmx.de> wrote:
> >
> > > Am Mittwoch, den 17.08.2011, 07:25 -0700 schrieb Ted Dunning:
> > > > Arne,
> > > >
> > > > Please read the thread again.  I am providing an example of how I
> think
> > > > things *should* be.
> > > OK.
> > > if I understand you right: isSparse() should be added to RealVector and
> > > the interface SparseRealVector can be dropped?
> > >
> > > >
> > > > The point of doing so is that things are not that way now.  Telling
> me
> > > that
> > > > they are not that way is pretty redundant.
> > > >
> > > > On Wed, Aug 17, 2011 at 2:37 AM, Arne Ploese <ap...@gmx.de> wrote:
> > > >
> > > > > Currently sparseIterator is only used in RealVector, no matrix
> class
> > > > > will be affected, because there is no sparseIterator.
> > > > > Search you sources for "sparseIterator" ?
> > > > >
> > > > > Arne
> > > > > Am Dienstag, den 16.08.2011, 14:09 -0700 schrieb Ted Dunning:
> > > > > > Here is an example from the perspective of somebody adding a new
> kind
> > > of
> > > > > > matrix.
> > > > > >
> > > > > > Take the two kinds of matrix as RandomTrinaryMatrix(rows,
> columns, p)
> > > > > that
> > > > > > has elements that are -1, 0 or 1.  1 and -1 have equal
> probabilities
> > > of
> > > > > p/2.
> > > > > >  The value of p should be in [0,1].
> > > > > >
> > > > > > It would be very nice if the implementor of this matrix could
> extend
> > > an
> > > > > > abstract matrix and over-ride get() to generate a value and set()
> to
> > > > > throw
> > > > > > an unsupported operation exception.  If p < 0.1, then the matrix
> > > should
> > > > > be
> > > > > > marked as sparse, else as dense.
> > > > > >
> > > > > > All operations against other matrices, sparse or dense should
> work
> > > well
> > > > > > without any special handling by the implementor of this matrix.
> > > > > >
> > > > > > This works in Mahout for instance by having the default
> operations in
> > > > > > AbstractMatrix test for sparseness of left or right operands and
> do
> > > the
> > > > > > right thing.  Obviously, a type test will not tell you whether
> this
> > > > > matrix
> > > > > > is sparse or not.
> > > > > >
> > > > > > This matrix and siblings is very important in compressed sensing
> and
> > > > > > stochastic projection algorithms.
> > > > > >
> > > > > > On Tue, Aug 16, 2011 at 1:55 PM, Phil Steitz <
> phil.steitz@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > On 8/16/11 4:46 AM, Gilles Sadowski wrote:
> > > > > > > > Hi.
> > > > > > > >
> > > > > > > >> I understood what he was suggesting.  I still disagree.
>  Dynamic
> > > > > > > dispatch
> > > > > > > >> and non-lattice typing structure is still required to make
> this
> > > all
> > > > > > > work.
> > > > > > > >>  Java doesn't really do that.  Pretending that what Java
> does is
> > > > > > > sufficient
> > > > > > > >> is hammer-looking-for-a-nail, not solving the problems at
> hand.
> > > > > > > > Maybe that *I* don't understand what you are hinting at.
> Sorry
> > > for
> > > > > being
> > > > > > > > dense. [Although that seems appropriate in this discussion
> :-).]
> > > > > > > >
> > > > > > > > Polymorphism provides dynamic dispatch, overloading does not;
> > > that's
> > > > > why
> > > > > > > my
> > > > > > > > proposition is that when you manipulate "unknown" types,
> those
> > > should
> > > > > > > come
> > > > > > > > as "this", not as the argument of the method.
> > > > > > > >
> > > > > > > > What's wrong with that?
> > > > > > > >
> > > > > > > > As for "hammer-looking-for-a-nail", I also don't see what you
> > > mean:
> > > > > What
> > > > > > > is
> > > > > > > > the problem? I guess that there are lots of applications who
> > > never
> > > > > need
> > > > > > > to
> > > > > > > > know about sparse vectors/matrices. In those cases, the added
> > > > > complexity
> > > > > > > is
> > > > > > > > not a "feature". The issue reported contends that the current
> > > design
> > > > > in
> > > > > > > CM
> > > > > > > > can cause problems for dense implementations. I'm not even
> sure
> > > that
> > > > > the
> > > > > > > > current design is usable for the type of applications that
> make
> > > heavy
> > > > > use
> > > > > > > of
> > > > > > > > sparseness. Those are problems, IMHO.
> > > > > > >
> > > > > > > I have been out of pocket the last couple of days and may not
> have
> > > > > > > time to dig into this until late tonight, but I agree with
> Gilles
> > > > > > > that we need to get the conversation here more concrete.  I
> know we
> > > > > > > discussed this before and Ted and others had good examples
> > > > > > > justifying the current setup.  Can we revisit these, please?
> What
> > > > > > > would be great would be some examples both from the perspective
> of
> > > > > > > the [math] developer looking to add a new or specialized class
> and
> > > > > > > [math] users writing code that leverages the setup.
> > > > > > >
> > > > > > > Phil
> > > > > > > >
> > > > > > > >
> > > > > > > > Gilles
> > > > > > > >
> > > > > > > >> On Mon, Aug 15, 2011 at 6:52 PM, Greg Sterijevski <
> > > > > > > gsterijevski@gmail.com>wrote:
> > > > > > > >>
> > > > > > > >>> Forgive me for pushing my nose under the tent... I couldn't
> > > resist.
> > > > > > > >>>
> > > > > > > >>> I think Gilles is saying that each specialization of the
> > > > > matrix/vector
> > > > > > > >>> objects would need to support pre (and post) multiplication
> > > with a
> > > > > > > dense.
> > > > > > > >>> So
> > > > > > > >>> the type issue would not be problematic.
> > > > > > > >>>
> > > > > > > >>> On Mon, Aug 15, 2011 at 6:34 PM, Ted Dunning <
> > > > > ted.dunning@gmail.com>
> > > > > > > >>> wrote:
> > > > > > > >>>
> > > > > > > >>>> No.
> > > > > > > >>>>
> > > > > > > >>>> You can't.  This is because the type is lost as you enter
> the
> > > > > generic
> > > > > > > >>>> library.
> > > > > > > >>>>
> > > > > > > >>>> On Mon, Aug 15, 2011 at 4:28 PM, Gilles Sadowski <
> > > > > > > >>>> gilles@harfang.homelinux.org> wrote:
> > > > > > > >>>>
> > > > > > > >>>>>> They know that their own object is dense, but they don't
> > > know
> > > > > what
> > > > > > > >>> kind
> > > > > > > >>>>> of
> > > > > > > >>>>>> input they were given.  They should still run fast if
> the
> > > input
> > > > > is
> > > > > > > >>>>> sparse.
> > > > > > > >>>>>
> > > > > > > >>>>> Couldn't we still rely on polymorphism by implementing
> > > > > "preTimes":
> > > > > > > >>>>>   unknown.preTimes(dense)
> > > > > > > >
> > > ---------------------------------------------------------------------
> > > > > > > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > > > > > > For additional commands, e-mail: dev-help@commons.apache.org
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > ---------------------------------------------------------------------
> > > > > > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > > > > > For additional commands, e-mail: dev-help@commons.apache.org
> > > > > > >
> > > > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > > > For additional commands, e-mail: dev-help@commons.apache.org
> > > > >
> > > > >
> > >
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > For additional commands, e-mail: dev-help@commons.apache.org
> > >
> > >
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

Re: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Posted by Matthew Pocock <tu...@gmail.com>.

Also, for matrices, is the internal storage optimized for row-major access
or column-major access? Perhaps you can get a sparse iterator over the major
dimension which returns a dense iterator over the minor, or a sparse iteator
over all cells. What you expose boils down to what it is about the matrix or
vector that you need to discover to be able to implement binary ops
efficiently enough. As Java doesn't have multiple-dispatch, I don't really
know a completely clean way to deal with this.

I also agree with Ted. The contract for sparse iteration should be that it
can optionally skip returning zero entries. There may be cases when it's
cheaper to store some zeros even if you use some sort of skip-list structure
internally.

Matthew

On 17 August 2011 19:28, Arne Ploese <ap...@gmx.de> wrote:

> So what exactly is the meaning of isSparse?
>
> * the vector at hand implements some kind of sparse storage? I.E:
> OpenMapRealVector
>
> or
>
> * the vector is actually sparsely filled (then what is sparse anyway
> 10%, 50%, 90%?) in this case a SparserelVector interface is useless for
> the compiler.
>
> My suggestion is: ArrayRealVector implements a sparse iterator which
> returns only values != 0?
>
> Having the backing storage implementation open, one could think of:
>
> * isArray
> * isBanded
> * isSymetrical
> * isDiagonal
>
> as additional markers or put the whole thing in an enum or EnumSet?
>
> Am Mittwoch, den 17.08.2011, 08:51 -0700 schrieb Ted Dunning:
> > I think that all vectors and matrices should be able to answer the
> question
> > about whether they are sparse and they should support sparse iterators,
> > defaulting to the normal iterators in the general case.  So yes to the
> first
> > question.  This allows the application programmer to be much less
> concerned
> > about whether they are using a sparse or dense matrix without losing much
> > performance, if any.
> >
> > The second question I think is much less clear.  It might well be a good
> > idea to keep the interface to help the compiler in cases where the
> > programmer knows which general class of matrix they have.
> >
> > On Wed, Aug 17, 2011 at 8:29 AM, Arne Ploese <ap...@gmx.de> wrote:
> >
> > > Am Mittwoch, den 17.08.2011, 07:25 -0700 schrieb Ted Dunning:
> > > > Arne,
> > > >
> > > > Please read the thread again.  I am providing an example of how I
> think
> > > > things *should* be.
> > > OK.
> > > if I understand you right: isSparse() should be added to RealVector and
> > > the interface SparseRealVector can be dropped?
> > >
> > > >
> > > > The point of doing so is that things are not that way now.  Telling
> me
> > > that
> > > > they are not that way is pretty redundant.
> > > >
> > > > On Wed, Aug 17, 2011 at 2:37 AM, Arne Ploese <ap...@gmx.de> wrote:
> > > >
> > > > > Currently sparseIterator is only used in RealVector, no matrix
> class
> > > > > will be affected, because there is no sparseIterator.
> > > > > Search you sources for "sparseIterator" ?
> > > > >
> > > > > Arne
> > > > > Am Dienstag, den 16.08.2011, 14:09 -0700 schrieb Ted Dunning:
> > > > > > Here is an example from the perspective of somebody adding a new
> kind
> > > of
> > > > > > matrix.
> > > > > >
> > > > > > Take the two kinds of matrix as RandomTrinaryMatrix(rows,
> columns, p)
> > > > > that
> > > > > > has elements that are -1, 0 or 1.  1 and -1 have equal
> probabilities
> > > of
> > > > > p/2.
> > > > > >  The value of p should be in [0,1].
> > > > > >
> > > > > > It would be very nice if the implementor of this matrix could
> extend
> > > an
> > > > > > abstract matrix and over-ride get() to generate a value and set()
> to
> > > > > throw
> > > > > > an unsupported operation exception.  If p < 0.1, then the matrix
> > > should
> > > > > be
> > > > > > marked as sparse, else as dense.
> > > > > >
> > > > > > All operations against other matrices, sparse or dense should
> work
> > > well
> > > > > > without any special handling by the implementor of this matrix.
> > > > > >
> > > > > > This works in Mahout for instance by having the default
> operations in
> > > > > > AbstractMatrix test for sparseness of left or right operands and
> do
> > > the
> > > > > > right thing.  Obviously, a type test will not tell you whether
> this
> > > > > matrix
> > > > > > is sparse or not.
> > > > > >
> > > > > > This matrix and siblings is very important in compressed sensing
> and
> > > > > > stochastic projection algorithms.
> > > > > >
> > > > > > On Tue, Aug 16, 2011 at 1:55 PM, Phil Steitz <
> phil.steitz@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > On 8/16/11 4:46 AM, Gilles Sadowski wrote:
> > > > > > > > Hi.
> > > > > > > >
> > > > > > > >> I understood what he was suggesting.  I still disagree.
>  Dynamic
> > > > > > > dispatch
> > > > > > > >> and non-lattice typing structure is still required to make
> this
> > > all
> > > > > > > work.
> > > > > > > >>  Java doesn't really do that.  Pretending that what Java
> does is
> > > > > > > sufficient
> > > > > > > >> is hammer-looking-for-a-nail, not solving the problems at
> hand.
> > > > > > > > Maybe that *I* don't understand what you are hinting at.
> Sorry
> > > for
> > > > > being
> > > > > > > > dense. [Although that seems appropriate in this discussion
> :-).]
> > > > > > > >
> > > > > > > > Polymorphism provides dynamic dispatch, overloading does not;
> > > that's
> > > > > why
> > > > > > > my
> > > > > > > > proposition is that when you manipulate "unknown" types,
> those
> > > should
> > > > > > > come
> > > > > > > > as "this", not as the argument of the method.
> > > > > > > >
> > > > > > > > What's wrong with that?
> > > > > > > >
> > > > > > > > As for "hammer-looking-for-a-nail", I also don't see what you
> > > mean:
> > > > > What
> > > > > > > is
> > > > > > > > the problem? I guess that there are lots of applications who
> > > never
> > > > > need
> > > > > > > to
> > > > > > > > know about sparse vectors/matrices. In those cases, the added
> > > > > complexity
> > > > > > > is
> > > > > > > > not a "feature". The issue reported contends that the current
> > > design
> > > > > in
> > > > > > > CM
> > > > > > > > can cause problems for dense implementations. I'm not even
> sure
> > > that
> > > > > the
> > > > > > > > current design is usable for the type of applications that
> make
> > > heavy
> > > > > use
> > > > > > > of
> > > > > > > > sparseness. Those are problems, IMHO.
> > > > > > >
> > > > > > > I have been out of pocket the last couple of days and may not
> have
> > > > > > > time to dig into this until late tonight, but I agree with
> Gilles
> > > > > > > that we need to get the conversation here more concrete.  I
> know we
> > > > > > > discussed this before and Ted and others had good examples
> > > > > > > justifying the current setup.  Can we revisit these, please?
> What
> > > > > > > would be great would be some examples both from the perspective
> of
> > > > > > > the [math] developer looking to add a new or specialized class
> and
> > > > > > > [math] users writing code that leverages the setup.
> > > > > > >
> > > > > > > Phil
> > > > > > > >
> > > > > > > >
> > > > > > > > Gilles
> > > > > > > >
> > > > > > > >> On Mon, Aug 15, 2011 at 6:52 PM, Greg Sterijevski <
> > > > > > > gsterijevski@gmail.com>wrote:
> > > > > > > >>
> > > > > > > >>> Forgive me for pushing my nose under the tent... I couldn't
> > > resist.
> > > > > > > >>>
> > > > > > > >>> I think Gilles is saying that each specialization of the
> > > > > matrix/vector
> > > > > > > >>> objects would need to support pre (and post) multiplication
> > > with a
> > > > > > > dense.
> > > > > > > >>> So
> > > > > > > >>> the type issue would not be problematic.
> > > > > > > >>>
> > > > > > > >>> On Mon, Aug 15, 2011 at 6:34 PM, Ted Dunning <
> > > > > ted.dunning@gmail.com>
> > > > > > > >>> wrote:
> > > > > > > >>>
> > > > > > > >>>> No.
> > > > > > > >>>>
> > > > > > > >>>> You can't.  This is because the type is lost as you enter
> the
> > > > > generic
> > > > > > > >>>> library.
> > > > > > > >>>>
> > > > > > > >>>> On Mon, Aug 15, 2011 at 4:28 PM, Gilles Sadowski <
> > > > > > > >>>> gilles@harfang.homelinux.org> wrote:
> > > > > > > >>>>
> > > > > > > >>>>>> They know that their own object is dense, but they don't
> > > know
> > > > > what
> > > > > > > >>> kind
> > > > > > > >>>>> of
> > > > > > > >>>>>> input they were given.  They should still run fast if
> the
> > > input
> > > > > is
> > > > > > > >>>>> sparse.
> > > > > > > >>>>>
> > > > > > > >>>>> Couldn't we still rely on polymorphism by implementing
> > > > > "preTimes":
> > > > > > > >>>>>   unknown.preTimes(dense)
> > > > > > > >
> > > ---------------------------------------------------------------------
> > > > > > > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > > > > > > For additional commands, e-mail: dev-help@commons.apache.org
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > ---------------------------------------------------------------------
> > > > > > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > > > > > For additional commands, e-mail: dev-help@commons.apache.org
> > > > > > >
> > > > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > > > For additional commands, e-mail: dev-help@commons.apache.org
> > > > >
> > > > >
> > >
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > For additional commands, e-mail: dev-help@commons.apache.org
> > >
> > >
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


-- 
Dr Matthew Pocock
Visitor, School of Computing Science, Newcastle University
mailto: turingatemyhamster@gmail.com
gchat: turingatemyhamster@gmail.com
msn: matthew_pocock@yahoo.co.uk
irc.freenode.net: drdozer
tel: (0191) 2566550
mob: +447535664143

Re: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Posted by Arne Ploese <ap...@gmx.de>.

So what exactly is the meaning of isSparse?

* the vector at hand implements some kind of sparse storage? I.E:
OpenMapRealVector

or

* the vector is actually sparsely filled (then what is sparse anyway
10%, 50%, 90%?) in this case a SparserelVector interface is useless for
the compiler.

My suggestion is: ArrayRealVector implements a sparse iterator which
returns only values != 0? 

Having the backing storage implementation open, one could think of:

* isArray
* isBanded
* isSymetrical
* isDiagonal

as additional markers or put the whole thing in an enum or EnumSet?  

Am Mittwoch, den 17.08.2011, 08:51 -0700 schrieb Ted Dunning: 
> I think that all vectors and matrices should be able to answer the question
> about whether they are sparse and they should support sparse iterators,
> defaulting to the normal iterators in the general case.  So yes to the first
> question.  This allows the application programmer to be much less concerned
> about whether they are using a sparse or dense matrix without losing much
> performance, if any.
> 
> The second question I think is much less clear.  It might well be a good
> idea to keep the interface to help the compiler in cases where the
> programmer knows which general class of matrix they have.
> 
> On Wed, Aug 17, 2011 at 8:29 AM, Arne Ploese <ap...@gmx.de> wrote:
> 
> > Am Mittwoch, den 17.08.2011, 07:25 -0700 schrieb Ted Dunning:
> > > Arne,
> > >
> > > Please read the thread again.  I am providing an example of how I think
> > > things *should* be.
> > OK.
> > if I understand you right: isSparse() should be added to RealVector and
> > the interface SparseRealVector can be dropped?
> >
> > >
> > > The point of doing so is that things are not that way now.  Telling me
> > that
> > > they are not that way is pretty redundant.
> > >
> > > On Wed, Aug 17, 2011 at 2:37 AM, Arne Ploese <ap...@gmx.de> wrote:
> > >
> > > > Currently sparseIterator is only used in RealVector, no matrix class
> > > > will be affected, because there is no sparseIterator.
> > > > Search you sources for "sparseIterator" ?
> > > >
> > > > Arne
> > > > Am Dienstag, den 16.08.2011, 14:09 -0700 schrieb Ted Dunning:
> > > > > Here is an example from the perspective of somebody adding a new kind
> > of
> > > > > matrix.
> > > > >
> > > > > Take the two kinds of matrix as RandomTrinaryMatrix(rows, columns, p)
> > > > that
> > > > > has elements that are -1, 0 or 1.  1 and -1 have equal probabilities
> > of
> > > > p/2.
> > > > >  The value of p should be in [0,1].
> > > > >
> > > > > It would be very nice if the implementor of this matrix could extend
> > an
> > > > > abstract matrix and over-ride get() to generate a value and set() to
> > > > throw
> > > > > an unsupported operation exception.  If p < 0.1, then the matrix
> > should
> > > > be
> > > > > marked as sparse, else as dense.
> > > > >
> > > > > All operations against other matrices, sparse or dense should work
> > well
> > > > > without any special handling by the implementor of this matrix.
> > > > >
> > > > > This works in Mahout for instance by having the default operations in
> > > > > AbstractMatrix test for sparseness of left or right operands and do
> > the
> > > > > right thing.  Obviously, a type test will not tell you whether this
> > > > matrix
> > > > > is sparse or not.
> > > > >
> > > > > This matrix and siblings is very important in compressed sensing and
> > > > > stochastic projection algorithms.
> > > > >
> > > > > On Tue, Aug 16, 2011 at 1:55 PM, Phil Steitz <ph...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > On 8/16/11 4:46 AM, Gilles Sadowski wrote:
> > > > > > > Hi.
> > > > > > >
> > > > > > >> I understood what he was suggesting.  I still disagree.  Dynamic
> > > > > > dispatch
> > > > > > >> and non-lattice typing structure is still required to make this
> > all
> > > > > > work.
> > > > > > >>  Java doesn't really do that.  Pretending that what Java does is
> > > > > > sufficient
> > > > > > >> is hammer-looking-for-a-nail, not solving the problems at hand.
> > > > > > > Maybe that *I* don't understand what you are hinting at. Sorry
> > for
> > > > being
> > > > > > > dense. [Although that seems appropriate in this discussion :-).]
> > > > > > >
> > > > > > > Polymorphism provides dynamic dispatch, overloading does not;
> > that's
> > > > why
> > > > > > my
> > > > > > > proposition is that when you manipulate "unknown" types, those
> > should
> > > > > > come
> > > > > > > as "this", not as the argument of the method.
> > > > > > >
> > > > > > > What's wrong with that?
> > > > > > >
> > > > > > > As for "hammer-looking-for-a-nail", I also don't see what you
> > mean:
> > > > What
> > > > > > is
> > > > > > > the problem? I guess that there are lots of applications who
> > never
> > > > need
> > > > > > to
> > > > > > > know about sparse vectors/matrices. In those cases, the added
> > > > complexity
> > > > > > is
> > > > > > > not a "feature". The issue reported contends that the current
> > design
> > > > in
> > > > > > CM
> > > > > > > can cause problems for dense implementations. I'm not even sure
> > that
> > > > the
> > > > > > > current design is usable for the type of applications that make
> > heavy
> > > > use
> > > > > > of
> > > > > > > sparseness. Those are problems, IMHO.
> > > > > >
> > > > > > I have been out of pocket the last couple of days and may not have
> > > > > > time to dig into this until late tonight, but I agree with Gilles
> > > > > > that we need to get the conversation here more concrete.  I know we
> > > > > > discussed this before and Ted and others had good examples
> > > > > > justifying the current setup.  Can we revisit these, please?   What
> > > > > > would be great would be some examples both from the perspective of
> > > > > > the [math] developer looking to add a new or specialized class and
> > > > > > [math] users writing code that leverages the setup.
> > > > > >
> > > > > > Phil
> > > > > > >
> > > > > > >
> > > > > > > Gilles
> > > > > > >
> > > > > > >> On Mon, Aug 15, 2011 at 6:52 PM, Greg Sterijevski <
> > > > > > gsterijevski@gmail.com>wrote:
> > > > > > >>
> > > > > > >>> Forgive me for pushing my nose under the tent... I couldn't
> > resist.
> > > > > > >>>
> > > > > > >>> I think Gilles is saying that each specialization of the
> > > > matrix/vector
> > > > > > >>> objects would need to support pre (and post) multiplication
> > with a
> > > > > > dense.
> > > > > > >>> So
> > > > > > >>> the type issue would not be problematic.
> > > > > > >>>
> > > > > > >>> On Mon, Aug 15, 2011 at 6:34 PM, Ted Dunning <
> > > > ted.dunning@gmail.com>
> > > > > > >>> wrote:
> > > > > > >>>
> > > > > > >>>> No.
> > > > > > >>>>
> > > > > > >>>> You can't.  This is because the type is lost as you enter the
> > > > generic
> > > > > > >>>> library.
> > > > > > >>>>
> > > > > > >>>> On Mon, Aug 15, 2011 at 4:28 PM, Gilles Sadowski <
> > > > > > >>>> gilles@harfang.homelinux.org> wrote:
> > > > > > >>>>
> > > > > > >>>>>> They know that their own object is dense, but they don't
> > know
> > > > what
> > > > > > >>> kind
> > > > > > >>>>> of
> > > > > > >>>>>> input they were given.  They should still run fast if the
> > input
> > > > is
> > > > > > >>>>> sparse.
> > > > > > >>>>>
> > > > > > >>>>> Couldn't we still rely on polymorphism by implementing
> > > > "preTimes":
> > > > > > >>>>>   unknown.preTimes(dense)
> > > > > > >
> > ---------------------------------------------------------------------
> > > > > > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > > > > > For additional commands, e-mail: dev-help@commons.apache.org
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > > > > For additional commands, e-mail: dev-help@commons.apache.org
> > > > > >
> > > > > >
> > > >
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > > For additional commands, e-mail: dev-help@commons.apache.org
> > > >
> > > >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > For additional commands, e-mail: dev-help@commons.apache.org
> >
> >



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Posted by Ted Dunning <te...@gmail.com>.

I think that all vectors and matrices should be able to answer the question
about whether they are sparse and they should support sparse iterators,
defaulting to the normal iterators in the general case.  So yes to the first
question.  This allows the application programmer to be much less concerned
about whether they are using a sparse or dense matrix without losing much
performance, if any.

The second question I think is much less clear.  It might well be a good
idea to keep the interface to help the compiler in cases where the
programmer knows which general class of matrix they have.

On Wed, Aug 17, 2011 at 8:29 AM, Arne Ploese <ap...@gmx.de> wrote:

> Am Mittwoch, den 17.08.2011, 07:25 -0700 schrieb Ted Dunning:
> > Arne,
> >
> > Please read the thread again.  I am providing an example of how I think
> > things *should* be.
> OK.
> if I understand you right: isSparse() should be added to RealVector and
> the interface SparseRealVector can be dropped?
>
> >
> > The point of doing so is that things are not that way now.  Telling me
> that
> > they are not that way is pretty redundant.
> >
> > On Wed, Aug 17, 2011 at 2:37 AM, Arne Ploese <ap...@gmx.de> wrote:
> >
> > > Currently sparseIterator is only used in RealVector, no matrix class
> > > will be affected, because there is no sparseIterator.
> > > Search you sources for "sparseIterator" ?
> > >
> > > Arne
> > > Am Dienstag, den 16.08.2011, 14:09 -0700 schrieb Ted Dunning:
> > > > Here is an example from the perspective of somebody adding a new kind
> of
> > > > matrix.
> > > >
> > > > Take the two kinds of matrix as RandomTrinaryMatrix(rows, columns, p)
> > > that
> > > > has elements that are -1, 0 or 1.  1 and -1 have equal probabilities
> of
> > > p/2.
> > > >  The value of p should be in [0,1].
> > > >
> > > > It would be very nice if the implementor of this matrix could extend
> an
> > > > abstract matrix and over-ride get() to generate a value and set() to
> > > throw
> > > > an unsupported operation exception.  If p < 0.1, then the matrix
> should
> > > be
> > > > marked as sparse, else as dense.
> > > >
> > > > All operations against other matrices, sparse or dense should work
> well
> > > > without any special handling by the implementor of this matrix.
> > > >
> > > > This works in Mahout for instance by having the default operations in
> > > > AbstractMatrix test for sparseness of left or right operands and do
> the
> > > > right thing.  Obviously, a type test will not tell you whether this
> > > matrix
> > > > is sparse or not.
> > > >
> > > > This matrix and siblings is very important in compressed sensing and
> > > > stochastic projection algorithms.
> > > >
> > > > On Tue, Aug 16, 2011 at 1:55 PM, Phil Steitz <ph...@gmail.com>
> > > wrote:
> > > >
> > > > > On 8/16/11 4:46 AM, Gilles Sadowski wrote:
> > > > > > Hi.
> > > > > >
> > > > > >> I understood what he was suggesting.  I still disagree.  Dynamic
> > > > > dispatch
> > > > > >> and non-lattice typing structure is still required to make this
> all
> > > > > work.
> > > > > >>  Java doesn't really do that.  Pretending that what Java does is
> > > > > sufficient
> > > > > >> is hammer-looking-for-a-nail, not solving the problems at hand.
> > > > > > Maybe that *I* don't understand what you are hinting at. Sorry
> for
> > > being
> > > > > > dense. [Although that seems appropriate in this discussion :-).]
> > > > > >
> > > > > > Polymorphism provides dynamic dispatch, overloading does not;
> that's
> > > why
> > > > > my
> > > > > > proposition is that when you manipulate "unknown" types, those
> should
> > > > > come
> > > > > > as "this", not as the argument of the method.
> > > > > >
> > > > > > What's wrong with that?
> > > > > >
> > > > > > As for "hammer-looking-for-a-nail", I also don't see what you
> mean:
> > > What
> > > > > is
> > > > > > the problem? I guess that there are lots of applications who
> never
> > > need
> > > > > to
> > > > > > know about sparse vectors/matrices. In those cases, the added
> > > complexity
> > > > > is
> > > > > > not a "feature". The issue reported contends that the current
> design
> > > in
> > > > > CM
> > > > > > can cause problems for dense implementations. I'm not even sure
> that
> > > the
> > > > > > current design is usable for the type of applications that make
> heavy
> > > use
> > > > > of
> > > > > > sparseness. Those are problems, IMHO.
> > > > >
> > > > > I have been out of pocket the last couple of days and may not have
> > > > > time to dig into this until late tonight, but I agree with Gilles
> > > > > that we need to get the conversation here more concrete.  I know we
> > > > > discussed this before and Ted and others had good examples
> > > > > justifying the current setup.  Can we revisit these, please?   What
> > > > > would be great would be some examples both from the perspective of
> > > > > the [math] developer looking to add a new or specialized class and
> > > > > [math] users writing code that leverages the setup.
> > > > >
> > > > > Phil
> > > > > >
> > > > > >
> > > > > > Gilles
> > > > > >
> > > > > >> On Mon, Aug 15, 2011 at 6:52 PM, Greg Sterijevski <
> > > > > gsterijevski@gmail.com>wrote:
> > > > > >>
> > > > > >>> Forgive me for pushing my nose under the tent... I couldn't
> resist.
> > > > > >>>
> > > > > >>> I think Gilles is saying that each specialization of the
> > > matrix/vector
> > > > > >>> objects would need to support pre (and post) multiplication
> with a
> > > > > dense.
> > > > > >>> So
> > > > > >>> the type issue would not be problematic.
> > > > > >>>
> > > > > >>> On Mon, Aug 15, 2011 at 6:34 PM, Ted Dunning <
> > > ted.dunning@gmail.com>
> > > > > >>> wrote:
> > > > > >>>
> > > > > >>>> No.
> > > > > >>>>
> > > > > >>>> You can't.  This is because the type is lost as you enter the
> > > generic
> > > > > >>>> library.
> > > > > >>>>
> > > > > >>>> On Mon, Aug 15, 2011 at 4:28 PM, Gilles Sadowski <
> > > > > >>>> gilles@harfang.homelinux.org> wrote:
> > > > > >>>>
> > > > > >>>>>> They know that their own object is dense, but they don't
> know
> > > what
> > > > > >>> kind
> > > > > >>>>> of
> > > > > >>>>>> input they were given.  They should still run fast if the
> input
> > > is
> > > > > >>>>> sparse.
> > > > > >>>>>
> > > > > >>>>> Couldn't we still rely on polymorphism by implementing
> > > "preTimes":
> > > > > >>>>>   unknown.preTimes(dense)
> > > > > >
> ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > > > > For additional commands, e-mail: dev-help@commons.apache.org
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > > > For additional commands, e-mail: dev-help@commons.apache.org
> > > > >
> > > > >
> > >
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > For additional commands, e-mail: dev-help@commons.apache.org
> > >
> > >
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

Re: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Posted by Arne Ploese <ap...@gmx.de>.

Am Mittwoch, den 17.08.2011, 07:25 -0700 schrieb Ted Dunning: 
> Arne,
> 
> Please read the thread again.  I am providing an example of how I think
> things *should* be.
OK.
if I understand you right: isSparse() should be added to RealVector and
the interface SparseRealVector can be dropped?

> 
> The point of doing so is that things are not that way now.  Telling me that
> they are not that way is pretty redundant.
> 
> On Wed, Aug 17, 2011 at 2:37 AM, Arne Ploese <ap...@gmx.de> wrote:
> 
> > Currently sparseIterator is only used in RealVector, no matrix class
> > will be affected, because there is no sparseIterator.
> > Search you sources for "sparseIterator" ?
> >
> > Arne
> > Am Dienstag, den 16.08.2011, 14:09 -0700 schrieb Ted Dunning:
> > > Here is an example from the perspective of somebody adding a new kind of
> > > matrix.
> > >
> > > Take the two kinds of matrix as RandomTrinaryMatrix(rows, columns, p)
> > that
> > > has elements that are -1, 0 or 1.  1 and -1 have equal probabilities of
> > p/2.
> > >  The value of p should be in [0,1].
> > >
> > > It would be very nice if the implementor of this matrix could extend an
> > > abstract matrix and over-ride get() to generate a value and set() to
> > throw
> > > an unsupported operation exception.  If p < 0.1, then the matrix should
> > be
> > > marked as sparse, else as dense.
> > >
> > > All operations against other matrices, sparse or dense should work well
> > > without any special handling by the implementor of this matrix.
> > >
> > > This works in Mahout for instance by having the default operations in
> > > AbstractMatrix test for sparseness of left or right operands and do the
> > > right thing.  Obviously, a type test will not tell you whether this
> > matrix
> > > is sparse or not.
> > >
> > > This matrix and siblings is very important in compressed sensing and
> > > stochastic projection algorithms.
> > >
> > > On Tue, Aug 16, 2011 at 1:55 PM, Phil Steitz <ph...@gmail.com>
> > wrote:
> > >
> > > > On 8/16/11 4:46 AM, Gilles Sadowski wrote:
> > > > > Hi.
> > > > >
> > > > >> I understood what he was suggesting.  I still disagree.  Dynamic
> > > > dispatch
> > > > >> and non-lattice typing structure is still required to make this all
> > > > work.
> > > > >>  Java doesn't really do that.  Pretending that what Java does is
> > > > sufficient
> > > > >> is hammer-looking-for-a-nail, not solving the problems at hand.
> > > > > Maybe that *I* don't understand what you are hinting at. Sorry for
> > being
> > > > > dense. [Although that seems appropriate in this discussion :-).]
> > > > >
> > > > > Polymorphism provides dynamic dispatch, overloading does not; that's
> > why
> > > > my
> > > > > proposition is that when you manipulate "unknown" types, those should
> > > > come
> > > > > as "this", not as the argument of the method.
> > > > >
> > > > > What's wrong with that?
> > > > >
> > > > > As for "hammer-looking-for-a-nail", I also don't see what you mean:
> > What
> > > > is
> > > > > the problem? I guess that there are lots of applications who never
> > need
> > > > to
> > > > > know about sparse vectors/matrices. In those cases, the added
> > complexity
> > > > is
> > > > > not a "feature". The issue reported contends that the current design
> > in
> > > > CM
> > > > > can cause problems for dense implementations. I'm not even sure that
> > the
> > > > > current design is usable for the type of applications that make heavy
> > use
> > > > of
> > > > > sparseness. Those are problems, IMHO.
> > > >
> > > > I have been out of pocket the last couple of days and may not have
> > > > time to dig into this until late tonight, but I agree with Gilles
> > > > that we need to get the conversation here more concrete.  I know we
> > > > discussed this before and Ted and others had good examples
> > > > justifying the current setup.  Can we revisit these, please?   What
> > > > would be great would be some examples both from the perspective of
> > > > the [math] developer looking to add a new or specialized class and
> > > > [math] users writing code that leverages the setup.
> > > >
> > > > Phil
> > > > >
> > > > >
> > > > > Gilles
> > > > >
> > > > >> On Mon, Aug 15, 2011 at 6:52 PM, Greg Sterijevski <
> > > > gsterijevski@gmail.com>wrote:
> > > > >>
> > > > >>> Forgive me for pushing my nose under the tent... I couldn't resist.
> > > > >>>
> > > > >>> I think Gilles is saying that each specialization of the
> > matrix/vector
> > > > >>> objects would need to support pre (and post) multiplication with a
> > > > dense.
> > > > >>> So
> > > > >>> the type issue would not be problematic.
> > > > >>>
> > > > >>> On Mon, Aug 15, 2011 at 6:34 PM, Ted Dunning <
> > ted.dunning@gmail.com>
> > > > >>> wrote:
> > > > >>>
> > > > >>>> No.
> > > > >>>>
> > > > >>>> You can't.  This is because the type is lost as you enter the
> > generic
> > > > >>>> library.
> > > > >>>>
> > > > >>>> On Mon, Aug 15, 2011 at 4:28 PM, Gilles Sadowski <
> > > > >>>> gilles@harfang.homelinux.org> wrote:
> > > > >>>>
> > > > >>>>>> They know that their own object is dense, but they don't know
> > what
> > > > >>> kind
> > > > >>>>> of
> > > > >>>>>> input they were given.  They should still run fast if the input
> > is
> > > > >>>>> sparse.
> > > > >>>>>
> > > > >>>>> Couldn't we still rely on polymorphism by implementing
> > "preTimes":
> > > > >>>>>   unknown.preTimes(dense)
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > > > For additional commands, e-mail: dev-help@commons.apache.org
> > > > >
> > > > >
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > > For additional commands, e-mail: dev-help@commons.apache.org
> > > >
> > > >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > For additional commands, e-mail: dev-help@commons.apache.org
> >
> >



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Posted by Ted Dunning <te...@gmail.com>.

Arne,

Please read the thread again.  I am providing an example of how I think
things *should* be.

The point of doing so is that things are not that way now.  Telling me that
they are not that way is pretty redundant.

On Wed, Aug 17, 2011 at 2:37 AM, Arne Ploese <ap...@gmx.de> wrote:

> Currently sparseIterator is only used in RealVector, no matrix class
> will be affected, because there is no sparseIterator.
> Search you sources for "sparseIterator" ?
>
> Arne
> Am Dienstag, den 16.08.2011, 14:09 -0700 schrieb Ted Dunning:
> > Here is an example from the perspective of somebody adding a new kind of
> > matrix.
> >
> > Take the two kinds of matrix as RandomTrinaryMatrix(rows, columns, p)
> that
> > has elements that are -1, 0 or 1.  1 and -1 have equal probabilities of
> p/2.
> >  The value of p should be in [0,1].
> >
> > It would be very nice if the implementor of this matrix could extend an
> > abstract matrix and over-ride get() to generate a value and set() to
> throw
> > an unsupported operation exception.  If p < 0.1, then the matrix should
> be
> > marked as sparse, else as dense.
> >
> > All operations against other matrices, sparse or dense should work well
> > without any special handling by the implementor of this matrix.
> >
> > This works in Mahout for instance by having the default operations in
> > AbstractMatrix test for sparseness of left or right operands and do the
> > right thing.  Obviously, a type test will not tell you whether this
> matrix
> > is sparse or not.
> >
> > This matrix and siblings is very important in compressed sensing and
> > stochastic projection algorithms.
> >
> > On Tue, Aug 16, 2011 at 1:55 PM, Phil Steitz <ph...@gmail.com>
> wrote:
> >
> > > On 8/16/11 4:46 AM, Gilles Sadowski wrote:
> > > > Hi.
> > > >
> > > >> I understood what he was suggesting.  I still disagree.  Dynamic
> > > dispatch
> > > >> and non-lattice typing structure is still required to make this all
> > > work.
> > > >>  Java doesn't really do that.  Pretending that what Java does is
> > > sufficient
> > > >> is hammer-looking-for-a-nail, not solving the problems at hand.
> > > > Maybe that *I* don't understand what you are hinting at. Sorry for
> being
> > > > dense. [Although that seems appropriate in this discussion :-).]
> > > >
> > > > Polymorphism provides dynamic dispatch, overloading does not; that's
> why
> > > my
> > > > proposition is that when you manipulate "unknown" types, those should
> > > come
> > > > as "this", not as the argument of the method.
> > > >
> > > > What's wrong with that?
> > > >
> > > > As for "hammer-looking-for-a-nail", I also don't see what you mean:
> What
> > > is
> > > > the problem? I guess that there are lots of applications who never
> need
> > > to
> > > > know about sparse vectors/matrices. In those cases, the added
> complexity
> > > is
> > > > not a "feature". The issue reported contends that the current design
> in
> > > CM
> > > > can cause problems for dense implementations. I'm not even sure that
> the
> > > > current design is usable for the type of applications that make heavy
> use
> > > of
> > > > sparseness. Those are problems, IMHO.
> > >
> > > I have been out of pocket the last couple of days and may not have
> > > time to dig into this until late tonight, but I agree with Gilles
> > > that we need to get the conversation here more concrete.  I know we
> > > discussed this before and Ted and others had good examples
> > > justifying the current setup.  Can we revisit these, please?   What
> > > would be great would be some examples both from the perspective of
> > > the [math] developer looking to add a new or specialized class and
> > > [math] users writing code that leverages the setup.
> > >
> > > Phil
> > > >
> > > >
> > > > Gilles
> > > >
> > > >> On Mon, Aug 15, 2011 at 6:52 PM, Greg Sterijevski <
> > > gsterijevski@gmail.com>wrote:
> > > >>
> > > >>> Forgive me for pushing my nose under the tent... I couldn't resist.
> > > >>>
> > > >>> I think Gilles is saying that each specialization of the
> matrix/vector
> > > >>> objects would need to support pre (and post) multiplication with a
> > > dense.
> > > >>> So
> > > >>> the type issue would not be problematic.
> > > >>>
> > > >>> On Mon, Aug 15, 2011 at 6:34 PM, Ted Dunning <
> ted.dunning@gmail.com>
> > > >>> wrote:
> > > >>>
> > > >>>> No.
> > > >>>>
> > > >>>> You can't.  This is because the type is lost as you enter the
> generic
> > > >>>> library.
> > > >>>>
> > > >>>> On Mon, Aug 15, 2011 at 4:28 PM, Gilles Sadowski <
> > > >>>> gilles@harfang.homelinux.org> wrote:
> > > >>>>
> > > >>>>>> They know that their own object is dense, but they don't know
> what
> > > >>> kind
> > > >>>>> of
> > > >>>>>> input they were given.  They should still run fast if the input
> is
> > > >>>>> sparse.
> > > >>>>>
> > > >>>>> Couldn't we still rely on polymorphism by implementing
> "preTimes":
> > > >>>>>   unknown.preTimes(dense)
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > > For additional commands, e-mail: dev-help@commons.apache.org
> > > >
> > > >
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > For additional commands, e-mail: dev-help@commons.apache.org
> > >
> > >
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

Re: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Posted by Arne Ploese <ap...@gmx.de>.

Currently sparseIterator is only used in RealVector, no matrix class
will be affected, because there is no sparseIterator.
Search you sources for "sparseIterator" ?

Arne
Am Dienstag, den 16.08.2011, 14:09 -0700 schrieb Ted Dunning: 
> Here is an example from the perspective of somebody adding a new kind of
> matrix.
> 
> Take the two kinds of matrix as RandomTrinaryMatrix(rows, columns, p) that
> has elements that are -1, 0 or 1.  1 and -1 have equal probabilities of p/2.
>  The value of p should be in [0,1].
> 
> It would be very nice if the implementor of this matrix could extend an
> abstract matrix and over-ride get() to generate a value and set() to throw
> an unsupported operation exception.  If p < 0.1, then the matrix should be
> marked as sparse, else as dense.
> 
> All operations against other matrices, sparse or dense should work well
> without any special handling by the implementor of this matrix.
> 
> This works in Mahout for instance by having the default operations in
> AbstractMatrix test for sparseness of left or right operands and do the
> right thing.  Obviously, a type test will not tell you whether this matrix
> is sparse or not.
> 
> This matrix and siblings is very important in compressed sensing and
> stochastic projection algorithms.
> 
> On Tue, Aug 16, 2011 at 1:55 PM, Phil Steitz <ph...@gmail.com> wrote:
> 
> > On 8/16/11 4:46 AM, Gilles Sadowski wrote:
> > > Hi.
> > >
> > >> I understood what he was suggesting.  I still disagree.  Dynamic
> > dispatch
> > >> and non-lattice typing structure is still required to make this all
> > work.
> > >>  Java doesn't really do that.  Pretending that what Java does is
> > sufficient
> > >> is hammer-looking-for-a-nail, not solving the problems at hand.
> > > Maybe that *I* don't understand what you are hinting at. Sorry for being
> > > dense. [Although that seems appropriate in this discussion :-).]
> > >
> > > Polymorphism provides dynamic dispatch, overloading does not; that's why
> > my
> > > proposition is that when you manipulate "unknown" types, those should
> > come
> > > as "this", not as the argument of the method.
> > >
> > > What's wrong with that?
> > >
> > > As for "hammer-looking-for-a-nail", I also don't see what you mean: What
> > is
> > > the problem? I guess that there are lots of applications who never need
> > to
> > > know about sparse vectors/matrices. In those cases, the added complexity
> > is
> > > not a "feature". The issue reported contends that the current design in
> > CM
> > > can cause problems for dense implementations. I'm not even sure that the
> > > current design is usable for the type of applications that make heavy use
> > of
> > > sparseness. Those are problems, IMHO.
> >
> > I have been out of pocket the last couple of days and may not have
> > time to dig into this until late tonight, but I agree with Gilles
> > that we need to get the conversation here more concrete.  I know we
> > discussed this before and Ted and others had good examples
> > justifying the current setup.  Can we revisit these, please?   What
> > would be great would be some examples both from the perspective of
> > the [math] developer looking to add a new or specialized class and
> > [math] users writing code that leverages the setup.
> >
> > Phil
> > >
> > >
> > > Gilles
> > >
> > >> On Mon, Aug 15, 2011 at 6:52 PM, Greg Sterijevski <
> > gsterijevski@gmail.com>wrote:
> > >>
> > >>> Forgive me for pushing my nose under the tent... I couldn't resist.
> > >>>
> > >>> I think Gilles is saying that each specialization of the matrix/vector
> > >>> objects would need to support pre (and post) multiplication with a
> > dense.
> > >>> So
> > >>> the type issue would not be problematic.
> > >>>
> > >>> On Mon, Aug 15, 2011 at 6:34 PM, Ted Dunning <te...@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> No.
> > >>>>
> > >>>> You can't.  This is because the type is lost as you enter the generic
> > >>>> library.
> > >>>>
> > >>>> On Mon, Aug 15, 2011 at 4:28 PM, Gilles Sadowski <
> > >>>> gilles@harfang.homelinux.org> wrote:
> > >>>>
> > >>>>>> They know that their own object is dense, but they don't know what
> > >>> kind
> > >>>>> of
> > >>>>>> input they were given.  They should still run fast if the input is
> > >>>>> sparse.
> > >>>>>
> > >>>>> Couldn't we still rely on polymorphism by implementing "preTimes":
> > >>>>>   unknown.preTimes(dense)
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > For additional commands, e-mail: dev-help@commons.apache.org
> > >
> > >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > For additional commands, e-mail: dev-help@commons.apache.org
> >
> >



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Posted by Ted Dunning <te...@gmail.com>.

On Wed, Aug 17, 2011 at 4:44 AM, Gilles Sadowski <
gilles@harfang.homelinux.org> wrote:

> > It would be very nice if the implementor of this matrix could extend an
> > abstract matrix and over-ride get() to generate a value and set() to
> throw
> > an unsupported operation exception.
>
> Do you mean that the matrix is stateless (each call to "get" generates a
> new
> value)?
>

Yes.  And I would also typically expect it to be immutable although changing
p is a reasonable thing to do (I just prefer immutable objects where
practical).


>
> > If p < 0.1, then the matrix should be
> > marked as sparse, else as dense.
> >
> > All operations against other matrices, sparse or dense should work well
> > without any special handling by the implementor of this matrix.
> >
> > This works in Mahout for instance by having the default operations in
> > AbstractMatrix test for sparseness of left or right operands and do the
> > right thing.  Obviously, a type test will not tell you whether this
> matrix
> > is sparse or not.
>
> As far as I understand, the sparseness test is an indicator that there is
> an
> optimized way to iterate over the entries (faster than a loop over all the
> indices) or that not all entries are explicitly stored. Is that correct?
>

Yes.


> If so, this is tied to the layout of the matrix, i.e. the type (which
> can change depending on an input value, such as "p" above).
>

Yes.


> However, once it is created, its type/implementation is either dense or
> sparse.
>

It depends on whether setP() is supported.  If the setter is there, then
density can even change after construction.


>
> > [...]
>
> I'm wondering whether this leads again to the issue of CM being a framework
> for everyone to build on, or if its data structures are mostly intended for
> internal use, so that the algorithms it provides are most efficient.


Well, it *is* open source (so they say).

That kind of indicates that it should be relatively easy for people to
provide code.

Also, I find that even the idea of this dichotomy is a bit of a mirage.
 Providing a good extensible design often gives faster code down the way a
bit.

Re: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.

On Tue, Aug 16, 2011 at 02:09:03PM -0700, Ted Dunning wrote:
> Here is an example from the perspective of somebody adding a new kind of
> matrix.
> 
> Take the two kinds of matrix as RandomTrinaryMatrix(rows, columns, p) that
> has elements that are -1, 0 or 1.  1 and -1 have equal probabilities of p/2.
>  The value of p should be in [0,1].
> 
> It would be very nice if the implementor of this matrix could extend an
> abstract matrix and over-ride get() to generate a value and set() to throw
> an unsupported operation exception. 

Do you mean that the matrix is stateless (each call to "get" generates a new
value)?

> If p < 0.1, then the matrix should be
> marked as sparse, else as dense.
> 
> All operations against other matrices, sparse or dense should work well
> without any special handling by the implementor of this matrix.
> 
> This works in Mahout for instance by having the default operations in
> AbstractMatrix test for sparseness of left or right operands and do the
> right thing.  Obviously, a type test will not tell you whether this matrix
> is sparse or not.

As far as I understand, the sparseness test is an indicator that there is an
optimized way to iterate over the entries (faster than a loop over all the
indices) or that not all entries are explicitly stored. Is that correct?

If so, this is tied to the layout of the matrix, i.e. the type (which
can change depending on an input value, such as "p" above).

However, once it is created, its type/implementation is either dense or
sparse.

> [...]

I'm wondering whether this leads again to the issue of CM being a framework
for everyone to build on, or if its data structures are mostly intended for
internal use, so that the algorithms it provides are most efficient.

Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Posted by Ted Dunning <te...@gmail.com>.

Here is an example from the perspective of somebody adding a new kind of
matrix.

Take the two kinds of matrix as RandomTrinaryMatrix(rows, columns, p) that
has elements that are -1, 0 or 1.  1 and -1 have equal probabilities of p/2.
 The value of p should be in [0,1].

It would be very nice if the implementor of this matrix could extend an
abstract matrix and over-ride get() to generate a value and set() to throw
an unsupported operation exception.  If p < 0.1, then the matrix should be
marked as sparse, else as dense.

All operations against other matrices, sparse or dense should work well
without any special handling by the implementor of this matrix.

This works in Mahout for instance by having the default operations in
AbstractMatrix test for sparseness of left or right operands and do the
right thing.  Obviously, a type test will not tell you whether this matrix
is sparse or not.

This matrix and siblings is very important in compressed sensing and
stochastic projection algorithms.

On Tue, Aug 16, 2011 at 1:55 PM, Phil Steitz <ph...@gmail.com> wrote:

> On 8/16/11 4:46 AM, Gilles Sadowski wrote:
> > Hi.
> >
> >> I understood what he was suggesting.  I still disagree.  Dynamic
> dispatch
> >> and non-lattice typing structure is still required to make this all
> work.
> >>  Java doesn't really do that.  Pretending that what Java does is
> sufficient
> >> is hammer-looking-for-a-nail, not solving the problems at hand.
> > Maybe that *I* don't understand what you are hinting at. Sorry for being
> > dense. [Although that seems appropriate in this discussion :-).]
> >
> > Polymorphism provides dynamic dispatch, overloading does not; that's why
> my
> > proposition is that when you manipulate "unknown" types, those should
> come
> > as "this", not as the argument of the method.
> >
> > What's wrong with that?
> >
> > As for "hammer-looking-for-a-nail", I also don't see what you mean: What
> is
> > the problem? I guess that there are lots of applications who never need
> to
> > know about sparse vectors/matrices. In those cases, the added complexity
> is
> > not a "feature". The issue reported contends that the current design in
> CM
> > can cause problems for dense implementations. I'm not even sure that the
> > current design is usable for the type of applications that make heavy use
> of
> > sparseness. Those are problems, IMHO.
>
> I have been out of pocket the last couple of days and may not have
> time to dig into this until late tonight, but I agree with Gilles
> that we need to get the conversation here more concrete.  I know we
> discussed this before and Ted and others had good examples
> justifying the current setup.  Can we revisit these, please?   What
> would be great would be some examples both from the perspective of
> the [math] developer looking to add a new or specialized class and
> [math] users writing code that leverages the setup.
>
> Phil
> >
> >
> > Gilles
> >
> >> On Mon, Aug 15, 2011 at 6:52 PM, Greg Sterijevski <
> gsterijevski@gmail.com>wrote:
> >>
> >>> Forgive me for pushing my nose under the tent... I couldn't resist.
> >>>
> >>> I think Gilles is saying that each specialization of the matrix/vector
> >>> objects would need to support pre (and post) multiplication with a
> dense.
> >>> So
> >>> the type issue would not be problematic.
> >>>
> >>> On Mon, Aug 15, 2011 at 6:34 PM, Ted Dunning <te...@gmail.com>
> >>> wrote:
> >>>
> >>>> No.
> >>>>
> >>>> You can't.  This is because the type is lost as you enter the generic
> >>>> library.
> >>>>
> >>>> On Mon, Aug 15, 2011 at 4:28 PM, Gilles Sadowski <
> >>>> gilles@harfang.homelinux.org> wrote:
> >>>>
> >>>>>> They know that their own object is dense, but they don't know what
> >>> kind
> >>>>> of
> >>>>>> input they were given.  They should still run fast if the input is
> >>>>> sparse.
> >>>>>
> >>>>> Couldn't we still rely on polymorphism by implementing "preTimes":
> >>>>>   unknown.preTimes(dense)
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > For additional commands, e-mail: dev-help@commons.apache.org
> >
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

Re: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Posted by Phil Steitz <ph...@gmail.com>.

On 8/16/11 4:46 AM, Gilles Sadowski wrote:
> Hi.
>
>> I understood what he was suggesting.  I still disagree.  Dynamic dispatch
>> and non-lattice typing structure is still required to make this all work.
>>  Java doesn't really do that.  Pretending that what Java does is sufficient
>> is hammer-looking-for-a-nail, not solving the problems at hand.
> Maybe that *I* don't understand what you are hinting at. Sorry for being
> dense. [Although that seems appropriate in this discussion :-).]
>
> Polymorphism provides dynamic dispatch, overloading does not; that's why my
> proposition is that when you manipulate "unknown" types, those should come
> as "this", not as the argument of the method.
>
> What's wrong with that?
>
> As for "hammer-looking-for-a-nail", I also don't see what you mean: What is
> the problem? I guess that there are lots of applications who never need to
> know about sparse vectors/matrices. In those cases, the added complexity is
> not a "feature". The issue reported contends that the current design in CM
> can cause problems for dense implementations. I'm not even sure that the
> current design is usable for the type of applications that make heavy use of
> sparseness. Those are problems, IMHO.

I have been out of pocket the last couple of days and may not have
time to dig into this until late tonight, but I agree with Gilles
that we need to get the conversation here more concrete.  I know we
discussed this before and Ted and others had good examples
justifying the current setup.  Can we revisit these, please?   What
would be great would be some examples both from the perspective of
the [math] developer looking to add a new or specialized class and
[math] users writing code that leverages the setup.

Phil
>
>
> Gilles
>
>> On Mon, Aug 15, 2011 at 6:52 PM, Greg Sterijevski <gs...@gmail.com>wrote:
>>
>>> Forgive me for pushing my nose under the tent... I couldn't resist.
>>>
>>> I think Gilles is saying that each specialization of the matrix/vector
>>> objects would need to support pre (and post) multiplication with a dense.
>>> So
>>> the type issue would not be problematic.
>>>
>>> On Mon, Aug 15, 2011 at 6:34 PM, Ted Dunning <te...@gmail.com>
>>> wrote:
>>>
>>>> No.
>>>>
>>>> You can't.  This is because the type is lost as you enter the generic
>>>> library.
>>>>
>>>> On Mon, Aug 15, 2011 at 4:28 PM, Gilles Sadowski <
>>>> gilles@harfang.homelinux.org> wrote:
>>>>
>>>>>> They know that their own object is dense, but they don't know what
>>> kind
>>>>> of
>>>>>> input they were given.  They should still run fast if the input is
>>>>> sparse.
>>>>>
>>>>> Couldn't we still rely on polymorphism by implementing "preTimes":
>>>>>   unknown.preTimes(dense)
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.

Hi.

> I understood what he was suggesting.  I still disagree.  Dynamic dispatch
> and non-lattice typing structure is still required to make this all work.
>  Java doesn't really do that.  Pretending that what Java does is sufficient
> is hammer-looking-for-a-nail, not solving the problems at hand.

Maybe that *I* don't understand what you are hinting at. Sorry for being
dense. [Although that seems appropriate in this discussion :-).]

Polymorphism provides dynamic dispatch, overloading does not; that's why my
proposition is that when you manipulate "unknown" types, those should come
as "this", not as the argument of the method.

What's wrong with that?

As for "hammer-looking-for-a-nail", I also don't see what you mean: What is
the problem? I guess that there are lots of applications who never need to
know about sparse vectors/matrices. In those cases, the added complexity is
not a "feature". The issue reported contends that the current design in CM
can cause problems for dense implementations. I'm not even sure that the
current design is usable for the type of applications that make heavy use of
sparseness. Those are problems, IMHO.

Gilles

> On Mon, Aug 15, 2011 at 6:52 PM, Greg Sterijevski <gs...@gmail.com>wrote:
> 
> > Forgive me for pushing my nose under the tent... I couldn't resist.
> >
> > I think Gilles is saying that each specialization of the matrix/vector
> > objects would need to support pre (and post) multiplication with a dense.
> > So
> > the type issue would not be problematic.
> >
> > On Mon, Aug 15, 2011 at 6:34 PM, Ted Dunning <te...@gmail.com>
> > wrote:
> >
> > > No.
> > >
> > > You can't.  This is because the type is lost as you enter the generic
> > > library.
> > >
> > > On Mon, Aug 15, 2011 at 4:28 PM, Gilles Sadowski <
> > > gilles@harfang.homelinux.org> wrote:
> > >
> > > > > They know that their own object is dense, but they don't know what
> > kind
> > > > of
> > > > > input they were given.  They should still run fast if the input is
> > > > sparse.
> > > >
> > > > Couldn't we still rely on polymorphism by implementing "preTimes":
> > > >   unknown.preTimes(dense)
> > >
> >

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Posted by Ted Dunning <te...@gmail.com>.

I understood what he was suggesting.  I still disagree.  Dynamic dispatch
and non-lattice typing structure is still required to make this all work.
 Java doesn't really do that.  Pretending that what Java does is sufficient
is hammer-looking-for-a-nail, not solving the problems at hand.

On Mon, Aug 15, 2011 at 6:52 PM, Greg Sterijevski <gs...@gmail.com>wrote:

> Forgive me for pushing my nose under the tent... I couldn't resist.
>
> I think Gilles is saying that each specialization of the matrix/vector
> objects would need to support pre (and post) multiplication with a dense.
> So
> the type issue would not be problematic.
>
> On Mon, Aug 15, 2011 at 6:34 PM, Ted Dunning <te...@gmail.com>
> wrote:
>
> > No.
> >
> > You can't.  This is because the type is lost as you enter the generic
> > library.
> >
> > On Mon, Aug 15, 2011 at 4:28 PM, Gilles Sadowski <
> > gilles@harfang.homelinux.org> wrote:
> >
> > > > They know that their own object is dense, but they don't know what
> kind
> > > of
> > > > input they were given.  They should still run fast if the input is
> > > sparse.
> > >
> > > Couldn't we still rely on polymorphism by implementing "preTimes":
> > >   unknown.preTimes(dense)
> >
>

Re: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Posted by Greg Sterijevski <gs...@gmail.com>.

Forgive me for pushing my nose under the tent... I couldn't resist.

I think Gilles is saying that each specialization of the matrix/vector
objects would need to support pre (and post) multiplication with a dense. So
the type issue would not be problematic.

On Mon, Aug 15, 2011 at 6:34 PM, Ted Dunning <te...@gmail.com> wrote:

> No.
>
> You can't.  This is because the type is lost as you enter the generic
> library.
>
> On Mon, Aug 15, 2011 at 4:28 PM, Gilles Sadowski <
> gilles@harfang.homelinux.org> wrote:
>
> > > They know that their own object is dense, but they don't know what kind
> > of
> > > input they were given.  They should still run fast if the input is
> > sparse.
> >
> > Couldn't we still rely on polymorphism by implementing "preTimes":
> >   unknown.preTimes(dense)
>

Re: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Posted by Ted Dunning <te...@gmail.com>.

No.

You can't.  This is because the type is lost as you enter the generic
library.

On Mon, Aug 15, 2011 at 4:28 PM, Gilles Sadowski <
gilles@harfang.homelinux.org> wrote:

> > They know that their own object is dense, but they don't know what kind
> of
> > input they were given.  They should still run fast if the input is
> sparse.
>
> Couldn't we still rely on polymorphism by implementing "preTimes":
>   unknown.preTimes(dense)

Re: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.

> Well, no they wouldn't.
> 
> They would often need to write something like
> 
>     dense.times(unknown)

An actual code excerpt would help me understand what you think is or isn't
or should be possible with CM.

If "dense" and "unknown" are vectors, then in "ArrayRealVector", "dotProduct"
uses a "sparseIterator" but "ebeMutliply" does not.

Also (and this is the issue reported in the link I was referring to in the
previous post), the "sparseIterator" is used for any vector that is not an
instance of "ArrayRealVector", which will be less efficient if the vector is
actually dense. This is the same kind of problem you referred to with user's
implementations, but in reverse (dense implementations are penalized).

> They know that their own object is dense, but they don't know what kind of
> input they were given.  They should still run fast if the input is sparse.

Couldn't we still rely on polymorphism by implementing "preTimes":
   unknown.preTimes(dense)
?


Gilles
 
> > > And it really helps for a dense matrix to optimize operations against a
> > > sparse operand as in dense.plus(sparse).
> >
> > Applications that manipulate sparse things would know the limitations of
> > polymorphism and would write
> >  sparse.plus(dense)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Posted by Ted Dunning <te...@gmail.com>.

Well, no they wouldn't.

They would often need to write something like

    dense.times(unknown)

They know that their own object is dense, but they don't know what kind of
input they were given.  They should still run fast if the input is sparse.

On Mon, Aug 15, 2011 at 3:06 PM, Gilles Sadowski <
gilles@harfang.homelinux.org> wrote:

> > And it really helps for a dense matrix to optimize operations against a
> > sparse operand as in dense.plus(sparse).
>
> Applications that manipulate sparse things would know the limitations of
> polymorphism and would write
>  sparse.plus(dense)

Re: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.

> > > I'm in favor of moving some methods to the SparseXXX interface, but
> > > got voted down at the time. For application developers (like me),
> > > that can expect in advance if the Vector/Matrix is sparse or not it
> > > isn't a big deal. But I can see how it may cause problems for other
> > > libraries that want to leverage C-M.  And actually, having problems
> > > seeing why it is a big deal in general.  If I'm doing an operation
> > > like outer product, I would still prefer that the iterator skips the
> > > zero entries.
> >
> > I'm wondering whether, depending on the application field, one does not
> > know
> > in advance that one should use sparse implementations ("OpenMapRealVector"
> > and "OpenMapRealMatrix"). If those objects are used consistently throughout
> > the application, the operations should all leverage the sparseness
> > optimizations, thanks to polymorphism.
> >
> 
> Polymorphism is really too limited here to get good results.
> 
> And it really helps for a dense matrix to optimize operations against a
> sparse operand as in dense.plus(sparse).

Applications that manipulate sparse things would know the limitations of
polymorphism and would write
  sparse.plus(dense)

> If all matrices implement isSparse
> and sparseIterator, then the general implementation of plus here can have
> one test and get almost all of the benefits of specially coded methods.

One could consider that this comes at the expense of code clarity and a
slightly decreased efficiency for basic (dense) usage.

>  Those tests are needed since it isn't reasonable to assume that the dense
> implementation knows about all types of matrices if users are allowed to
> implement matrices.

Then, isn't there a problem in CM anyways because of those "instanceof"
operators, in "AbstractRealVector" and others? A superclass testing the type
of its subclasses does not feel right...

> Moreover, a simple single inheritance type structure is
> not rich enough to express the different kinds and conditions of a matrix.

So why not multiple hierarchies instead of "one size fits all"?

> I would say that the argument should go the other way.  Do you have a unit
> test demonstrating that having these methods in the general case *hurts*
> performance?  It obviously helps.
> 

https://issues.apache.org/jira/browse/MATH-628

> 
> > Could there be specific unit tests demonstrating the necessity of having
> > something like "Iterator<Entry> sparseIterator()" in "RealVector"? The
> > drawback is obviously that in dense implementations, this method must be
> > implemented but is used only in those cases where the object should have
> > been "sparse" in the first place.
> 
> 
> This "drawback" is trivial.  In the abstract implementation, you can just
> delegate to the dense iterator.  Thus, RealVector never needs to know.

In "AbstractRealVector", there is an implementation of "SparseEntryIterator"
but the Javadoc states:
 "[...] Concrete subclasses which are SparseVector implementations should
  make their own sparse iterator, not use this one. [...]"

That does not feel right either.

> 
> > Unless I'm mistaken, it looks as if it
> > would sufficient to have "converters" between sparse and dense
> > implementations.
> >
> 
> I think you are mistaken.  Converters are a royal pain.
> 
> Taking a case near to my heart, it is really, really important for gradient
> descent implementations of logistic regression to do smart things with
> sparse feature vectors.  On the other hand, they also need to handle dense
> feature vectors.  The internal coefficient matrix is, however, dense and
> there are half a dozen or more built-in kinds of sparse matrices.  There are
> even several implementations of dense matrices.
> 
> Almost all of the operations are actually on the internal coefficient
> matrix.  If the basic operations on a dense matrix all handle sparse cases
> correctly, then the learning code can declare the feature vector to be a
> Vector.  If not, I have to duplicate (at least) all of the learning simply
> for the benefit of the compiler.

Is this something you do with CM?
If not, could it be done with the CM's curent implementations of matrices
and vectors?


Regards,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Posted by Ted Dunning <te...@gmail.com>.

On Mon, Aug 15, 2011 at 4:04 AM, Gilles Sadowski <
gilles@harfang.homelinux.org> wrote:

> Hello.
>
> > I'm in favor of moving some methods to the SparseXXX interface, but
> > got voted down at the time. For application developers (like me),
> > that can expect in advance if the Vector/Matrix is sparse or not it
> > isn't a big deal. But I can see how it may cause problems for other
> > libraries that want to leverage C-M.  And actually, having problems
> > seeing why it is a big deal in general.  If I'm doing an operation
> > like outer product, I would still prefer that the iterator skips the
> > zero entries.
>
> I'm wondering whether, depending on the application field, one does not
> know
> in advance that one should use sparse implementations ("OpenMapRealVector"
> and "OpenMapRealMatrix"). If those objects are used consistently throughout
> the application, the operations should all leverage the sparseness
> optimizations, thanks to polymorphism.
>

Polymorphism is really too limited here to get good results.

And it really helps for a dense matrix to optimize operations against a
sparse operand as in dense.plus(sparse).  If all matrices implement isSparse
and sparseIterator, then the general implementation of plus here can have
one test and get almost all of the benefits of specially coded methods.
 Those tests are needed since it isn't reasonable to assume that the dense
implementation knows about all types of matrices if users are allowed to
implement matrices.  Moreover, a simple single inheritance type structure is
not rich enough to express the different kinds and conditions of a matrix.

I would say that the argument should go the other way.  Do you have a unit
test demonstrating that having these methods in the general case *hurts*
performance?  It obviously helps.

> Could there be specific unit tests demonstrating the necessity of having
> something like "Iterator<Entry> sparseIterator()" in "RealVector"? The
> drawback is obviously that in dense implementations, this method must be
> implemented but is used only in those cases where the object should have
> been "sparse" in the first place.

This "drawback" is trivial.  In the abstract implementation, you can just
delegate to the dense iterator.  Thus, RealVector never needs to know.

> Unless I'm mistaken, it looks as if it
> would sufficient to have "converters" between sparse and dense
> implementations.
>

I think you are mistaken.  Converters are a royal pain.

Taking a case near to my heart, it is really, really important for gradient
descent implementations of logistic regression to do smart things with
sparse feature vectors.  On the other hand, they also need to handle dense
feature vectors.  The internal coefficient matrix is, however, dense and
there are half a dozen or more built-in kinds of sparse matrices.  There are
even several implementations of dense matrices.

Almost all of the operations are actually on the internal coefficient
matrix.  If the basic operations on a dense matrix all handle sparse cases
correctly, then the learning code can declare the feature vector to be a
Vector.  If not, I have to duplicate (at least) all of the learning simply
for the benefit of the compiler.

Re: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.

Hello.

> I'm in favor of moving some methods to the SparseXXX interface, but
> got voted down at the time. For application developers (like me),
> that can expect in advance if the Vector/Matrix is sparse or not it
> isn't a big deal. But I can see how it may cause problems for other
> libraries that want to leverage C-M.  And actually, having problems
> seeing why it is a big deal in general.  If I'm doing an operation
> like outer product, I would still prefer that the iterator skips the
> zero entries.

I'm wondering whether, depending on the application field, one does not know
in advance that one should use sparse implementations ("OpenMapRealVector"
and "OpenMapRealMatrix"). If those objects are used consistently throughout
the application, the operations should all leverage the sparseness
optimizations, thanks to polymorphism.

Could there be specific unit tests demonstrating the necessity of having
something like "Iterator<Entry> sparseIterator()" in "RealVector"? The
drawback is obviously that in dense implementations, this method must be
implemented but is used only in those cases where the object should have
been "sparse" in the first place. Unless I'm mistaken, it looks as if it
would sufficient to have "converters" between sparse and dense
implementations.


Regards,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Posted by Bill Barker <bi...@verizon.net>.

I'm in favor of moving some methods to the SparseXXX interface, but got 
voted down at the time. For application developers (like me), that can 
expect in advance if the Vector/Matrix is sparse or not it isn't a big deal. 
But I can see how it may cause problems for other libraries that want to 
leverage C-M.  And actually, having problems seeing why it is a big deal in 
general.  If I'm doing an operation like outer product, I would still prefer 
that the iterator skips the zero entries.

-----Original Message----- 
From: Gilles Sadowski
Sent: Sunday, August 14, 2011 3:52 PM
To: dev@commons.apache.org
Subject: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Hi.

I'm rather confused by the appearance of sparseness handling at the level
of a "general" vector (i.e. "RealVector", "AbstractRealVector") as well as
at the level of a non-sparse data structure ("ArrayRealVector").

This makes for a lot of convoluted code containing "instanceof" operators...

It was also pointed out by Arne Plöse in
  https://issues.apache.org/jira/browse/MATH-626
that it could lead to inefficient code.

Following the suggestion there, I wonder whether we should perform some
cleanup of the "RealVector" hierarchy, such as moving methods that are
sparseness-related over to the "SparseRealVector" interface, and removing
anything sparseness-related from "AbstractRealVector" and "ArrayRealVector".

However I don't have any idea of the implications of this refactoring. The
documentation is not very explicit about why the sparseness was introduced
in "RealVector" and the Javadoc for "sparseIterator()" adds to the
confusion:
---CUT---
    /**
     * Specialized implementations may choose to not iterate over all
     * dimensions, either because those values are unset, or are equal
     * to defaultValue(), or are small enough to be ignored for the
     * purposes of iteration.
     * No guarantees are made about order of iteration.
     * In dense implementations, this method will often delegate to
     * {@link #iterator()}.
     *
     * @return a sparse iterator
     */
    Iterator<Entry> sparseIterator();
---CUT---

All dimensions (plural)? "defaultValue()"? Order of iteration?

Also, I would expect that an iterator in "RealVector" would by default
iterate over all indices and return the sequence of _values_ (double), not
of "Entry" objects.
I assume that those "Entry" objects are necessary only for implementations
of sparse vectors to avoid returning the many entries set to the default
(non-stored) value...
In fact, don't you think that the "RealVector" interface should extend
"Iterable<Double>" while "SparseRealVector" would extend "Iterable<Entry>"?


Best regards,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Posted by Ted Dunning <te...@gmail.com>.

In the Mahout matrix classes, it turned out very important to keep some
notion of sparsity in the abstract matrix and vector classes.  This allows
default implementations to make use of sparsity at a pretty non-specific
level which actually substantially decreases the amount of type reflection
that is needed.

If, for instance, somebody adds a banded matrix, they will need to implement
an isSparse method and a forAllNonZero iteration construct and all other
matrices will make use of that sparsity by default.

On Sun, Aug 14, 2011 at 3:52 PM, Gilles Sadowski <
gilles@harfang.homelinux.org> wrote:

> Following the suggestion there, I wonder whether we should perform some
> cleanup of the "RealVector" hierarchy, such as moving methods that are
> sparseness-related over to the "SparseRealVector" interface, and removing
> anything sparseness-related from "AbstractRealVector" and
> "ArrayRealVector".
>
> However I don't have any idea of the implications of this refactoring. The
> documentation is not very explicit about why the sparseness was introduced
> in "RealVector" and the Javadoc for "sparseIterator()" adds to the
> confusion:
>