You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mahout.apache.org by Alexander Hans <al...@ahans.de> on 2010/10/22 08:33:50 UTC

Matrix.getNumNondefaultElements returning int[]

Hey,

Why does Matrix.getNumNondefaultElements() return an int[2] containing the
numbers of rows and numbers of columns with non-default elements? I would
expect it to return just an int containing the actual number of
non-default elements. From the number of rows and columns I think one can
do nothing except derive an upper limit of the total number as
numRows*numCols. If a user of Matrix wants to know the actual total number
of non-default elements, I think currently he/she would have to iterate
the whole matrix and count him-/herself. It seems like the currently
implemented version is used nowhere, the only reference I've found is
testSize(), which tests that method.

So can I go ahead and change the interface and the implementations to
return the actual number of non-default elements?


Thanks,

Alex



Re: Matrix.getNumNondefaultElements returning int[]

Posted by Ted Dunning <te...@gmail.com>.
These danged users are just sooo unpredictable.

Your argument does carry some weight, however, in modified form.

Code that is not used in Mahout even to the extent of not having a test
cases should be deprecated
and shouldn't be used by anybody else.  But that is a different story.

On Fri, Oct 22, 2010 at 1:31 PM, Alexander Hans <al...@ahans.de> wrote:

> > It is what it was and users may already use it.
>
> For me that's the strongest argument. I haven't thought of this at all!
> I'm just not used to code that people not comitting to the same repository
> like me might be using. So I assumed: if it's not used within Mahout, it's
> not used at all. ;-)

Re: Matrix.getNumNondefaultElements returning int[]

Posted by Ted Dunning <te...@gmail.com>.
Knowing the exact number often requires extensive iteration because, for
instance, hashed representations might have a significant number of elements
marked MISSING due to assignments.

Getting an upper bound on the number isn't usually too hard, but it is still
O(n) rather than O(1).

On Fri, Oct 22, 2010 at 1:31 PM, Alexander Hans <al...@ahans.de> wrote:

> > It can be hard for an implementation to know the exact number of
> > non-default elements.
>
> I can't quite see that. Can you elaborate please?

Re: Matrix.getNumNondefaultElements returning int[]

Posted by Alexander Hans <al...@ahans.de>.
> Uhh... isn't this what iterateNonZero does?

If there was an iterateNonZero() in the Matrix interface and its
implementation, I guess that's what it would do. Seems like so far
nobody's really used matrices ... ;-)

Anyway, have a look at the patch I submitted before. I implemented a
iterateActuallySet() method for each matrix implementation where it makes
sense. I did this because I assumed that calling matrix.set(i, j, 0) sets
that value in memory also for a sparse matrix. But I just had another look
at the code and it turns out that a matrix.set(i, j, 0) would actually
remove that entry if it was something else than 0 before. So it's probably
best to stick with getNumNonZero() and iterateNonZero(). I will fix that
tomorrow and add another patch. I guess it won't take more than renaming
those two methods. Need to sleep now though ...


Cheers,

Alex


>
> On Fri, Oct 22, 2010 at 1:31 PM, Alexander Hans <al...@ahans.de> wrote:
>
>> > That isn't soooo bad, especially if somebody is about to iterate over
>> all
>> > those elements.
>>
>> There's currently no way to iterate over the non-default elements, this
>> I
>> will have to implement as well.
>



Re: Matrix.getNumNondefaultElements returning int[]

Posted by Ted Dunning <te...@gmail.com>.
Uhh... isn't this what iterateNonZero does?

On Fri, Oct 22, 2010 at 1:31 PM, Alexander Hans <al...@ahans.de> wrote:

> > That isn't soooo bad, especially if somebody is about to iterate over all
> > those elements.
>
> There's currently no way to iterate over the non-default elements, this I
> will have to implement as well.

Re: Matrix.getNumNondefaultElements returning int[]

Posted by Alexander Hans <al...@ahans.de>.
> FOR:
>
> It can be hard for an implementation to know the exact number of
> non-default elements.

I can't quite see that. Can you elaborate please?


> There is an analogy with the size() method.
>
> It is what it was and users may already use it.

For me that's the strongest argument. I haven't thought of this at all!
I'm just not used to code that people not comitting to the same repository
like me might be using. So I assumed: if it's not used within Mahout, it's
not used at all. ;-)

I could add a new method, maybe getNumNonZeroElements(), after all that's
what non-default means. Unless the matrix is dense ... Hm, maybe that name
is not that great ...


> AGAINST:
>
> The size() method was silly to begin with.  rowSize() and columnSize() are
> far more useful.  Returning an array
> implies a copy which means that there is lots of allocation going on or it
> means returning a reference to internal
> state which is worse.

True. So let's deprecate size() then?


> The worst cost is probably the row sparse representations.  NumNonDefault
> would have to iterate over all rows calling numNonDefault on each row.

This is how the current getNumNondefaultElements() works. It gets the
number of rows as rows.size() (where rows is a hashmap) and then iterates
over the row vectors calling vectorEntry.getNumNondefaultElements() each
time to find the maximum. To get the actual number of non-default elements
I don't look for the maximum but calculate the sum--pretty much the same
cost I guess.


> That isn't soooo bad, especially if somebody is about to iterate over all
> those elements.

There's currently no way to iterate over the non-default elements, this I
will have to implement as well.


I need this for the MatrixWritable implementation. When writing/reading a
sparse matrix, I have to put the number of elements to follow into the
stream first.

This morning I found a DenseMatrixWritable class that is able to read and
write dense matrices. However, it's not related to MatrixWritable, but
only extends DenseMatrix and implements Writable itself. I think it's best
to delete this class. MatrixWritable should be able to read and write any
Matrix implementation with no need for a special class for each type. But
then again, maybe people are using it already ...


> SUMMARY:
>
> I wouldn't mind a change, but others should speak up if they care about
> the current behavior.

I will implement a new method, finish MatrixWritable, and submit a patch.
If that isn't alright, it can still be changed later.


Cheers,

Alex



Re: Matrix.getNumNondefaultElements returning int[]

Posted by Ted Dunning <te...@gmail.com>.
I think that there are two rationales FOR the current behavior and at least
one argument AGAINST:

FOR:

It can be hard for an implementation to know the exact number of non-default
elements.

There is an analogy with the size() method.

It is what it was and users may already use it.

AGAINST:

The size() method was silly to begin with.  rowSize() and columnSize() are
far more useful.  Returning an array
implies a copy which means that there is lots of allocation going on or it
means returning a reference to internal
state which is worse.

The worst cost is probably the row sparse representations.  NumNonDefault
would have to iterate
over all rows calling numNonDefault on each row.  That isn't soooo bad,
especially if somebody is about
to iterate over all those elements.


SUMMARY:

I wouldn't mind a change, but others should speak up if they care about the
current behavior.

On Thu, Oct 21, 2010 at 11:33 PM, Alexander Hans <al...@ahans.de> wrote:

> Hey,
>
> Why does Matrix.getNumNondefaultElements() return an int[2] containing the
> numbers of rows and numbers of columns with non-default elements? I would
> expect it to return just an int containing the actual number of
> non-default elements. From the number of rows and columns I think one can
> do nothing except derive an upper limit of the total number as
> numRows*numCols. If a user of Matrix wants to know the actual total number
> of non-default elements, I think currently he/she would have to iterate
> the whole matrix and count him-/herself. It seems like the currently
> implemented version is used nowhere, the only reference I've found is
> testSize(), which tests that method.
>
> So can I go ahead and change the interface and the implementations to
> return the actual number of non-default elements?
>
>
> Thanks,
>
> Alex
>
>
>