Posted to dev@mahout.apache.org by Dawid Weiss <da...@gmail.com> on 2011/02/22 13:18:28 UTC

Mahout math calls in Carrot2

Hi Ted,

I'm sorry it took so long, but I haven't forgotten our discussion about
how Carrot2 uses Mahout Math's API. Adding to what Staszek already said
about our need to do in-place computations on matrices, I decided to
collect all the API calls we actually use. Out of curiosity, I wrote an
AspectJ aspect and ran it over our binaries. It seems we're using the
following:

Generic stuff:

org.apache.mahout.math.Arrays.trimToCapacity(double[], int)
org.apache.mahout.math.function.DoubleComparator.compare(double, double)
org.apache.mahout.math.function.DoubleFunction.apply(double)
org.apache.mahout.math.function.Functions.chain(org.apache.mahout.math.function.UnaryFunction, org.apache.mahout.math.function.BinaryFunction)
org.apache.mahout.math.function.Functions.mult(double)
org.apache.mahout.math.function.Functions.plus(double)
org.apache.mahout.math.function.Functions.swapArgs(org.apache.mahout.math.function.BinaryFunction)
org.apache.mahout.math.function.Mult.div(double)
org.apache.mahout.math.GenericPermuting.permute(int[], int[])
org.apache.mahout.math.list.DoubleArrayList.get(int)
org.apache.mahout.math.list.DoubleArrayList.<init>(int)
org.apache.mahout.math.list.DoubleArrayList.size()
org.apache.mahout.math.list.IntArrayList.get(int)
org.apache.mahout.math.list.IntArrayList.<init>(int)
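Most of the generic calls above are small function combinators. As a rough sketch of what they do, assuming their Colt-derived semantics (chain(g, h) applies g to the result of h(a, b), swapArgs(f) flips the two arguments, mult(c) and plus(c) build x * c and x + c, and Mult.div(c) multiplies by 1 / c), here are equivalents built on java.util.function. The FunctionSketch class and its methods are hypothetical stand-ins, not Mahout's API.

```java
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

// Hypothetical stand-ins for the org.apache.mahout.math.function combinators
// listed above, written against java.util.function instead of Mahout's types.
final class FunctionSketch {
    private FunctionSketch() {}

    // In the spirit of Functions.chain(UnaryFunction g, BinaryFunction h): (a, b) -> g(h(a, b)).
    static DoubleBinaryOperator chain(DoubleUnaryOperator g, DoubleBinaryOperator h) {
        return (a, b) -> g.applyAsDouble(h.applyAsDouble(a, b));
    }

    // In the spirit of Functions.swapArgs(f): (a, b) -> f(b, a).
    static DoubleBinaryOperator swapArgs(DoubleBinaryOperator f) {
        return (a, b) -> f.applyAsDouble(b, a);
    }

    // In the spirit of Functions.mult(c): x -> x * c.
    static DoubleUnaryOperator mult(double c) {
        return x -> x * c;
    }

    // In the spirit of Functions.plus(c): x -> x + c.
    static DoubleUnaryOperator plus(double c) {
        return x -> x + c;
    }

    // In the spirit of Mult.div(c): multiply by the reciprocal instead of dividing.
    static DoubleUnaryOperator div(double c) {
        return mult(1.0 / c);
    }
}
```

For example, chain(mult(2), (a, b) -> a - b) applied to (7, 4) yields 2 * (7 - 4) = 6. Precomputing 1 / c in div trades a division per call for a multiplication, which may differ from true division in the last bit.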

This one is most likely replaceable:

org.apache.mahout.math.matrix.doublealgo.Sorting.sort(org.apache.mahout.math.matrix.DoubleMatrix2D, double[])
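If, like its Colt ancestor, Sorting.sort(DoubleMatrix2D, double[]) orders the rows of a matrix ascending by one precomputed key per row, a replacement only needs an index sort followed by a row permutation. A minimal sketch on plain double[][]; RowSort and sortRowsByKey are hypothetical names, and unlike the library call (which may return a view rather than a copy) this version returns a fresh copy.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical replacement for sorting matrix rows by per-row keys.
final class RowSort {
    private RowSort() {}

    // Returns a new matrix whose rows are the rows of `matrix` ordered
    // ascending by keys[row]. Ties keep their original relative order,
    // because Arrays.sort on an object array is stable.
    static double[][] sortRowsByKey(double[][] matrix, double[] keys) {
        if (matrix.length != keys.length) {
            throw new IllegalArgumentException("need exactly one key per row");
        }
        Integer[] order = new Integer[matrix.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble(i -> keys[i]));
        double[][] sorted = new double[matrix.length][];
        for (int i = 0; i < order.length; i++) {
            sorted[i] = matrix[order[i]].clone(); // copy so the input stays untouched
        }
        return sorted;
    }
}
```

Sorting an index array instead of the rows themselves means each key is read once per comparison and never recomputed, which is the point of passing precomputed keys in the first place.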

And matrices:

org.apache.mahout.math.matrix.DoubleFactory2D.make(int, int)
org.apache.mahout.math.matrix.DoubleMatrix1D.aggregate(org.apache.mahout.math.function.BinaryFunction, org.apache.mahout.math.function.UnaryFunction)
org.apache.mahout.math.matrix.DoubleMatrix1D.assign(double)
org.apache.mahout.math.matrix.DoubleMatrix1D.assign(org.apache.mahout.math.function.UnaryFunction)
org.apache.mahout.math.matrix.DoubleMatrix1D.assign(org.apache.mahout.math.matrix.DoubleMatrix1D, org.apache.mahout.math.function.BinaryFunction)
org.apache.mahout.math.matrix.DoubleMatrix1D.toArray()
org.apache.mahout.math.matrix.DoubleMatrix2D.assign(double)
org.apache.mahout.math.matrix.DoubleMatrix2D.assign(org.apache.mahout.math.function.UnaryFunction)
org.apache.mahout.math.matrix.DoubleMatrix2D.assign(org.apache.mahout.math.matrix.DoubleMatrix2D)
org.apache.mahout.math.matrix.DoubleMatrix2D.assign(org.apache.mahout.math.matrix.DoubleMatrix2D, org.apache.mahout.math.function.BinaryFunction)
org.apache.mahout.math.matrix.DoubleMatrix2D.cardinality()
org.apache.mahout.math.matrix.DoubleMatrix2D.columns()
org.apache.mahout.math.matrix.DoubleMatrix2D.copy()
org.apache.mahout.math.matrix.DoubleMatrix2D.getNonZeros(org.apache.mahout.math.list.IntArrayList, org.apache.mahout.math.list.IntArrayList, org.apache.mahout.math.list.DoubleArrayList)
org.apache.mahout.math.matrix.DoubleMatrix2D.getQuick(int, int)
org.apache.mahout.math.matrix.DoubleMatrix2D.rows()
org.apache.mahout.math.matrix.DoubleMatrix2D.set(int, int, double)
org.apache.mahout.math.matrix.DoubleMatrix2D.setQuick(int, int, double)
org.apache.mahout.math.matrix.DoubleMatrix2D.toStringShort()
org.apache.mahout.math.matrix.DoubleMatrix2D.viewColumn(int)
org.apache.mahout.math.matrix.DoubleMatrix2D.viewDice()
org.apache.mahout.math.matrix.DoubleMatrix2D.viewPart(int, int, int, int)
org.apache.mahout.math.matrix.DoubleMatrix2D.viewRow(int)
org.apache.mahout.math.matrix.DoubleMatrix2D.viewSelection(int[], int[])
org.apache.mahout.math.matrix.DoubleMatrix2D.zMult(org.apache.mahout.math.matrix.DoubleMatrix2D, org.apache.mahout.math.matrix.DoubleMatrix2D)
org.apache.mahout.math.matrix.DoubleMatrix2D.zMult(org.apache.mahout.math.matrix.DoubleMatrix2D, org.apache.mahout.math.matrix.DoubleMatrix2D, double, double, boolean, boolean)
org.apache.mahout.math.matrix.impl.DenseDoubleMatrix2D.assign(org.apache.mahout.math.matrix.DoubleMatrix2D)
org.apache.mahout.math.matrix.impl.SparseDoubleMatrix2D.<init>(int, int)
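The in-place requirement mentioned at the top maps onto the assign(...) calls in this list: they overwrite the receiver's cells instead of allocating a new matrix, while aggregate(reduce, map) folds a mapped value over all cells. A minimal sketch of those semantics on a plain double[] vector, assuming Colt-style behavior (an empty vector aggregates to NaN); VectorOps and its methods are hypothetical, with java.util.function types standing in for Mahout's UnaryFunction and BinaryFunction.

```java
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch of assign/aggregate semantics on a plain array.
final class VectorOps {
    private VectorOps() {}

    // Like assign(UnaryFunction): x[i] = f(x[i]), modifying x in place.
    static double[] assign(double[] x, DoubleUnaryOperator f) {
        for (int i = 0; i < x.length; i++) {
            x[i] = f.applyAsDouble(x[i]);
        }
        return x; // same array, no allocation
    }

    // Like assign(DoubleMatrix1D y, BinaryFunction f): x[i] = f(x[i], y[i]), in place.
    static double[] assign(double[] x, double[] y, DoubleBinaryOperator f) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("size mismatch");
        }
        for (int i = 0; i < x.length; i++) {
            x[i] = f.applyAsDouble(x[i], y[i]);
        }
        return x;
    }

    // Like aggregate(BinaryFunction reduce, UnaryFunction map): folds map(x[i])
    // left to right with reduce; an empty vector yields NaN.
    static double aggregate(double[] x, DoubleBinaryOperator reduce, DoubleUnaryOperator map) {
        if (x.length == 0) {
            return Double.NaN;
        }
        double acc = map.applyAsDouble(x[0]);
        for (int i = 1; i < x.length; i++) {
            acc = reduce.applyAsDouble(acc, map.applyAsDouble(x[i]));
        }
        return acc;
    }
}
```

For example, aggregate(x, Double::sum, v -> v * v) gives the squared Euclidean norm of x without building a temporary squared vector.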

Linear algebra:

org.apache.mahout.math.matrix.linalg.Algebra.normF(org.apache.mahout.math.matrix.DoubleMatrix2D)
org.apache.mahout.math.matrix.linalg.EigenvalueDecomposition.getRealEigenvalues()
org.apache.mahout.math.matrix.linalg.EigenvalueDecomposition.<init>(org.apache.mahout.math.matrix.DoubleMatrix2D)
org.apache.mahout.math.matrix.linalg.SingularValueDecomposition.getSingularValues()
org.apache.mahout.math.matrix.linalg.SingularValueDecomposition.getU()
org.apache.mahout.math.matrix.linalg.SingularValueDecomposition.getV()
org.apache.mahout.math.matrix.linalg.SingularValueDecomposition.<init>(org.apache.mahout.math.matrix.DoubleMatrix2D)
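Of the linear-algebra calls, normF is the easiest to reproduce without a library: the Frobenius norm is the square root of the sum of squared entries (equivalently, of the squared singular values). A minimal sketch on double[][]; the FrobeniusNorm class is hypothetical, and folding with Math.hypot is one choice among several, not necessarily what Mahout does internally.

```java
// Hypothetical Frobenius norm on a plain 2D array.
final class FrobeniusNorm {
    private FrobeniusNorm() {}

    // ||A||_F = sqrt(sum over i, j of a[i][j]^2). Folding with Math.hypot
    // computes the same value while avoiding intermediate overflow and
    // underflow, at some cost in speed compared with summing squares.
    static double normF(double[][] a) {
        double acc = 0.0;
        for (double[] row : a) {
            for (double v : row) {
                acc = Math.hypot(acc, v);
            }
        }
        return acc;
    }
}
```

A naive sum of squares would overflow to infinity for entries around 1e200; the hypot fold returns a finite result of about 1.414e200 for a row of two such entries.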

And that would be it. I don't know how much this helps, but I wanted to
leave a trace so that we can come back to it later, if needed.

Dawid

Re: Mahout math calls in Carrot2

Posted by Dawid Weiss <da...@gmail.com>.
> So Dawid, if there were a LinearDenseMatrix that used a strided dense
> representation, would that meet your native code needs?  Given our
> AbstractMatrix data structure and reasonably abstract tests, it would be
> pretty easy to build this.


Yes, this would be helpful. I believe it makes a lot of sense to have a
strided representation even if you're using Java only: indexing a flat
array with a computed offset (row * stride + column) gets compiled into
efficient native instructions (I don't know the corner cases, though),
whereas a double[][] adds an extra indirection per row, each with its
own null and bounds checks.
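To make the strided idea concrete: a dense matrix stored as one flat double[] with an offset and two strides addresses cell (r, c) at offset + r * rowStride + c * columnStride. Sub-matrix and transposed views (compare viewPart and viewDice in the list above) then become different offsets and strides over the same array, and the backing array could in principle be handed to native code in one piece. A minimal sketch; StridedMatrix and its members are hypothetical and not the LinearDenseMatrix being proposed.

```java
// Hypothetical strided dense matrix: one flat array plus an offset and strides.
final class StridedMatrix {
    final double[] data;    // backing storage, possibly shared with other views
    final int rows;
    final int columns;
    final int offset;       // position of cell (0, 0) in data
    final int rowStride;    // step in data from (r, c) to (r + 1, c)
    final int columnStride; // step in data from (r, c) to (r, c + 1)

    // Allocates a new row-major matrix: rowStride = columns, columnStride = 1.
    StridedMatrix(int rows, int columns) {
        this(new double[rows * columns], rows, columns, 0, columns, 1);
    }

    private StridedMatrix(double[] data, int rows, int columns,
                          int offset, int rowStride, int columnStride) {
        this.data = data;
        this.rows = rows;
        this.columns = columns;
        this.offset = offset;
        this.rowStride = rowStride;
        this.columnStride = columnStride;
    }

    // Flat index of (r, c); the check keeps a view from reaching outside its window.
    private int index(int r, int c) {
        if (r < 0 || r >= rows || c < 0 || c >= columns) {
            throw new IndexOutOfBoundsException("(" + r + ", " + c + ")");
        }
        return offset + r * rowStride + c * columnStride;
    }

    double get(int r, int c) {
        return data[index(r, c)];
    }

    void set(int r, int c, double value) {
        data[index(r, c)] = value;
    }

    // Sub-matrix view sharing storage, in the spirit of viewPart(row, column, height, width).
    StridedMatrix viewPart(int row, int column, int height, int width) {
        if (row < 0 || column < 0 || height < 0 || width < 0
                || row + height > rows || column + width > columns) {
            throw new IndexOutOfBoundsException("part out of range");
        }
        return new StridedMatrix(data, height, width,
                offset + row * rowStride + column * columnStride, rowStride, columnStride);
    }

    // Transposed view sharing storage, in the spirit of viewDice(): swap shape and strides.
    StridedMatrix viewDice() {
        return new StridedMatrix(data, columns, rows, offset, columnStride, rowStride);
    }
}
```

Because views never copy, a write through viewPart or viewDice is visible in the original matrix, which is what in-place computations rely on.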


> I think that would make adding JNI or JNA code for some ops pretty easy via
> overrides.  My experience, though, is that matrices have to be pretty
> big to get any win from JNI or JNA.
>

Big matrices, or lots of computations on smaller ones (as in our
decomposition routines). Staszek measured it once, and the speedup was
significant for larger data sets. Like I said, though, I'm not really
desperate to keep that old native build of ATLAS that we have...

D.

Re: Mahout math calls in Carrot2

Posted by Ted Dunning <te...@gmail.com>.
So Dawid, if there were a LinearDenseMatrix that used a strided dense
representation, would that meet your native code needs?  Given our
AbstractMatrix data structure and reasonably abstract tests, it would be
pretty easy to build this.

I think that would make adding JNI or JNA code for some ops pretty easy via
overrides.  My experience, though, is that matrices have to be pretty
big to get any win from JNI or JNA.
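The override idea, combined with the observation that native calls only pay off for large matrices, might look roughly like this: a Java base class exposes a protected multiplication kernel, and a subclass overrides it to take a different path above a size threshold. In a real build that path would be a JNI or JNA call into BLAS; here it is a plain-Java placeholder so the sketch runs. OverridableMatrix, multiplyKernel, NativeBackedMatrix and the threshold value are all hypothetical.

```java
// Hypothetical row-major dense matrix with an overridable multiplication kernel.
class OverridableMatrix {
    final int rows;
    final int columns;
    final double[] data; // cell (r, c) lives at r * columns + c

    OverridableMatrix(int rows, int columns) {
        this.rows = rows;
        this.columns = columns;
        this.data = new double[rows * columns];
    }

    // Checks shapes, allocates a zeroed result and delegates to the kernel.
    final OverridableMatrix times(OverridableMatrix other) {
        if (columns != other.rows) {
            throw new IllegalArgumentException("inner dimensions must agree");
        }
        OverridableMatrix result = new OverridableMatrix(rows, other.columns);
        multiplyKernel(data, other.data, result.data, rows, columns, other.columns);
        return result;
    }

    // Plain-Java kernel: accumulates a (m x k) * b (k x n) into c (m x n), all row-major.
    protected void multiplyKernel(double[] a, double[] b, double[] c, int m, int k, int n) {
        for (int i = 0; i < m; i++) {
            for (int p = 0; p < k; p++) {
                double aip = a[i * k + p];
                for (int j = 0; j < n; j++) {
                    c[i * n + j] += aip * b[p * n + j];
                }
            }
        }
    }
}

// Hypothetical subclass that would route large products to native code.
class NativeBackedMatrix extends OverridableMatrix {
    static final long THRESHOLD = 64L * 64L * 64L; // hypothetical cut-over on m * k * n
    static int nativeCalls = 0; // counts how often the "native" path was taken

    NativeBackedMatrix(int rows, int columns) {
        super(rows, columns);
    }

    @Override
    protected void multiplyKernel(double[] a, double[] b, double[] c, int m, int k, int n) {
        if ((long) m * k * n < THRESHOLD) {
            super.multiplyKernel(a, b, c, m, k, n); // small: call overhead would dominate
            return;
        }
        nativeCalls++;
        // Placeholder for a JNI/JNA call (for example a BLAS dgemm binding); it
        // reuses the Java kernel here so the sketch runs without native libraries.
        super.multiplyKernel(a, b, c, m, k, n);
    }
}
```

Keeping the flat row-major array as the only storage is what makes such an override cheap: the same double[] could be passed to native code without copying or repacking.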

On Wed, Feb 23, 2011 at 1:32 AM, Dawid Weiss <da...@gmail.com> wrote:

> The only problem is that,
> like Staszek mentioned, we do rely on the internal representation of
> matrices (and access it directly via a subclass).
>

Re: Mahout math calls in Carrot2

Posted by Dawid Weiss <da...@gmail.com>.
>> org.apache.mahout.math.matrix.DoubleMatrix1D.toArray()
>>
>> org.apache.mahout.math.matrix.DoubleMatrix2D.toStringShort()

> The first one can be supported very easily.  More broadly, looking
> through your list, it seems we already support, or could trivially
> support, everything in it.

I guess it wouldn't be too difficult, yes. The only problem is that,
like Staszek mentioned, we do rely on the internal representation of
matrices (and access it directly via a subclass). That said, I think
we will remove our current native matrix routines, because they rely
on an obsolete build of ATLAS (BLAS/LAPACK). A much better idea would
be to implement native-code-backed matrices in Mahout (at least for
basic operations like multiplication). This is a tedious but also fun
task that requires good knowledge of several technologies (JNI, native
linkers, etc.); it would make a good GSoC project, for example...

> Can you say what the toStringShort method gives you?  Are you serializing or
> are you just trying to provide a readable form?

Nothing to worry about; it's just for debugging messages like this one:

            throw new IllegalArgumentException("Matrix2D inner dimensions must agree: "
                + toStringShort() + ", " + B.toStringShort());
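For context, a check of that kind guards a multiplication C = A * B, which requires A's column count to equal B's row count. A minimal sketch, assuming toStringShort() returns only a shape description such as "2 x 3 matrix" (the exact format here is an assumption); ShapeCheck, shape and checkInnerDimensions are hypothetical names.

```java
// Hypothetical shape check producing a message in the style quoted above.
final class ShapeCheck {
    private ShapeCheck() {}

    // Stand-in for toStringShort(): reports the shape only, never the cell values.
    static String shape(double[][] m) {
        int columns = m.length == 0 ? 0 : m[0].length;
        return m.length + " x " + columns + " matrix";
    }

    // Throws if A (m x k) and B (k' x n) cannot be multiplied, i.e. if k != k'.
    static void checkInnerDimensions(double[][] a, double[][] b) {
        int aColumns = a.length == 0 ? 0 : a[0].length;
        if (aColumns != b.length) {
            throw new IllegalArgumentException(
                    "Matrix2D inner dimensions must agree: " + shape(a) + ", " + shape(b));
        }
    }
}
```

Printing only the shape keeps such messages short even for large matrices, which fits a debugging-only use rather than serialization.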

> Also, I thought you mentioned earlier that you were depending on the
> internal storage layout of DoubleMatrix2D. Am I confused on that point?

We currently do, but I think we will drop the native computation
support until Mahout itself supports it... The code we currently have
for native matrices is so ugly that my eyes pop (it's nobody's fault;
we had to work around the matrix class infrastructure of Colt, and
later of Mahout Math, to access the internals).

Dawid

Re: Mahout math calls in Carrot2

Posted by Ted Dunning <te...@gmail.com>.
OK.  Of these, the only interesting or difficult cases are:

On Tue, Feb 22, 2011 at 4:18 AM, Dawid Weiss <da...@gmail.com> wrote:

> org.apache.mahout.math.matrix.DoubleMatrix1D.toArray()
> ...
> org.apache.mahout.math.matrix.DoubleMatrix2D.toStringShort()
>

The first one can be supported very easily.  More broadly, looking
through your list, it seems we already support, or could trivially
support, everything in it.

Can you say what the toStringShort method gives you?  Are you serializing or
are you just trying to provide a readable form?

Also, I thought you mentioned earlier that you were depending on the
internal storage layout of DoubleMatrix2D. Am I confused on that point?