Posted to dev@mahout.apache.org by Max Heimel <mh...@googlemail.com> on 2010/07/05 11:35:23 UTC

Fast way of copying Vectors/Matrices

Hi,

I am currently working on an iterative algorithm that has to make a
copy of several matrices (vectors) during each iteration. In order to
make the inner loop as fast as possible I want to overwrite an
existing matrix instead of allocating a new one, i.e. I want the
content of matrix src to end up in matrix dest without having to
allocate anything :)

The only thing I found to achieve this was by iterating over the
source matrix(vector) and using setQuick for each Element on the
destination matrix. I did some simple measurements to check how
performant this approach is by comparing it to the same overwriting
operation on a native double array. You can see the measurements
below:
Cloning arrays took 10858636985 ns.
Copying arrays took 28933083 ns.
Cloning vectors took 28097446771 ns.
Copying vectors took 9944879475 ns.
As you can see, copying the contents of a vector through iterators is
more than 300 times slower than copying the contents of one double
array into another double array.

So here's my question: is there a better way of overwriting one matrix
with another one other than iterating over the source and using
setQuick on the destination? Would it make sense to add an
assign(Matrix) (respectively assign(Vector)) function, that (for
Dense implementations) internally copies the contents from the double
array of source into the double array of destination? Is there already
such a function that I just did not see? :)

Cheers
Max

Re: Fast way of copying Vectors/Matrices

Posted by Ted Dunning <te...@gmail.com>.
Yeah... but that conses up new storage, which is what Max doesn't want.
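
To make the allocation point concrete, here is a minimal sketch. The class
and method names are made up for illustration; only Arrays.copyOf and
System.arraycopy are real JDK calls. Arrays.copyOf always hands back a
freshly allocated array, while System.arraycopy writes into an array the
caller already owns:

```java
import java.util.Arrays;

// Hypothetical demo class, not part of Mahout.
class CopyOfVersusArraycopy {

    /** Returns true if Arrays.copyOf handed back the very array it was given. */
    static boolean copyOfReusesStorage(double[] src) {
        double[] copy = Arrays.copyOf(src, src.length); // allocates a new array
        return copy == src;
    }

    /** Overwrites dest with the contents of src in place and returns dest. */
    static double[] overwrite(double[] src, double[] dest) {
        System.arraycopy(src, 0, dest, 0, src.length); // no allocation
        return dest;
    }

    public static void main(String[] args) {
        double[] src = {1.0, 2.0, 3.0};
        double[] dest = new double[3];
        System.out.println(copyOfReusesStorage(src));     // false
        System.out.println(overwrite(src, dest) == dest); // true
        System.out.println(Arrays.equals(src, dest));     // true
    }
}
```

So the two calls may well be equally fast per element copied, but only the
second one avoids creating garbage inside the inner loop.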

On Mon, Jul 5, 2010 at 10:20 AM, Robin Anil <ro...@gmail.com> wrote:

> On Mon, Jul 5, 2010 at 3:56 PM, Jake Mannix <ja...@gmail.com> wrote:
>
> > Hey Max,
> >
> >  I'm trying to understand what your requirements are for the copy:
> > you essentially want to have a method on DenseVector which
> > is implemented as
> >
> >  DenseVector assign(DenseVector v) {
> >    System.arraycopy(v.values, 0, this.values, 0, size());
> >    return this;
> >  }
> >
> You can use Arrays.copyOf as well, in Java6. I have tested them and found
> they are equal in speed to the System.arraycopy
>
> http://www.developer.com/java/data/article.php/3680241/Copying-Arrays-in-Java-6.htm
>

Re: Fast way of copying Vectors/Matrices

Posted by Robin Anil <ro...@gmail.com>.
On Mon, Jul 5, 2010 at 3:56 PM, Jake Mannix <ja...@gmail.com> wrote:

> Hey Max,
>
>  I'm trying to understand what your requirements are for the copy:
> you essentially want to have a method on DenseVector which
> is implemented as
>
>  DenseVector assign(DenseVector v) {
>    System.arraycopy(v.values, 0, this.values, 0, size());
>    return this;
>  }
>

You can use Arrays.copyOf as well, in Java6. I have tested them and found
they are equal in speed to the System.arraycopy
http://www.developer.com/java/data/article.php/3680241/Copying-Arrays-in-Java-6.htm

Re: Fast way of copying Vectors/Matrices

Posted by Max Heimel <mh...@googlemail.com>.
Hi,

I opened a JIRA ticket:
https://issues.apache.org/jira/browse/MAHOUT-435

Cheers
Max

On Mon, Jul 5, 2010 at 9:44 PM, Jake Mannix <ja...@gmail.com> wrote:
> On Mon, Jul 5, 2010 at 7:03 PM, Ted Dunning <te...@gmail.com> wrote:
>
>> Jake's suggestion is the one that you need.
>
>
> It should be noted that my suggestion is actually that we need to
> add a (very simple, non-api-changing, and backwards compatible)
> method to DenseMatrix and DenseVector to do this (i.e. this is a
> Mahout feature enhancement request).
>
> Max, if you want to open a JIRA ticket to track the request, this
> seems like a pretty sensible thing for us to provide.
>
>  -jake
>
> On Mon, Jul 5, 2010 at 5:57 AM, Max Heimel <mh...@googlemail.com> wrote:
>>
>> > DenseMatrix newMatrix = new DenseMatrix(initialMatrix);
>> > while(!converged)
>> > {
>> >     // do computation on newMatrix
>> >     converged = checkConvergence(initialMatrix, newMatrix);
>> >     // now copy newMatrix to initialMatrix
>> >     initialMatrix.assign(newMatrix);
>> > }
>>
>>
>> > I could do this using clone, but this would result in memory allocation
>> > for each time the loop is invoked, simply overwriting the matrix would
>> > probably be faster.
>> >
>> > I hope this clears things up :)
>> >
>> > On Mon, Jul 5, 2010 at 1:18 PM, Sean Owen <sr...@gmail.com> wrote:
>> >
>> > > True that -- so would creating a new DenseVector and filling it. Is
>> > > that vector going to be reused? OK I get it then.
>> > >
>> > > On Mon, Jul 5, 2010 at 12:11 PM, Jake Mannix <ja...@gmail.com>
>> > > wrote:
>> > > > clone() allocates new memory, while this doesn't, right?
>> > > >
>> > >
>> >
>>
>

Re: Fast way of copying Vectors/Matrices

Posted by Jake Mannix <ja...@gmail.com>.
On Mon, Jul 5, 2010 at 7:03 PM, Ted Dunning <te...@gmail.com> wrote:

> Jake's suggestion is the one that you need.


It should be noted that my suggestion is actually that we need to
add a (very simple, non-api-changing, and backwards compatible)
method to DenseMatrix and DenseVector to do this (i.e. this is a
Mahout feature enhancement request).

Max, if you want to open a JIRA ticket to track the request, this
seems like a pretty sensible thing for us to provide.

  -jake

On Mon, Jul 5, 2010 at 5:57 AM, Max Heimel <mh...@googlemail.com> wrote:
>
> > DenseMatrix newMatrix = new DenseMatrix(initialMatrix);
> > while(!converged)
> > {
> >     // do computation on newMatrix
> >     converged = checkConvergence(initialMatrix, newMatrix);
> >     // now copy newMatrix to initialMatrix
> >     initialMatrix.assign(newMatrix);
> > }
>
>
> > I could do this using clone, but this would result in memory allocation
> > for each time the loop is invoked, simply overwriting the matrix would
> > probably be faster.
> >
> > I hope this clears things up :)
> >
> > On Mon, Jul 5, 2010 at 1:18 PM, Sean Owen <sr...@gmail.com> wrote:
> >
> > > True that -- so would creating a new DenseVector and filling it. Is
> > > that vector going to be reused? OK I get it then.
> > >
> > > On Mon, Jul 5, 2010 at 12:11 PM, Jake Mannix <ja...@gmail.com>
> > > wrote:
> > > > clone() allocates new memory, while this doesn't, right?
> > > >
> > >
> >
>

Re: Fast way of copying Vectors/Matrices

Posted by Ted Dunning <te...@gmail.com>.
Jake's suggestion is the one that you need.

On Mon, Jul 5, 2010 at 5:57 AM, Max Heimel <mh...@googlemail.com> wrote:

> Hi,
>
> exactly, I want to reuse an existing DenseMatrix to avoid having to allocate
> memory. The use case is that I want to check for convergence within a loop:
>
> DenseMatrix newMatrix = new DenseMatrix(initialMatrix);
> while(!converged)
> {
>     // do computation on newMatrix
>     converged = checkConvergence(initialMatrix, newMatrix);
>     // now copy newMatrix to initialMatrix
>     initialMatrix.assign(newMatrix);
> }
>
> I could do this using clone, but this would result in memory allocation for
> each time the loop is invoked, simply overwriting the matrix would probably
> be faster.
>
> I hope this clears things up :)
>
> On Mon, Jul 5, 2010 at 1:18 PM, Sean Owen <sr...@gmail.com> wrote:
>
> > True that -- so would creating a new DenseVector and filling it. Is
> > that vector going to be reused? OK I get it then.
> >
> > On Mon, Jul 5, 2010 at 12:11 PM, Jake Mannix <ja...@gmail.com>
> > wrote:
> > > clone() allocates new memory, while this doesn't, right?
> > >
> >
>

Re: Fast way of copying Vectors/Matrices

Posted by Max Heimel <mh...@googlemail.com>.
Hi,

exactly, I want to reuse an existing DenseMatrix to avoid having to allocate
memory. The use case is that I want to check for convergence within a loop:

DenseMatrix newMatrix = new DenseMatrix(initialMatrix);
while(!converged)
{
     // do computation on newMatrix
     converged = checkConvergence(initialMatrix, newMatrix);
     // now copy newMatrix to initialMatrix
     initialMatrix.assign(newMatrix);
}

I could do this using clone, but that would allocate memory on every pass
through the loop; simply overwriting the matrix would probably be faster.

I hope this clears things up :)
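
For reference, the same loop shape can be sketched with plain double arrays
standing in for the matrices. Everything here is an assumption made for
illustration: the class name, the helper names, and the update rule (one
Newton step toward an element-wise square root) are not from Mahout. The
point is only that both buffers are allocated once, before the loop, and
that the copy at the end of each pass overwrites existing storage:

```java
// Hypothetical demo class; double[] stands in for DenseMatrix.
class ReuseBufferLoop {

    /** Largest absolute element-wise difference between two equally sized arrays. */
    static double maxAbsDiff(double[] a, double[] b) {
        double max = 0.0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    /** Placeholder computation: one Newton step toward sqrt(target[i]) per element. */
    static void step(double[] current, double[] target, double[] next) {
        for (int i = 0; i < current.length; i++) {
            next[i] = 0.5 * (current[i] + target[i] / current[i]);
        }
    }

    /** Iterates until two successive estimates differ by less than tol. */
    static double[] iterate(double[] target, double tol, int maxIter) {
        int n = target.length;
        double[] current = new double[n]; // allocated once
        double[] next = new double[n];    // allocated once
        java.util.Arrays.fill(current, 1.0);
        for (int iter = 0; iter < maxIter; iter++) {
            step(current, target, next);
            boolean converged = maxAbsDiff(current, next) < tol;
            // Overwrite current with next; nothing is allocated here.
            System.arraycopy(next, 0, current, 0, n);
            if (converged) {
                break;
            }
        }
        return current;
    }

    public static void main(String[] args) {
        double[] roots = iterate(new double[] {4.0, 9.0}, 1e-12, 100);
        System.out.println(java.util.Arrays.toString(roots));
    }
}
```

With positive targets this settles near the square roots within a handful
of passes.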

On Mon, Jul 5, 2010 at 1:18 PM, Sean Owen <sr...@gmail.com> wrote:

> True that -- so would creating a new DenseVector and filling it. Is
> that vector going to be reused? OK I get it then.
>
> On Mon, Jul 5, 2010 at 12:11 PM, Jake Mannix <ja...@gmail.com>
> wrote:
> > clone() allocates new memory, while this doesn't, right?
> >
>

Re: Fast way of copying Vectors/Matrices

Posted by Sean Owen <sr...@gmail.com>.
True that -- so would creating a new DenseVector and filling it. Is
that vector going to be reused? OK I get it then.

On Mon, Jul 5, 2010 at 12:11 PM, Jake Mannix <ja...@gmail.com> wrote:
> clone() allocates new memory, while this doesn't, right?
>

Re: Fast way of copying Vectors/Matrices

Posted by Jake Mannix <ja...@gmail.com>.
clone() allocates new memory, while this doesn't, right?

  -jake

On Mon, Jul 5, 2010 at 12:58 PM, Sean Owen <sr...@gmail.com> wrote:

> If you're right, is this not just clone()?
> That should indeed be as fast or faster than even arraycopy.
>
> On Mon, Jul 5, 2010 at 11:26 AM, Jake Mannix <ja...@gmail.com>
> wrote:
> > Hey Max,
> >
> >  I'm trying to understand what your requirements are for the copy:
> > you essentially want to have a method on DenseVector which
> > is implemented as
> >
> >  DenseVector assign(DenseVector v) {
> >    System.arraycopy(v.values, 0, this.values, 0, size());
> >    return this;
> >  }
> >
>

Re: Fast way of copying Vectors/Matrices

Posted by Sean Owen <sr...@gmail.com>.
If you're right, is this not just clone()?
That should indeed be as fast or faster than even arraycopy.

On Mon, Jul 5, 2010 at 11:26 AM, Jake Mannix <ja...@gmail.com> wrote:
> Hey Max,
>
>  I'm trying to understand what your requirements are for the copy:
> you essentially want to have a method on DenseVector which
> is implemented as
>
>  DenseVector assign(DenseVector v) {
>    System.arraycopy(v.values, 0, this.values, 0, size());
>    return this;
>  }
>

Re: Fast way of copying Vectors/Matrices

Posted by Jake Mannix <ja...@gmail.com>.
Hey Max,

  I'm trying to understand what your requirements are for the copy:
you essentially want to have a method on DenseVector which
is implemented as

  DenseVector assign(DenseVector v) {
    System.arraycopy(v.values, 0, this.values, 0, size());
    return this;
  }

?

If so, I agree that this is probably a useful optimization for this
use case.  For the general assign(Vector) method, you don't know
the internal implementation, and you can't be as efficient, but
in this case, it could be.

The same could not really be easily done for the sparse vectors,
as they might not have the same number of nonzero elements.
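
A rough sketch of how the dense fast path and the general case could sit
side by side is below. The interface and class names are hypothetical
stand-ins, not Mahout's actual Vector and DenseVector, and the size check
is my own assumption about what such a method would want to do:

```java
// Hypothetical minimal vector interface, for illustration only.
interface SimpleVector {
    int size();
    double getQuick(int index);
    void setQuick(int index, double value);
}

// Hypothetical dense implementation backed by a double array.
class SimpleDenseVector implements SimpleVector {
    private final double[] values;

    SimpleDenseVector(int size) {
        this.values = new double[size];
    }

    SimpleDenseVector(double[] initial) {
        this.values = initial.clone();
    }

    public int size() {
        return values.length;
    }

    public double getQuick(int index) {
        return values[index];
    }

    public void setQuick(int index, double value) {
        values[index] = value;
    }

    /** Overwrites this vector with other, without allocating, and returns this. */
    SimpleDenseVector assign(SimpleVector other) {
        if (other.size() != size()) {
            throw new IllegalArgumentException("size mismatch");
        }
        if (other instanceof SimpleDenseVector) {
            // Fast path: both sides are backed by double arrays.
            double[] src = ((SimpleDenseVector) other).values;
            System.arraycopy(src, 0, values, 0, values.length);
        } else {
            // General case: the internal layout is unknown, so copy element-wise.
            for (int i = 0; i < values.length; i++) {
                values[i] = other.getQuick(i);
            }
        }
        return this;
    }
}
```

Calling `dest.assign(src)` with two dense instances takes the arraycopy
branch; any other implementation falls back to the per-element loop.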

  -jake



On Mon, Jul 5, 2010 at 11:35 AM, Max Heimel <mh...@googlemail.com> wrote:

> Hi,
>
> I am currently working on an iterative algorithm that has to make a
> copy of several matrices (vectors) during each iteration. In order to
> make the inner loop as fast as possible I want to overwrite an
> existing matrix instead of allocating a new one, i.e. I want the
> content of matrix src to end up in matrix dest without having to
> allocate anything :)
>
> The only thing I found to achieve this was by iterating over the
> source matrix(vector) and using setQuick for each Element on the
> destination matrix. I did some simple measurements to check how
> performant this approach is by comparing it to the same overwriting
> operation on a native double array. You can see the measurements
> below:
> Cloning arrays took 10858636985 ns.
> Copying arrays took 28933083 ns.
> Cloning vectors took 28097446771 ns.
> Copying vectors took 9944879475 ns.
> As you can see, copying the contents of a vector through iterators is
> more than 300 times slower than copying the contents of one double
> array into another double array.
>
> So here's my question: is there a better way of overwriting one matrix
> with another one other than iterating over the source and using
> setQuick on the destination? Would it make sense to add an
> assign(Matrix) (respectively assign(Vector)) function, that (for
> Dense implementations) internally copies the contents from the double
> array of source into the double array of destination? Is there already
> such a function that I just did not see? :)
>
> Cheers
> Max
>