Posted to dev@mahout.apache.org by Ted Dunning <te...@gmail.com> on 2009/06/19 19:08:28 UTC

Re: svn commit: r786370 [1/2] - in /lucene/mahout/trunk: core/src/main/java/org/apache/mahout/clustering/canopy/ core/src/main/java/org/apache/mahout/clustering/dirichlet/ core/src/main/java/org/apache/mahout/clustering/dirichlet/models/ core/src/mai

Writable support should make clone() pretty easy.  This verges on territory
where people have strong opinions, though.  What is the thought on
supporting clone()?

On Fri, Jun 19, 2009 at 9:49 AM, Jeff Eastman <jd...@windwardsolutions.com> wrote:

> Should Vector extend Cloneable too? I'm going to implement Matrix Writable
> today.
> Jeff
>

Posted by Sean Owen <sr...@gmail.com>.
Got it, and I think I agree. I imagine it is easy enough to write
clone() anyway that we can just go for the faster way. Yes, that is me
signing up to write the methods.

On Sun, Jun 21, 2009 at 9:40 PM, Ted Dunning<te...@gmail.com> wrote:
> No.
>
> Just the opposite.
>
> I meant that going round the loop with writables is probably no worse than
> twice as slow.  That may be an ok starting point given that cloning costs
> will probably be dominated in a hadoop setting by I/O.
>
> On Sun, Jun 21, 2009 at 6:22 PM, Sean Owen <sr...@gmail.com> wrote:
>
>> You mean 'twice as fast'? I
>

Posted by Ted Dunning <te...@gmail.com>.
No.

Just the opposite.

I meant that going round the loop with Writables is probably no worse than
twice as slow.  That may be an OK starting point, given that in a Hadoop
setting cloning costs will probably be dominated by I/O.

On Sun, Jun 21, 2009 at 6:22 PM, Sean Owen <sr...@gmail.com> wrote:

> You mean 'twice as fast'? I

Posted by Sean Owen <sr...@gmail.com>.
You mean 'twice as fast'? I must say I find it hard to believe this
would ever be faster. In both cases one has to read the source object,
allocate storage for a new object, and copy. The serialization method
does strictly more work: it allocates a byte array, serializes,
deserializes, and copies in between. I agree that copies should be
avoided if possible anyway, but it's my strong guess that we don't want
to implement any clone() methods this way -- it's pretty simple to
write clone() anyhow. As always, open to being proven wrong by data...

On Sun, Jun 21, 2009 at 1:18 PM, Ted Dunning<te...@gmail.com> wrote:
> For small vectors, I would believe that.  For large vectors, I would not be
> surprised to see the round-trip to be up to half as fast.
>
> The fact is, however, that you want to avoid copying for large vectors and
> matrices.

Posted by Ted Dunning <te...@gmail.com>.
For small vectors, I would believe that.  For large vectors, I would not be
surprised to see the round-trip to be up to half as fast.

The fact is, however, that you want to avoid copying for large vectors and
matrices.

On Sun, Jun 21, 2009 at 6:54 AM, Sean Owen <sr...@gmail.com> wrote:

> clone() ought to be implemented in some
> way that doesn't involve serializing and deserializing in memory, just
> because that is typically going to be much slower. For some classes, maybe
> doesn't matter, but probably is quite important for key abstractions like
> Vector and Matrix.
>

Posted by Sean Owen <sr...@gmail.com>.
Yeah, I guess that is what I mean: clone() ought to be implemented in some
way that doesn't involve serializing and deserializing in memory, just
because that is typically going to be much slower. For some classes it may
not matter, but it is probably quite important for key abstractions like
Vector and Matrix.
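
A minimal sketch of a direct clone() of the kind argued for above, with no
serialization step. DirectCloneVector is a hypothetical dense vector written
for illustration; it is not Mahout's actual Vector, and the field layout is
an assumption.

```java
// Hypothetical dense vector for illustration; not Mahout's Vector class.
class DirectCloneVector implements Cloneable {
  private double[] values;

  DirectCloneVector(double[] values) {
    this.values = values.clone(); // defensive copy of the caller's array
  }

  double get(int index) { return values[index]; }

  void set(int index, double value) { values[index] = value; }

  int size() { return values.length; }

  @Override
  public DirectCloneVector clone() {
    try {
      // Object.clone() gives a shallow copy that still shares the array.
      DirectCloneVector copy = (DirectCloneVector) super.clone();
      // Copy the backing array so the two vectors are independent.
      copy.values = values.clone();
      return copy;
    } catch (CloneNotSupportedException e) {
      // Cannot happen, because this class implements Cloneable.
      throw new AssertionError(e);
    }
  }
}
```

The only work done here is one allocation and one array copy, which is the
baseline the serialization round trip would have to be measured against.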

On Jun 20, 2009 8:57 PM, "Ted Dunning" <te...@gmail.com> wrote:

clone is definitely independent of the Hadoop stuff, but if you have a
Writable, then one of the easiest ways to implement clone is just write to a
byte array and then pull it back.

That makes round trip tests a bit easier as well.

On Sat, Jun 20, 2009 at 10:18 AM, Sean Owen <sr...@gmail.com> wrote:
> I may be misunderstanding ...

Posted by Ted Dunning <te...@gmail.com>.
clone is definitely independent of the Hadoop stuff, but if you have a
Writable, then one of the easiest ways to implement clone is just write to a
byte array and then pull it back.

That makes round trip tests a bit easier as well.
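
A minimal sketch of the round-trip idea described above: write the object to
a byte array, then read it back into a fresh instance. The local Writable
interface is a stand-in that mirrors the two methods of
org.apache.hadoop.io.Writable so the example compiles without Hadoop on the
classpath. RoundTripVector and roundTripCopy() are hypothetical names, not
Mahout code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Stand-in with the same two methods as org.apache.hadoop.io.Writable.
interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical dense vector for illustration; not Mahout's Vector class.
class RoundTripVector implements Writable {
  double[] values = new double[0];

  RoundTripVector() {}

  RoundTripVector(double[] values) {
    this.values = values.clone();
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(values.length); // length prefix, then the elements
    for (double v : values) {
      out.writeDouble(v);
    }
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    values = new double[in.readInt()];
    for (int i = 0; i < values.length; i++) {
      values[i] = in.readDouble();
    }
  }

  // Copy by serializing to an in-memory byte array and deserializing it.
  RoundTripVector roundTripCopy() {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      write(new DataOutputStream(bytes));
      RoundTripVector copy = new RoundTripVector();
      copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
      return copy;
    } catch (IOException e) {
      // In-memory streams should not fail; rethrow unchecked just in case.
      throw new UncheckedIOException(e);
    }
  }
}
```

The same helper doubles as a round-trip test: if the copy does not equal the
original, write() and readFields() disagree about the format.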

On Sat, Jun 20, 2009 at 10:18 AM, Sean Owen <sr...@gmail.com> wrote:

> I may be misunderstanding the comment here, but the suggestion is not to
> implement clone() in terms of the Writable API or vice versa, right? These
> are fairly distinct things.
>

Posted by Sean Owen <sr...@gmail.com>.
I may be misunderstanding the comment here, but the suggestion is not to
implement clone() in terms of the Writable API or vice versa, right? These
are fairly distinct things.

On Jun 19, 2009 1:09 PM, "Ted Dunning" <te...@gmail.com> wrote:

Writable support should make clone() pretty easy.  This verges on territory
where people have strong opinions, though.  What is the thought on
supporting clone()?

On Fri, Jun 19, 2009 at 9:49 AM, Jeff Eastman <jdog@windwardsolutions.com> wrote:

> Should Vector extend Cloneable too? I'm going to implement Matrix Writable
> today.
> Jeff
>

Posted by Sean Owen <sr...@gmail.com>.
Oops, yes, Vector should therefore extend Cloneable, if I did not do that.

On Jun 19, 2009 1:42 PM, "Jeff Eastman" <jd...@windwardsolutions.com> wrote:

Sean's recent commit already implemented clone() to replace copy() and made
Matrix cloneable. It did not make Vector cloneable even though it supports
clone() too. I think it was an oversight.

Ted Dunning wrote:
> Writable support should make clone() pretty easy. This verges on territo...

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
Sean's recent commit already implemented clone() to replace copy() and 
made Matrix cloneable. It did not make Vector cloneable even though it 
supports clone() too. I think it was an oversight.
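
A minimal sketch of what making an interface cloneable could look like: the
interface extends Cloneable and redeclares clone() as a public method with a
covariant return type, so callers can clone through the interface without a
cast. SketchVec and DenseSketchVec are hypothetical names for illustration;
the actual Mahout declarations may differ.

```java
// Hypothetical interface for illustration; not Mahout's actual Vector.
interface SketchVec extends Cloneable {
  double get(int index);

  // Redeclared here so clone() is public and typed at the interface level.
  SketchVec clone();
}

// Hypothetical implementation backed by a plain array.
class DenseSketchVec implements SketchVec {
  private double[] values;

  DenseSketchVec(double[] values) {
    this.values = values.clone();
  }

  @Override
  public double get(int index) {
    return values[index];
  }

  @Override
  public DenseSketchVec clone() {
    try {
      DenseSketchVec copy = (DenseSketchVec) super.clone();
      copy.values = values.clone(); // do not share the backing array
      return copy;
    } catch (CloneNotSupportedException e) {
      // Cannot happen, because SketchVec extends Cloneable.
      throw new AssertionError(e);
    }
  }
}
```

Without "extends Cloneable" on the interface, an implementation that calls
super.clone() would have to implement Cloneable itself or fail at runtime
with CloneNotSupportedException.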



Ted Dunning wrote:
> Writable support should make clone() pretty easy.  This verges on territory
> where people have strong opinions, though.  What is the thought on
> supporting clone()?
>
> On Fri, Jun 19, 2009 at 9:49 AM, Jeff Eastman <jd...@windwardsolutions.com> wrote:
>
>> Should Vector extend Cloneable too? I'm going to implement Matrix Writable
>> today.
>> Jeff