Posted to dev@mahout.apache.org by Grant Ingersoll <gs...@apache.org> on 2009/09/30 22:16:10 UTC

Re: [jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains

On Sep 30, 2009, at 4:03 PM, Jake Mannix wrote:

> Having equals() effectively delegate to
> getName().equals(other.getName()) && equivalent(other) means that we need
> to be extra careful about implementations of hashCode():
>
> If we are not going to break the contract between equals() and hashCode(),
> and we're having equals() *only* take into account the mathematical
> contents and the name, then I'd say what we need to do is implement
> hashCode() in a top-level class like AbstractVector.

That is what is happening.
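A minimal Java sketch of the arrangement described above, assuming a simplified hierarchy: AbstractVec, DenseVec and SparseVec are hypothetical stand-ins, not Mahout's actual AbstractVector API. equals() compares only the name and the mathematical contents, and hashCode() is defined once in the abstract class from exactly that state, so a dense and a sparse vector holding the same values are equal and also hash identically.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified vector hierarchy; not Mahout's actual API.
abstract class AbstractVec {
    private final String name;
    private final int size;

    AbstractVec(String name, int size) {
        this.name = name;
        this.size = size;
    }

    String getName() { return name; }
    int size() { return size; }
    abstract double get(int index);

    // Mathematical equivalence: same size and the same value at every index.
    boolean equivalent(AbstractVec other) {
        if (size != other.size()) return false;
        for (int i = 0; i < size; i++) {
            if (Double.compare(get(i), other.get(i)) != 0) return false;
        }
        return true;
    }

    // equals() looks only at the name and the mathematical contents.
    @Override
    public final boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof AbstractVec)) return false;
        AbstractVec other = (AbstractVec) o;
        return Objects.equals(name, other.getName()) && equivalent(other);
    }

    // Defined once, in the top-level class, from the same state equals() uses.
    // Zero entries contribute nothing and per-entry hashes are summed, so the
    // result does not depend on how, or in what order, entries are stored.
    @Override
    public final int hashCode() {
        int entries = 0;
        for (int i = 0; i < size; i++) {
            double v = get(i);
            if (v != 0.0) {
                entries += Integer.hashCode(i) ^ Double.hashCode(v);
            }
        }
        return 31 * (31 * Objects.hashCode(name) + size) + entries;
    }
}

// Dense storage: one slot per index.
class DenseVec extends AbstractVec {
    private final double[] values;

    DenseVec(String name, double... values) {
        super(name, values.length);
        this.values = values.clone();
    }

    @Override
    double get(int index) { return values[index]; }
}

// Sparse storage: only non-zero values are kept in a map.
class SparseVec extends AbstractVec {
    private final Map<Integer, Double> values = new HashMap<>();

    SparseVec(String name, int size) { super(name, size); }

    SparseVec set(int index, double value) {
        if (value == 0.0) {
            values.remove(index);   // zeros are not stored; get() reports 0.0
        } else {
            values.put(index, value);
        }
        return this;
    }

    @Override
    double get(int index) { return values.getOrDefault(index, 0.0); }
}
```

Because subclasses cannot override equals() or hashCode() here, no implementation can break the contract between them; a hash-map-backed implementation that wanted a faster hashCode() would still have to produce this same order-independent sum over its non-zero entries.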

>
> (Is something funny going on with JIRA?  Seems broken...)

Yes, there is something wrong.  Infra is aware of it.


>
>  -jake
>
> On Wed, Sep 30, 2009 at 10:01 AM, Sean Owen (JIRA) <ji...@apache.org> wrote:
>
>>
>>   [https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760956#action_12760956]
>>
>> Sean Owen commented on MAHOUT-165:
>> ----------------------------------
>>
>> Are my conclusions sound then:
>>
>> We agree that equals() should be 'pretty strict'. The conventional Java
>> wisdom is that equals(), in fact, ought not return true for instances of
>> differing classes, unless you really know what you're doing. I guess we
>> do. :)
>>
>> If the idea behind equals() is "do class-specific stuff, otherwise check
>> names and then use equivalent()", then we don't need strictEquivalence().
>> Where is it used?
>>
>> (If I represented the logic correctly above, is that as simple as we can
>> make it? It seems a touch complex.)
>>
>> I am not sure anything is 'broken' in practice here, but I sense it could
>> be simpler.
>>
>>
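One possible reading of the equals() logic described in this comment, sketched in Java under stated assumptions: Vec, ArrayVec, MapVec and sameClassEquivalent() are hypothetical names, not Mahout code. A same-class fast path is tried first, and the generic name-plus-equivalent() comparison is the fallback. Because the fast path is required to return exactly what equivalent() would, equals() stays symmetric across classes, which is why a separate strict-equivalence method would add nothing.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch; not Mahout's actual classes.
abstract class Vec {
    final String name;
    Vec(String name) { this.name = name; }

    abstract int size();
    abstract double get(int index);

    // Generic, class-independent comparison of mathematical contents.
    final boolean equivalent(Vec other) {
        if (size() != other.size()) return false;
        for (int i = 0; i < size(); i++) {
            if (Double.compare(get(i), other.get(i)) != 0) return false;
        }
        return true;
    }

    // "Class-specific stuff": subclasses may override with a faster comparison
    // against their own class, but it must agree exactly with equivalent().
    boolean sameClassEquivalent(Vec other) { return equivalent(other); }

    @Override
    public final boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Vec)) return false;
        Vec other = (Vec) o;
        if (!Objects.equals(name, other.name)) return false;
        return getClass() == other.getClass() ? sameClassEquivalent(other) : equivalent(other);
    }

    @Override
    public final int hashCode() {
        int h = Objects.hashCode(name);
        for (int i = 0; i < size(); i++) h = 31 * h + Double.hashCode(get(i));
        return h;
    }
}

final class ArrayVec extends Vec {
    private final double[] values;
    ArrayVec(String name, double... values) { super(name); this.values = values.clone(); }
    @Override int size() { return values.length; }
    @Override double get(int index) { return values[index]; }

    // Arrays.equals compares doubles by doubleToLongBits, the same test as
    // Double.compare(a, b) == 0 in equivalent(), so both paths agree.
    @Override boolean sameClassEquivalent(Vec other) {
        return Arrays.equals(values, ((ArrayVec) other).values);
    }
}

final class MapVec extends Vec {
    private final int size;
    private final Map<Integer, Double> entries = new HashMap<>();
    MapVec(String name, int size) { super(name); this.size = size; }
    MapVec set(int index, double value) { entries.put(index, value); return this; }
    @Override int size() { return size; }
    @Override double get(int index) { return entries.getOrDefault(index, 0.0); }
}
```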
>>> Using better primitives hash for sparse vector for performance gains
>>> --------------------------------------------------------------------
>>>
>>>                Key: MAHOUT-165
>>>                URL: https://issues.apache.org/jira/browse/MAHOUT-165
>>>            Project: Mahout
>>>         Issue Type: Improvement
>>>         Components: Matrix
>>>   Affects Versions: 0.2
>>>           Reporter: Shashikant Kore
>>>           Assignee: Grant Ingersoll
>>>            Fix For: 0.2
>>>
>>>        Attachments: colt.jar, mahout-165-trove.patch, MAHOUT-165.patch, mahout-165.patch
>>>
>>>
>>> In SparseVector, we need a primitives hash map for indices and values.
>>> The present implementation of this hash map is not as efficient as some
>>> of the other implementations in non-Apache projects.
>>> In an experiment, I found that, for get/set operations, the primitive
>>> hash of Colt performs an order of magnitude better than
>>> OrderedIntDoubleMapping. For iteration it is 2x slower, though.
>>> Using Colt in SparseVector improved the performance of canopy
>>> generation. For an experimental dataset, the current implementation takes
>>> 50 minutes. Using Colt reduces this duration to 19-20 minutes. That's a
>>> reduction of roughly 60% in run time.
>>
>>
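To illustrate the get/set versus iteration trade-off reported in the issue, here is a minimal Java sketch contrasting an open-addressing int-to-double hash map with a sorted parallel-array mapping. Both classes are hypothetical: the sorted version is only assumed to resemble how OrderedIntDoubleMapping works, and the hash version is not Colt's implementation. Lookups and updates in the hash map are expected O(1), while the sorted arrays need an O(log n) binary search, plus an O(n) shift for every new index.

```java
import java.util.Arrays;

// Hypothetical open-addressing int -> double map with linear probing.
// Illustrative only; not Colt's or Mahout's implementation.
final class IntDoubleOpenMap {
    private static final int EMPTY = -1;  // assumes keys are non-negative vector indices
    private int[] keys;
    private double[] values;
    private int size;

    IntDoubleOpenMap(int expectedSize) {
        // smallest power of two >= 2 * expectedSize (minimum 4)
        int capacity = Integer.highestOneBit(Math.max(4, expectedSize * 2) - 1) << 1;
        keys = new int[capacity];
        Arrays.fill(keys, EMPTY);
        values = new double[capacity];
    }

    // Returns the slot holding key, or the empty slot where it would go.
    private int slot(int key) {
        int mask = keys.length - 1;
        int h = key * 0x9E3779B9;            // multiplicative mixing
        int i = (h ^ (h >>> 16)) & mask;
        while (keys[i] != EMPTY && keys[i] != key) {
            i = (i + 1) & mask;              // linear probing
        }
        return i;
    }

    double get(int index) {
        int i = slot(index);                 // expected O(1)
        return keys[i] == EMPTY ? 0.0 : values[i];
    }

    void set(int index, double value) {
        if (index < 0) throw new IllegalArgumentException("negative index");
        int i = slot(index);
        if (keys[i] == EMPTY) {
            if ((size + 1) * 2 > keys.length) {  // keep load factor <= 0.5
                grow();
                i = slot(index);
            }
            keys[i] = index;
            size++;
        }
        values[i] = value;                   // zeros are stored; removal omitted for brevity
    }

    int size() { return size; }

    private void grow() {
        int[] oldKeys = keys;
        double[] oldValues = values;
        keys = new int[oldKeys.length * 2];
        Arrays.fill(keys, EMPTY);
        values = new double[oldKeys.length * 2];
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != EMPTY) {
                int i = slot(oldKeys[j]);
                keys[i] = oldKeys[j];
                values[i] = oldValues[j];
            }
        }
    }
}

// Hypothetical sorted parallel arrays: binary search on get, shifting on insert.
final class SortedIntDoubleArrays {
    private int[] indices = new int[8];
    private double[] values = new double[8];
    private int size;

    double get(int index) {
        int pos = Arrays.binarySearch(indices, 0, size, index);  // O(log n)
        return pos >= 0 ? values[pos] : 0.0;
    }

    void set(int index, double value) {
        int pos = Arrays.binarySearch(indices, 0, size, index);
        if (pos >= 0) { values[pos] = value; return; }
        int insertAt = -pos - 1;
        if (size == indices.length) {
            indices = Arrays.copyOf(indices, size * 2);
            values = Arrays.copyOf(values, size * 2);
        }
        // O(n) shift keeps indices sorted; this is what makes random inserts slow
        System.arraycopy(indices, insertAt, indices, insertAt + 1, size - insertAt);
        System.arraycopy(values, insertAt, values, insertAt + 1, size - insertAt);
        indices[insertAt] = index;
        values[insertAt] = value;
        size++;
    }
}
```

Iteration is where the sorted layout wins: it walks exactly the stored entries in index order, whereas the hash table has to scan every slot (at least twice the number of entries at this load factor) and yields indices in no particular order. That is one plausible explanation for the slower iteration reported above.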

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/



Re: [jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains

Posted by Jake Mannix <ja...@gmail.com>.
On Wed, Sep 30, 2009 at 1:16 PM, Grant Ingersoll <gs...@apache.org> wrote:

>
> On Sep 30, 2009, at 4:03 PM, Jake Mannix wrote:
>
>> Having equals() effectively delegate to
>> getName().equals(other.getName()) && equivalent(other) means that we need
>> to be extra careful about implementations of hashCode():
>>
>> If we are not going to break the contract between equals() and hashCode(),
>> and we're having equals() *only* take into account the mathematical
>> contents and the name, then I'd say what we need to do is implement
>> hashCode() in a top-level class like AbstractVector.
>>
>
> That is what is happening.


It is on trunk, but not in Ted's patch, which is what I'm currently looking
at, and I want to make sure I'm adhering to convention as I play with Ted's
impls.

  -jake