Posted to dev@mahout.apache.org by "Ted Dunning (JIRA)" <ji...@apache.org> on 2009/12/10 20:32:18 UTC

[jira] Commented: (MAHOUT-208) Vector.getLengthSquared() is dangerously optimized

    [ https://issues.apache.org/jira/browse/MAHOUT-208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788854#action_12788854 ] 

Ted Dunning commented on MAHOUT-208:
------------------------------------


This caching can be a really major win so I would prefer to keep it.

You are correct that many callers could cache it, but that can also be quite difficult because they would need to cache *lots* of lengths.  Moreover, the caching is primarily to assist in computing distances between sparse vectors.  Thus if you compute the distance between v1 and v2 in one part of the code, it isn't really obvious how some other part of the code would know to cache these lengths for when v2 is compared to v3.  Certainly it isn't easy to see how the caller could inject the cached values into the euclideanDistance call that they are supposed to accelerate.
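
To make the sparse-distance point concrete: the squared Euclidean distance expands as |a - b|^2 = |a|^2 + |b|^2 - 2 (a . b), so once both squared lengths are known, only the dot product over the shared non-zero entries is left to compute. The following is a rough sketch of that idea, not Mahout's code. The class CachedDistanceSketch, its method names, and the index-to-value map representation are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: sparse vectors are modeled as index -> value maps.
public class CachedDistanceSketch {

    // Dot product that walks only the smaller of the two sparse vectors.
    static double dot(Map<Integer, Double> a, Map<Integer, Double> b) {
        Map<Integer, Double> small = a.size() <= b.size() ? a : b;
        Map<Integer, Double> large = (small == a) ? b : a;
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : small.entrySet()) {
            Double other = large.get(e.getKey());
            if (other != null) {
                sum += e.getValue() * other;
            }
        }
        return sum;
    }

    // Full pass over the non-zero entries; this is the value worth caching.
    static double lengthSquared(Map<Integer, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return sum;
    }

    // Squared distance from precomputed squared lengths:
    // |a - b|^2 = |a|^2 + |b|^2 - 2 (a . b)
    static double distanceSquared(Map<Integer, Double> a, double aLengthSquared,
                                  Map<Integer, Double> b, double bLengthSquared) {
        return aLengthSquared + bLengthSquared - 2.0 * dot(a, b);
    }
}
```

Note that in this sketch the caller has to carry the squared lengths around and pass them in explicitly, which is exactly the bookkeeping burden described above.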

Having the caching in vectors makes all of this happen with no overhead (other than stupid bugs) for users who don't use the capability and only minimal effort for users who do use the capability.
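
The "stupid bugs" caveat is the stale-cache problem described in the issue quoted below. One way to keep the cache inside the vector and still stay correct is for every mutating method to reset a sentinel value. The following is a minimal sketch under that assumption. CachingVector is a hypothetical dense, array-backed class, not Mahout's Vector implementation, and the -1 sentinel simply follows the suggestion in the issue description.

```java
// Hypothetical sketch of a vector that caches its squared length.
public class CachingVector {
    private final double[] values;
    // -1 marks the cache as empty; a real squared length is never negative.
    private double lengthSquared = -1.0;

    public CachingVector(int size) {
        this.values = new double[size];
    }

    public double get(int index) {
        return values[index];
    }

    // Every mutator must clear the cache, otherwise later reads go stale.
    public void set(int index, double value) {
        values[index] = value;
        lengthSquared = -1.0;
    }

    public double getLengthSquared() {
        if (lengthSquared >= 0.0) {
            return lengthSquared; // cache hit, no pass over the data
        }
        double sum = 0.0;
        for (double x : values) {
            sum += x * x;
        }
        lengthSquared = sum;
        return sum;
    }
}
```

Callers that never ask for the length pay only for one extra assignment per mutation, while callers that compare the same vector repeatedly get the cached value without any bookkeeping of their own.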

> Vector.getLengthSquared() is dangerously optimized
> --------------------------------------------------
>
>                 Key: MAHOUT-208
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-208
>             Project: Mahout
>          Issue Type: Bug
>          Components: Matrix
>    Affects Versions: 0.1
>         Environment: all
>            Reporter: Jake Mannix
>            Assignee: Sean Owen
>             Fix For: 0.3
>
>
> SparseVector and DenseVector both cache the value of lengthSquared, so that subsequent calls to it get the cached value.  Great, except the cache is never cleared - calls to set/setQuick or assign or anything, all leave the cached value unchanged.  
> Mutating method calls should set lengthNorm to -1 so that the cache is cleared.
> This could be a really nasty bug if hit.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.