Posted to dev@mahout.apache.org by "Allen Day (JIRA)" <ji...@apache.org> on 2008/09/19 20:34:44 UTC

[jira] Created: (MAHOUT-77) DistanceMeasure calculation slow for SparseVector

DistanceMeasure calculation slow for SparseVector
-------------------------------------------------

                 Key: MAHOUT-77
                 URL: https://issues.apache.org/jira/browse/MAHOUT-77
             Project: Mahout
          Issue Type: Improvement
          Components: Matrix
            Reporter: Allen Day
            Priority: Minor
         Attachments: sparse.patch

ManhattanDistanceMeasure and TanimotoDistanceMeasure assume all vector indices up to cardinality() must be compared.  We can speed this up for SparseVectors (and others) because Vector implements Iterable, so we can consider only non-zero indices.
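
A minimal sketch of the idea, not the code from sparse.patch: it assumes a hypothetical sparse representation (a Map from index to non-zero value) rather than Mahout's own Vector classes, and the class and method names are made up for illustration. It contrasts a Manhattan distance loop over every index up to the cardinality with a loop that visits only the non-zero entries of either vector.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a sparse vector: index -> non-zero value. Not Mahout's API. */
final class SparseManhattanSketch {

    /** Dense-style loop: touches every index from 0 up to cardinality. */
    static double denseDistance(Map<Integer, Double> a, Map<Integer, Double> b, int cardinality) {
        double sum = 0.0;
        for (int i = 0; i < cardinality; i++) {
            sum += Math.abs(a.getOrDefault(i, 0.0) - b.getOrDefault(i, 0.0));
        }
        return sum;
    }

    /** Sparse loop: touches only indices that are non-zero in a or in b. */
    static double sparseDistance(Map<Integer, Double> a, Map<Integer, Double> b) {
        double sum = 0.0;
        // Entries of a, paired with the matching entry of b (or 0.0 if b has none).
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            sum += Math.abs(e.getValue() - b.getOrDefault(e.getKey(), 0.0));
        }
        // Entries that exist only in b; the ones shared with a were counted above.
        for (Map.Entry<Integer, Double> e : b.entrySet()) {
            if (!a.containsKey(e.getKey())) {
                sum += Math.abs(e.getValue());
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<Integer, Double> a = new HashMap<>();
        a.put(2, 1.0);
        a.put(500, 3.0);
        Map<Integer, Double> b = new HashMap<>();
        b.put(2, 4.0);
        b.put(999, 2.0);
        // Both calls compute |1-4| + |3-0| + |0-2|.
        System.out.println(denseDistance(a, b, 1000));
        System.out.println(sparseDistance(a, b));
    }
}
```

With a cardinality of 1000 and two non-zero entries per vector, the dense loop does 1000 iterations while the sparse loop does four, which is the speedup the issue is after.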

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-77) DistanceMeasure calculation slow for SparseVector

Posted by "Pallavi Palleti (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632914#action_12632914 ] 

Pallavi Palleti commented on MAHOUT-77:
---------------------------------------

Hi Allen, it was suggested that we use vector operations in addPoint and computeCentroid so that the code is simpler to understand. In the distance measure classes, too, we can replace the code with Vector operations such as the plus, minus, and dot methods. The detailed discussion is in
https://issues.apache.org/jira/browse/MAHOUT-66
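
As a rough illustration of writing a distance measure in terms of vector operations, the sketch below expresses the usual Tanimoto distance, 1 - a.b / (|a|^2 + |b|^2 - a.b), using only a dot product. It assumes a hypothetical Map-based sparse vector and a hand-written dot helper; none of these names come from Mahout or from the patches mentioned here, and returning 0.0 for two all-zero vectors is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: Tanimoto distance built from a sparse dot product. Not Mahout's API. */
final class SparseTanimotoSketch {

    /** Dot product that iterates over the non-zero entries of the smaller vector only. */
    static double dot(Map<Integer, Double> a, Map<Integer, Double> b) {
        Map<Integer, Double> small = a.size() <= b.size() ? a : b;
        Map<Integer, Double> large = (small == a) ? b : a;
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : small.entrySet()) {
            Double other = large.get(e.getKey());
            if (other != null) {
                sum += e.getValue() * other;
            }
        }
        return sum;
    }

    /** Tanimoto distance: 1 - a.b / (|a|^2 + |b|^2 - a.b). */
    static double distance(Map<Integer, Double> a, Map<Integer, Double> b) {
        double ab = dot(a, b);
        double denominator = dot(a, a) + dot(b, b) - ab;
        if (denominator == 0.0) {
            // Assumption of this sketch: two all-zero vectors are treated as identical.
            return 0.0;
        }
        return 1.0 - ab / denominator;
    }

    public static void main(String[] args) {
        Map<Integer, Double> a = new HashMap<>();
        a.put(0, 1.0);
        a.put(1, 1.0);
        Map<Integer, Double> b = new HashMap<>();
        b.put(1, 1.0);
        b.put(2, 1.0);
        // a.b = 1, |a|^2 = 2, |b|^2 = 2, so the distance is 1 - 1/3.
        System.out.println(distance(a, b));
    }
}
```

Because the distance only calls dot, any sparse-aware speedup in that one helper carries over to the measure without changing the measure itself.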

Also, I have added plus and divide methods specific to sparse vectors. The patch that contains them is at https://issues.apache.org/jira/browse/MAHOUT-67

Thanks
Pallavi

> DistanceMeasure calculation slow for SparseVector
> -------------------------------------------------
>
>                 Key: MAHOUT-77
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-77
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>            Reporter: Allen Day
>            Priority: Minor
>             Fix For: 0.2
>
>         Attachments: sparse.patch, sparse.patch
>
>
> ManhattanDistanceMeasure and TanimotoDistanceMeasure assume all vector indices up to cardinality() must be compared.  We can speed this up for SparseVectors (and others) because Vector implements Iterable, so we can consider only non-zero indices.



[jira] Commented: (MAHOUT-77) DistanceMeasure calculation slow for SparseVector

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722188#action_12722188 ] 

Grant Ingersoll commented on MAHOUT-77:
---------------------------------------

Allen,

Any chance of bringing this up to date?



[jira] Updated: (MAHOUT-77) DistanceMeasure calculation slow for SparseVector

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-77?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated MAHOUT-77:
----------------------------------

    Fix Version/s: 0.2

> DistanceMeasure calculation slow for SparseVector
> -------------------------------------------------
>
>                 Key: MAHOUT-77
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-77
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>            Reporter: Allen Day
>            Priority: Minor
>             Fix For: 0.2
>
>         Attachments: sparse.patch
>
>
> ManhattanDistanceMeasure and TanimotoDistanceMeasure assume all vector indices up to cardinality() must be compared.  We can speed this up for SparseVectors (and others) because Vector implements Iterable, so we can consider only non-zero indices.



[jira] Updated: (MAHOUT-77) DistanceMeasure calculation slow for SparseVector

Posted by "Allen Day (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-77?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Day updated MAHOUT-77:
----------------------------

    Attachment: sparse.patch

Added DistanceMeasure tests. Moved the patch generation level up to capture tests and source changes in the same patch.



[jira] Assigned: (MAHOUT-77) DistanceMeasure calculation slow for SparseVector

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-77?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll reassigned MAHOUT-77:
-------------------------------------

    Assignee: Grant Ingersoll

> DistanceMeasure calculation slow for SparseVector
> -------------------------------------------------
>
>                 Key: MAHOUT-77
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-77
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Matrix
>            Reporter: Allen Day
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.2
>
>         Attachments: sparse.patch, sparse.patch
>
>
> ManhattanDistanceMeasure and TanimotoDistanceMeasure assume all vector indices up to cardinality() must be compared.  We can speed this up for SparseVectors (and others) because Vector implements Iterable, so we can consider only non-zero indices.



[jira] Updated: (MAHOUT-77) DistanceMeasure calculation slow for SparseVector

Posted by "Allen Day (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-77?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Day updated MAHOUT-77:
----------------------------

    Attachment: sparse.patch



[jira] Commented: (MAHOUT-77) DistanceMeasure calculation slow for SparseVector

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632792#action_12632792 ] 

Grant Ingersoll commented on MAHOUT-77:
---------------------------------------

Sounds reasonable. Allen, can you add tests?



[jira] Resolved: (MAHOUT-77) DistanceMeasure calculation slow for SparseVector

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-77?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll resolved MAHOUT-77.
-----------------------------------

    Resolution: Won't Fix

See MAHOUT-139
