Posted to dev@mahout.apache.org by "Grant Ingersoll (JIRA)" <ji...@apache.org> on 2009/06/24 06:52:07 UTC

[jira] Created: (MAHOUT-139) Make use of Vector Iterator capabilities where appropriate

Make use of Vector Iterator capabilities where appropriate
----------------------------------------------------------

                 Key: MAHOUT-139
                 URL: https://issues.apache.org/jira/browse/MAHOUT-139
             Project: Mahout
          Issue Type: Improvement
    Affects Versions: 0.2
            Reporter: Grant Ingersoll
            Assignee: Grant Ingersoll
             Fix For: 0.2


There are a bunch of places where we loop over the size of the vector when we should be taking advantage of the sparseness, or at least be agnostic about it and use an iterator.

This patch addresses these issues in the Vector implementations and in the DistanceMeasure implementations.

It also adds iterateNonZero() and iterateAll() and drops the Iterable portion of Vector, since it wasn't clear what it was iterating.
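
For illustration only, a minimal sketch of the difference between looping over the size and visiting only the stored entries. The class below is a hypothetical stand-in backed by a TreeMap, not Mahout's SparseVector, and its method names are assumptions made for the example:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a sparse vector; not Mahout's SparseVector.
final class SparseLoopSketch {
    private final int cardinality;  // logical length of the vector
    private final Map<Integer, Double> nonZeros = new TreeMap<>();  // index -> value

    SparseLoopSketch(int cardinality) {
        this.cardinality = cardinality;
    }

    void set(int index, double value) {
        if (value == 0.0) {
            nonZeros.remove(index);  // zeros are never stored
        } else {
            nonZeros.put(index, value);
        }
    }

    double get(int index) {
        return nonZeros.getOrDefault(index, 0.0);
    }

    // Loops over the full size: one lookup per index, most of which return 0.
    double sumOfSquaresBySize() {
        double sum = 0.0;
        for (int i = 0; i < cardinality; i++) {
            double v = get(i);
            sum += v * v;
        }
        return sum;
    }

    // Visits only stored entries: work is proportional to the number of non-zeros.
    double sumOfSquaresByNonZeros() {
        double sum = 0.0;
        for (double v : nonZeros.values()) {
            sum += v * v;
        }
        return sum;
    }

    public static void main(String[] args) {
        SparseLoopSketch v = new SparseLoopSketch(1_000_000);
        v.set(3, 2.0);
        v.set(999_999, 3.0);
        System.out.println(v.sumOfSquaresBySize());      // prints 13.0
        System.out.println(v.sumOfSquaresByNonZeros());  // prints 13.0
    }
}
```

Both methods return the same value; the second one just skips the million-minus-two lookups that can only contribute zero.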



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAHOUT-139) Make use of Vector Iterator capabilities where appropriate

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated MAHOUT-139:
-----------------------------------

    Attachment: MAHOUT-139.patch

Draft of a patch that makes a whole lot of conversions to use an appropriate Iterator.

Drops Vector extends Iterable and instead provides two methods:
iterateAll()
iterateNonZero()

Iterators are now implemented by DenseVector and SparseVector instead of AbstractVector, to take advantage of class-specific data structures.
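
A rough sketch of what per-class iterators can look like, under stated assumptions: every type below (SketchVector, Element, DenseSketch, SparseSketch) is hypothetical and only mirrors the two method names from this issue; the real signatures are in the patch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

// All types here are hypothetical illustrations, not Mahout's classes.
final class IteratorSketch {

    // One (index, value) pair seen during iteration.
    static final class Element {
        final int index;
        final double value;
        Element(int index, double value) { this.index = index; this.value = value; }
    }

    interface SketchVector {
        int size();
        Iterator<Element> iterateAll();      // every index, zeros included
        Iterator<Element> iterateNonZero();  // only entries whose value != 0
    }

    // Dense: backed by an array, so both iterators walk the array directly.
    static final class DenseSketch implements SketchVector {
        private final double[] values;
        DenseSketch(double[] values) { this.values = values.clone(); }
        public int size() { return values.length; }
        public Iterator<Element> iterateAll() { return scan(false); }
        public Iterator<Element> iterateNonZero() { return scan(true); }

        private Iterator<Element> scan(final boolean skipZeros) {
            return new Iterator<Element>() {
                private int cursor = advance(0);
                // Moves i forward past zeros when skipZeros is set.
                private int advance(int i) {
                    while (skipZeros && i < values.length && values[i] == 0.0) i++;
                    return i;
                }
                public boolean hasNext() { return cursor < values.length; }
                public Element next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    Element e = new Element(cursor, values[cursor]);
                    cursor = advance(cursor + 1);
                    return e;
                }
            };
        }
    }

    // Sparse: backed by a sorted map, so iterateNonZero() never touches empty slots.
    static final class SparseSketch implements SketchVector {
        private final int cardinality;
        private final TreeMap<Integer, Double> nonZeros = new TreeMap<>();
        SparseSketch(int cardinality) { this.cardinality = cardinality; }
        void set(int index, double value) {
            if (value == 0.0) nonZeros.remove(index); else nonZeros.put(index, value);
        }
        public int size() { return cardinality; }

        public Iterator<Element> iterateAll() {
            return new Iterator<Element>() {
                private int cursor = 0;
                public boolean hasNext() { return cursor < cardinality; }
                public Element next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    Element e = new Element(cursor, nonZeros.getOrDefault(cursor, 0.0));
                    cursor++;
                    return e;
                }
            };
        }

        public Iterator<Element> iterateNonZero() {
            final Iterator<Map.Entry<Integer, Double>> entries = nonZeros.entrySet().iterator();
            return new Iterator<Element>() {
                public boolean hasNext() { return entries.hasNext(); }
                public Element next() {
                    Map.Entry<Integer, Double> e = entries.next();
                    return new Element(e.getKey(), e.getValue());
                }
            };
        }
    }

    // Collects the indices an iterator visits, in order.
    static List<Integer> indices(Iterator<Element> it) {
        List<Integer> out = new ArrayList<>();
        while (it.hasNext()) out.add(it.next().index);
        return out;
    }

    public static void main(String[] args) {
        SparseSketch s = new SparseSketch(1000);
        s.set(4, 7.0);
        s.set(1, 2.0);
        System.out.println(indices(s.iterateNonZero()));  // prints [1, 4]
    }
}
```

The point of pushing the iterators down into the concrete classes is visible here: the dense version scans its array, while the sparse version delegates to its map and does no work for absent indices.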

Also updates the DistanceMeasures where appropriate.
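
As a hedged illustration of how a distance computation can exploit sparseness, here is a Manhattan distance that only touches indices stored in at least one of the two vectors. It uses plain maps as a hypothetical stand-in; Mahout's DistanceMeasure implementations operate on Vector, and this is not the code from the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the maps stand in for sparse vectors (index -> non-zero value).
final class SparseDistanceSketch {

    // Manhattan distance that visits only indices stored in a or b.
    static double manhattan(Map<Integer, Double> a, Map<Integer, Double> b) {
        double distance = 0.0;
        // Entries of a, paired with b's value at the same index (0 if absent).
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            distance += Math.abs(e.getValue() - b.getOrDefault(e.getKey(), 0.0));
        }
        // Entries that appear only in b; a's value there is implicitly 0.
        for (Map.Entry<Integer, Double> e : b.entrySet()) {
            if (!a.containsKey(e.getKey())) {
                distance += Math.abs(e.getValue());
            }
        }
        return distance;
    }

    public static void main(String[] args) {
        Map<Integer, Double> a = new HashMap<>();
        a.put(2, 1.0);
        a.put(500_000, 4.0);
        Map<Integer, Double> b = new HashMap<>();
        b.put(2, 3.0);
        b.put(7, 2.0);
        System.out.println(manhattan(a, b));  // prints 8.0
    }
}
```

Indices where both vectors are zero contribute nothing to the sum, so skipping them changes the cost but not the result.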

All tests passed in core.  

The profiling view looks a lot healthier too, as the primary bottlenecks are now in code that actually does the work, versus the data structures and accessors.



[jira] Resolved: (MAHOUT-139) Make use of Vector Iterator capabilities where appropriate

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll resolved MAHOUT-139.
------------------------------------

    Resolution: Fixed



[jira] Commented: (MAHOUT-139) Make use of Vector Iterator capabilities where appropriate

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723753#action_12723753 ] 

Grant Ingersoll commented on MAHOUT-139:
----------------------------------------

Committed revision 788186.



[jira] Commented: (MAHOUT-139) Make use of Vector Iterator capabilities where appropriate

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723667#action_12723667 ] 

Grant Ingersoll commented on MAHOUT-139:
----------------------------------------

I'd like to commit this soon. My preliminary tests are pretty positive about the performance gains to be had from being smarter about iteration, but it would be helpful to have some feedback first.
