You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mahout.apache.org by "Bryce Nyeggen (Created) (JIRA)" <ji...@apache.org> on 2012/01/27 21:36:10 UTC

[jira] [Created] (MAHOUT-963) GenericUserPreferenceArray and GenericItemPreferenceArray use selection sorts

GenericUserPreferenceArray and GenericItemPreferenceArray use selection sorts
-----------------------------------------------------------------------------

                 Key: MAHOUT-963
                 URL: https://issues.apache.org/jira/browse/MAHOUT-963
             Project: Mahout
          Issue Type: Bug
          Components: Collaborative Filtering
    Affects Versions: 0.6
            Reporter: Bryce Nyeggen
            Assignee: Sean Owen
             Fix For: 0.6


Both PreferenceArray implementations use selection sorts with poor performance.  These sorts are invoked during construction of GenericDataModels, causing excessive construction time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAHOUT-963) GenericUserPreferenceArray and GenericItemPreferenceArray use selection sorts

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203146#comment-13203146 ] 

Hudson commented on MAHOUT-963:
-------------------------------

Integrated in Mahout-Quality #1340 (See [https://builds.apache.org/job/Mahout-Quality/1340/])
    MAHOUT-963 additional change to "less than" for reversed sort, which is necessary in one test

srowen : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1241616
Files : 
* /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/impl/model/GenericItemPreferenceArray.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/impl/model/GenericUserPreferenceArray.java

                
> GenericUserPreferenceArray and GenericItemPreferenceArray use selection sorts
> -----------------------------------------------------------------------------
>
>                 Key: MAHOUT-963
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-963
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.6
>            Reporter: Bryce Nyeggen
>            Assignee: Sean Owen
>            Priority: Minor
>             Fix For: 0.7
>
>         Attachments: MAHOUT-963.diff
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Both PreferenceArray implementations use selection sorts with poor performance.  These sorts are invoked during construction of GenericDataModels, causing excessive construction time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAHOUT-963) GenericUserPreferenceArray and GenericItemPreferenceArray use selection sorts

Posted by "Sean Owen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-963:
-----------------------------

       Resolution: Fixed
    Fix Version/s: 0.7
           Status: Resolved  (was: Patch Available)
    
> GenericUserPreferenceArray and GenericItemPreferenceArray use selection sorts
> -----------------------------------------------------------------------------
>
>                 Key: MAHOUT-963
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-963
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.6
>            Reporter: Bryce Nyeggen
>            Assignee: Sean Owen
>            Priority: Minor
>             Fix For: 0.7
>
>         Attachments: MAHOUT-963.diff
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Both PreferenceArray implementations use selection sorts with poor performance.  These sorts are invoked during construction of GenericDataModels, causing excessive construction time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAHOUT-963) GenericUserPreferenceArray and GenericItemPreferenceArray use selection sorts

Posted by "Sean Owen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195599#comment-13195599 ] 

Sean Owen commented on MAHOUT-963:
----------------------------------

I think something else must be at work... I just don't see how sorting 20 things can take more time than it does to, say, read them into memory from disk and construct an object around them. It can't be 90+% of the time. But in any event it is a good change.

Some similarity implementations depend on them being ordered by user or item, to do an efficient intersection.
                
> GenericUserPreferenceArray and GenericItemPreferenceArray use selection sorts
> -----------------------------------------------------------------------------
>
>                 Key: MAHOUT-963
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-963
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.6
>            Reporter: Bryce Nyeggen
>            Assignee: Sean Owen
>            Priority: Minor
>         Attachments: MAHOUT-963.diff
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Both PreferenceArray implementations use selection sorts with poor performance.  These sorts are invoked during construction of GenericDataModels, causing excessive construction time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAHOUT-963) GenericUserPreferenceArray and GenericItemPreferenceArray use selection sorts

Posted by "Bryce Nyeggen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryce Nyeggen updated MAHOUT-963:
---------------------------------

    Attachment: MAHOUT-963.diff

Switches from selection sort to comb sort.
                
> GenericUserPreferenceArray and GenericItemPreferenceArray use selection sorts
> -----------------------------------------------------------------------------
>
>                 Key: MAHOUT-963
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-963
>             Project: Mahout
>          Issue Type: Bug
>          Components: Collaborative Filtering
>    Affects Versions: 0.6
>            Reporter: Bryce Nyeggen
>            Assignee: Sean Owen
>             Fix For: 0.6
>
>         Attachments: MAHOUT-963.diff
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Both PreferenceArray implementations use selection sorts with poor performance.  These sorts are invoked during construction of GenericDataModels, causing excessive construction time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAHOUT-963) GenericUserPreferenceArray and GenericItemPreferenceArray use selection sorts

Posted by "Bryce Nyeggen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryce Nyeggen updated MAHOUT-963:
---------------------------------

    Comment: was deleted

(was: Changes from selection sort to comb sort)
    
> GenericUserPreferenceArray and GenericItemPreferenceArray use selection sorts
> -----------------------------------------------------------------------------
>
>                 Key: MAHOUT-963
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-963
>             Project: Mahout
>          Issue Type: Bug
>          Components: Collaborative Filtering
>    Affects Versions: 0.6
>            Reporter: Bryce Nyeggen
>            Assignee: Sean Owen
>             Fix For: 0.6
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Both PreferenceArray implementations use selection sorts with poor performance.  These sorts are invoked during construction of GenericDataModels, causing excessive construction time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAHOUT-963) GenericUserPreferenceArray and GenericItemPreferenceArray use selection sorts

Posted by "Sean Owen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-963:
-----------------------------

         Priority: Minor  (was: Major)
    Fix Version/s:     (was: 0.6)
       Issue Type: Improvement  (was: Bug)

My profiling shows that the sorting barely registers as part of the time taken to load a model -- do you have a case where it seems to make a measurable difference?
These arrays are so small that any sort isn't a problem that I can see.

Still, small wins are wins. I tried a little test, sorting a bunch of these arrays, where the size is exponentially distributed. The comb sort gets faster when the average size is about 15 or more, which is realistic for a lot of data sets. I can squeeze a little more out of the implementation to make it a little faster.

I think it's a positive change and will put it in after the code freeze.
                
> GenericUserPreferenceArray and GenericItemPreferenceArray use selection sorts
> -----------------------------------------------------------------------------
>
>                 Key: MAHOUT-963
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-963
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.6
>            Reporter: Bryce Nyeggen
>            Assignee: Sean Owen
>            Priority: Minor
>         Attachments: MAHOUT-963.diff
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Both PreferenceArray implementations use selection sorts with poor performance.  These sorts are invoked during construction of GenericDataModels, causing excessive construction time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAHOUT-963) GenericUserPreferenceArray and GenericItemPreferenceArray use selection sorts

Posted by "Bryce Nyeggen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryce Nyeggen updated MAHOUT-963:
---------------------------------

    Status: Patch Available  (was: Open)

Changes from selection sort to comb sort
                
> GenericUserPreferenceArray and GenericItemPreferenceArray use selection sorts
> -----------------------------------------------------------------------------
>
>                 Key: MAHOUT-963
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-963
>             Project: Mahout
>          Issue Type: Bug
>          Components: Collaborative Filtering
>    Affects Versions: 0.6
>            Reporter: Bryce Nyeggen
>            Assignee: Sean Owen
>             Fix For: 0.6
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Both PreferenceArray implementations use selection sorts with poor performance.  These sorts are invoked during construction of GenericDataModels, causing excessive construction time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAHOUT-963) GenericUserPreferenceArray and GenericItemPreferenceArray use selection sorts

Posted by "Bryce Nyeggen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195576#comment-13195576 ] 

Bryce Nyeggen commented on MAHOUT-963:
--------------------------------------

I discovered this on a dataset of several million user-item pairs, with <20 items / user on average, but probably several hundred thousand users associated with the most common items.  In those cases, it brings it from several hours to construct the data model to a couple minutes, with nearly all of the gain associated with faster sorts on the GenericItemPreferenceArrays.  

As a side note, in the time I spent looking at the code, I didn't see any calling code that depends on the GenericItemPreferenceArrays being sorted by user - is it currently necessary for them to be sorted at all, or is it just to allow future optimizations?
                
> GenericUserPreferenceArray and GenericItemPreferenceArray use selection sorts
> -----------------------------------------------------------------------------
>
>                 Key: MAHOUT-963
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-963
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.6
>            Reporter: Bryce Nyeggen
>            Assignee: Sean Owen
>            Priority: Minor
>         Attachments: MAHOUT-963.diff
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Both PreferenceArray implementations use selection sorts with poor performance.  These sorts are invoked during construction of GenericDataModels, causing excessive construction time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira