Posted to dev@mahout.apache.org by "Bhaskar Devireddy (JIRA)" <ji...@apache.org> on 2012/07/09 23:32:34 UTC

[jira] [Created] (MAHOUT-1042) Hotspot in RecommenderJob-PartialMultiplyMapper-Reducer

Bhaskar Devireddy created MAHOUT-1042:
-----------------------------------------

             Summary: Hotspot in RecommenderJob-PartialMultiplyMapper-Reducer
                 Key: MAHOUT-1042
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1042
             Project: Mahout
          Issue Type: Improvement
          Components: Collaborative Filtering
    Affects Versions: 0.7, 0.6
            Reporter: Bhaskar Devireddy
            Assignee: Sean Owen
            Priority: Minor


While profiling the PartialMultiplyMapper-Reducer job, we noticed a hotspot in the reducer task: the org.apache.mahout.math.RandomAccessSparseVector.assign method consumed more than 40% of the CPU time. For profiling, we used the script provided in the Mahout examples for running the ASF Email recommendations. The hotspot comes from the use of the Vector.plus(Vector x) method in the AggregateAndRecommendReducer class, where the pattern is VectorA = VectorA.plus(VectorB). Because plus() returns a new vector, VectorA is first cloned using the assign method; since the result is assigned straight back to VectorA, that clone is unnecessary. The attached patch addresses the hotspot by eliminating the cloning in this case for both the plus and times methods. The patch retains functionality (we compared the output with and without the patch) and speeds up the PartialMultiplyMapper-Reducer job by more than 10X on x86 architectures.
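To make the cost of the VectorA = VectorA.plus(VectorB) pattern concrete, here is a minimal Java sketch under stated assumptions: SparseVec, plus(), addInPlace(), sampleInput() and the other helpers are hypothetical stand-ins backed by a HashMap, not Mahout's RandomAccessSparseVector or the actual AggregateAndRecommendReducer code. It only illustrates why cloning the growing accumulator on every iteration adds up. (An in-place times() would need a different loop, over the accumulator's own non-zeros, since a * 0 = 0.)

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not Mahout code: contrasts acc = acc.plus(v), which
 * clones the accumulator on every call, with adding v into acc in place.
 */
public class PlusVersusInPlace {

    /** Minimal sparse vector backed by a HashMap (stand-in for a real sparse vector). */
    static final class SparseVec {
        final Map<Integer, Double> values = new HashMap<>();
        // Running count of entries copied by copy(), to make the cloning cost visible.
        static long copiedEntries = 0;

        double get(int i) {
            return values.getOrDefault(i, 0.0);
        }

        void set(int i, double v) {
            if (v == 0.0) {
                values.remove(i);   // keep the map sparse
            } else {
                values.put(i, v);
            }
        }

        SparseVec copy() {
            SparseVec c = new SparseVec();
            c.values.putAll(values);
            copiedEntries += values.size();
            return c;
        }

        // Same shape as the reported pattern: clone this, then add other into the clone.
        SparseVec plus(SparseVec other) {
            return copy().addInPlace(other);
        }

        // Adds other into this without cloning. Visiting only other's non-zeros
        // is enough because a + 0 == a.
        SparseVec addInPlace(SparseVec other) {
            for (Map.Entry<Integer, Double> e : other.values.entrySet()) {
                set(e.getKey(), get(e.getKey()) + e.getValue());
            }
            return this;
        }
    }

    // Hypothetical input: vector k holds 1.0 at index 0 and at index k + 1.
    static SparseVec[] sampleInput(int n) {
        SparseVec[] vs = new SparseVec[n];
        for (int k = 0; k < n; k++) {
            vs[k] = new SparseVec();
            vs[k].set(0, 1.0);
            vs[k].set(k + 1, 1.0);
        }
        return vs;
    }

    // The reported pattern: VectorA = VectorA.plus(VectorB) in a loop.
    static SparseVec sumWithPlus(SparseVec[] vs) {
        SparseVec acc = new SparseVec();
        for (SparseVec v : vs) {
            acc = acc.plus(v);
        }
        return acc;
    }

    // Accumulating into a single vector instead.
    static SparseVec sumInPlace(SparseVec[] vs) {
        SparseVec acc = new SparseVec();
        for (SparseVec v : vs) {
            acc.addInPlace(v);
        }
        return acc;
    }

    public static void main(String[] args) {
        SparseVec[] input = sampleInput(1000);

        SparseVec.copiedEntries = 0;
        SparseVec a = sumWithPlus(input);
        long copiedWithPlus = SparseVec.copiedEntries;

        SparseVec.copiedEntries = 0;
        SparseVec b = sumInPlace(input);
        long copiedInPlace = SparseVec.copiedEntries;

        System.out.println("same result: " + a.values.equals(b.values));
        System.out.println("entries copied via plus(): " + copiedWithPlus);
        System.out.println("entries copied in place:   " + copiedInPlace);
    }
}
```

With sampleInput(1000), both sums agree, but the plus() version copies 500499 entries in total (the accumulator gains one entry per input, so the copying grows quadratically), while the in-place version copies none. How much this matters for the real reducer depends on its data.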

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Re: [jira] [Commented] (MAHOUT-1042) Hotspot in RecommenderJob-PartialMultiplyMapper-Reducer

Posted by Sebastian Schelter <ss...@apache.org>.
Nice spot! We should use .assign() with Functions.PLUS.

2012/7/9 Sean Owen (JIRA) <ji...@apache.org>:
>
>     [ https://issues.apache.org/jira/browse/MAHOUT-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409895#comment-13409895 ]
>
> Sean Owen commented on MAHOUT-1042:
> -----------------------------------
>
> I like it, but can this be done with the existing assign() method and a DoubleFunction that adds? If not, I think a separate method like addTo() would be better than a flag.
>
>> Hotspot in RecommenderJob-PartialMultiplyMapper-Reducer
>> -------------------------------------------------------
>>
>>                 Key: MAHOUT-1042
>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1042
>>             Project: Mahout
>>          Issue Type: Improvement
>>          Components: Collaborative Filtering
>>    Affects Versions: 0.6, 0.7
>>            Reporter: Bhaskar Devireddy
>>            Assignee: Sean Owen
>>            Priority: Minor
>>         Attachments: Mahout_1042.patch

[jira] [Updated] (MAHOUT-1042) Hotspot in RecommenderJob-PartialMultiplyMapper-Reducer

Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Schelter updated MAHOUT-1042:
---------------------------------------

    Assignee: Sebastian Schelter  (was: Sean Owen)
    
> Hotspot in RecommenderJob-PartialMultiplyMapper-Reducer
> -------------------------------------------------------
>
>                 Key: MAHOUT-1042
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1042
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.6, 0.7
>            Reporter: Bhaskar Devireddy
>            Assignee: Sebastian Schelter
>            Priority: Minor
>         Attachments: Mahout_1042.patch

[jira] [Updated] (MAHOUT-1042) Hotspot in RecommenderJob-PartialMultiplyMapper-Reducer

Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Schelter updated MAHOUT-1042:
---------------------------------------

    Attachment: MAHOUT-1042.patch
    
> Hotspot in RecommenderJob-PartialMultiplyMapper-Reducer
> -------------------------------------------------------
>
>                 Key: MAHOUT-1042
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1042
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.6, 0.7
>            Reporter: Bhaskar Devireddy
>            Assignee: Sebastian Schelter
>            Priority: Minor
>         Attachments: MAHOUT-1042.patch, Mahout_1042.patch

[jira] [Commented] (MAHOUT-1042) Hotspot in RecommenderJob-PartialMultiplyMapper-Reducer

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409895#comment-13409895 ] 

Sean Owen commented on MAHOUT-1042:
-----------------------------------

I like it, but can this be done with the existing assign() method and a DoubleFunction that adds? If not, I think a separate method like addTo() would be better than a flag.
                
> Hotspot in RecommenderJob-PartialMultiplyMapper-Reducer
> -------------------------------------------------------
>
>                 Key: MAHOUT-1042
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1042
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.6, 0.7
>            Reporter: Bhaskar Devireddy
>            Assignee: Sean Owen
>            Priority: Minor
>         Attachments: Mahout_1042.patch

[jira] [Updated] (MAHOUT-1042) Hotspot in RecommenderJob-PartialMultiplyMapper-Reducer

Posted by "Bhaskar Devireddy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bhaskar Devireddy updated MAHOUT-1042:
--------------------------------------

    Attachment: Mahout_1042.patch
    
> Hotspot in RecommenderJob-PartialMultiplyMapper-Reducer
> -------------------------------------------------------
>
>                 Key: MAHOUT-1042
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1042
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.6, 0.7
>            Reporter: Bhaskar Devireddy
>            Assignee: Sean Owen
>            Priority: Minor
>         Attachments: Mahout_1042.patch

[jira] [Resolved] (MAHOUT-1042) Hotspot in RecommenderJob-PartialMultiplyMapper-Reducer

Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Schelter resolved MAHOUT-1042.
----------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.8

Very nice find, thank you!

I changed the code here to use only .assign() on the vectors instead of .plus() and .times().

Furthermore, I added special handling for PLUS_ABS to the assign() method, and found that two jobs in RecommenderJob can run map-only, so I removed their identity reducers.

Overall, this should give a huge boost to our recommenders' performance!
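The comment above does not spell out what the special handling for PLUS_ABS looks like. As a hedged illustration of the general idea of function-specific handling inside assign(), here is a hypothetical Java sketch: it recognizes one shared addition constant (a stand-in for something like Functions.PLUS) by reference and, for it, visits only the other vector's non-zero entries; any other function falls back to visiting the union of both vectors' non-zeros. The class, field and method names are illustrative, not Mahout's API, and the sketch does not model PLUS_ABS, where such a shortcut would need more care because |a| + |0| is not always a.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.DoubleBinaryOperator;

/**
 * Hypothetical sketch, not Mahout code: an assign(other, f) that recognizes one
 * known function and takes a cheaper loop for it.
 */
public class FunctionAwareAssign {

    // Shared constant so assign() can recognize addition by reference
    // (a stand-in for a library-provided constant such as Functions.PLUS).
    static final DoubleBinaryOperator PLUS = (a, b) -> a + b;

    static final class SparseVec {
        final Map<Integer, Double> values = new HashMap<>();
        // Number of index positions visited by the most recent assign() call.
        int lastVisited = 0;

        double get(int i) {
            return values.getOrDefault(i, 0.0);
        }

        void set(int i, double v) {
            if (v == 0.0) {
                values.remove(i);
            } else {
                values.put(i, v);
            }
        }

        /** Sets this[i] = f(this[i], other[i]) for all i; assumes f(0, 0) == 0. */
        SparseVec assign(SparseVec other, DoubleBinaryOperator f) {
            Set<Integer> indices;
            if (f == PLUS) {
                // Fast path: since a + 0 == a, only other's non-zeros can change this.
                indices = new HashSet<>(other.values.keySet());
            } else {
                // Generic path: any index that is non-zero in either vector may change.
                indices = new HashSet<>(values.keySet());
                indices.addAll(other.values.keySet());
            }
            lastVisited = 0;
            for (int i : indices) {
                set(i, f.applyAsDouble(get(i), other.get(i)));
                lastVisited++;
            }
            return this;
        }
    }

    public static void main(String[] args) {
        SparseVec acc = new SparseVec();
        for (int i = 0; i < 1000; i++) {
            acc.set(i, 1.0);                    // large accumulator
        }
        SparseVec update = new SparseVec();
        update.set(3, 2.0);
        update.set(5000, 4.0);                  // small sparse update

        acc.assign(update, PLUS);
        System.out.println("positions visited (recognized PLUS): " + acc.lastVisited);

        DoubleBinaryOperator unrecognizedAdd = (a, b) -> a + b;  // same math, different object
        acc.assign(update, unrecognizedAdd);
        System.out.println("positions visited (generic path):    " + acc.lastVisited);
    }
}
```

Here the recognized PLUS touches 2 positions, while the equivalent but unrecognized lambda touches all 1001. Recognizing functions by reference is a simple design choice: it misses equivalent functions, but the check itself costs nothing.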
                
> Hotspot in RecommenderJob-PartialMultiplyMapper-Reducer
> -------------------------------------------------------
>
>                 Key: MAHOUT-1042
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1042
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.6, 0.7
>            Reporter: Bhaskar Devireddy
>            Assignee: Sebastian Schelter
>            Priority: Minor
>             Fix For: 0.8
>
>         Attachments: MAHOUT-1042.patch, Mahout_1042.patch