You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@mahout.apache.org by "Dmitriy Lyubimov (Created) (JIRA)" <ji...@apache.org> on 2011/12/11 21:44:40 UTC

[jira] [Created] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

SSVD: ABt Job tweaks for extra sparse inputs
--------------------------------------------

Key: MAHOUT-922
URL: https://issues.apache.org/jira/browse/MAHOUT-922
Project: Mahout
Issue Type: Improvement
Components: Math
Affects Versions: 0.6
Reporter: Dmitriy Lyubimov
Assignee: Dmitriy Lyubimov
Fix For: 0.6

Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m).

AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen).

So, several improvements in this patch:
-- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
-- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, additional passes over B' should not register. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
-- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block.

Miscellanea:
Test run time: removed redundant tests and checks for SSVD. reduced test input size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

Posted by "Dmitriy Lyubimov (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Lyubimov updated MAHOUT-922:
------------------------------------

    Description: 
Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 

AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 

So, several improvements in this patch: 
-- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
-- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, additional passes over B' should not register. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
-- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 

Miscellanea: 
Test run time: removed redundant tests and checks for SSVD. reduced test input size.

Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/ABt-tweaks 

  was:
Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 

AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 

So, several improvements in this patch: 
-- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
-- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, additional passes over B' should not register. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
-- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 

Miscellanea: 
Test run time: removed redundant tests and checks for SSVD. reduced test input size.

    
> SSVD: ABt Job tweaks for extra sparse inputs
> --------------------------------------------
>
>                 Key: MAHOUT-922
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-922
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>
> Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 
> AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 
> So, several improvements in this patch: 
> -- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
> -- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, additional passes over B' should not register. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
> -- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
> Miscellanea: 
> Test run time: removed redundant tests and checks for SSVD. reduced test input size.
> Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/ABt-tweaks 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

Posted by "Sebastian Schelter (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167205#comment-13167205 ] 

Sebastian Schelter commented on MAHOUT-922:
-------------------------------------------

A few details on the testcase: I'm trying to compute the first 10-20 eigenvalues of the symmetric adjacency matrix of the wikipedia page link graph from http://users.on.net/~henry/home/wikipedia.htm. 



                
> SSVD: ABt Job tweaks for extra sparse inputs
> --------------------------------------------
>
>                 Key: MAHOUT-922
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-922
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>
> Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 
> AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 
> So, several improvements in this patch: 
> -- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
> -- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, additional passes over B' should not register. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
> -- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
> Miscellanea: 
> Test run time: removed redundant tests and checks for SSVD. reduced test input size.
> Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/ABt-tweaks 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

Posted by "Dmitriy Lyubimov (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Lyubimov updated MAHOUT-922:
------------------------------------

    Description: 
Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 

AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 

So, several improvements in this patch: 
-- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
-- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, I/O overhead from additional passes over B' should not register. Besides, distributed cache option is now provided to efficiently address that. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
-- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
-- provided broadcast option (--broadcast, -br) to enable/disable using DistributedCache for broadcasting B' in AB' job and R-hat during QR step.
-- forcing --reduceTasks (-t) option as non-optional. I had at least two cases when people did not set it and it is wildly important for parallelism of these jobs.

Miscellanea: 
-- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
-- Per Nathan's suggestion, p parameter is now optional, default is 15 (single task running time is proportional to (k+p), so I want to be careful not to run it too high by default).

Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

  was:
Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 

AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 

So, several improvements in this patch: 
-- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
-- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, I/O overhead from additional passes over B' should not register. Besides, distributed cache option is now provided to efficiently address that. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
-- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
-- provided broadcast option (--broadcast, -br) to enable/disable using DistributedCache for broadcasting B' in AB' job and R-hat during QR step.
-- forcing --reduceTasks (-t) option as non-optional. I had at least two cases when people did not set it and it is wildly important for parallelism of these jobs.

Miscellanea: 
-- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
-- Per Nathan's suggestion, p parameter is now optional, default is 15 (single task running time is proportional to it, so I want to be careful not to run it too high by default).

Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

    
> SSVD: ABt Job tweaks for extra sparse inputs
> --------------------------------------------
>
>                 Key: MAHOUT-922
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-922
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>         Attachments: MAHOUT-922-2.patch, MAHOUT-922.patch, MAHOUT-922.patch, MAHOUT-922.patch
>
>
> Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 
> AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 
> So, several improvements in this patch: 
> -- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
> -- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, I/O overhead from additional passes over B' should not register. Besides, distributed cache option is now provided to efficiently address that. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
> -- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
> -- provided broadcast option (--broadcast, -br) to enable/disable using DistributedCache for broadcasting B' in AB' job and R-hat during QR step.
> -- forcing --reduceTasks (-t) option as non-optional. I had at least two cases when people did not set it and it is wildly important for parallelism of these jobs.
> Miscellanea: 
> -- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
> -- Per Nathan's suggestion, p parameter is now optional, default is 15 (single task running time is proportional to (k+p), so I want to be careful not to run it too high by default).
> Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

Posted by "Dmitriy Lyubimov (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Lyubimov updated MAHOUT-922:
------------------------------------

    Description: 
Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 

AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 

So, several improvements in this patch: 
-- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
-- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, I/O overhead from additional passes over B' should not register. Besides, distributed cache option is now provided to efficiently address that. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
-- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
-- provided broadcast option (--broadcast, -br) to enable/disable using DistributedCache for broadcasting B' in AB' job and R-hat during QR step.
-- forcing --reduceTasks (-t) option as non-optional. I had at least two cases when people did not set it and it is wildly important for parallelism of these jobs.

Miscellanea: 
-- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
-- Per Nathan's suggestion, p parameter is now optional, default is 15 (single task running time is proportional to it, so I want to be careful not to run it too high by default).

Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

  was:
Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 

AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 

So, several improvements in this patch: 
-- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
-- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, I/O overhead from additional passes over B' should not register. Besides, distributed cache option is now provided to efficiently address that. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
-- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
-- provided broadcast option (--broadcast, -br) to enable/disable using DistributedCache for broadcasting B' in AB' job and R-hat during QR step.
-- forcing --reduceTasks (-t) option as non-optional. I had at least two cases when people did not set it and it is wildly important for parallelism of these jobs.

Miscellanea: 
-- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
-- Per Nathan's suggestion, p parameter is now optional, default is 15 (computation time is cubic to it, so I want to be careful not to run it too high by default).

Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

    
> SSVD: ABt Job tweaks for extra sparse inputs
> --------------------------------------------
>
>                 Key: MAHOUT-922
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-922
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>         Attachments: MAHOUT-922-2.patch, MAHOUT-922.patch, MAHOUT-922.patch, MAHOUT-922.patch
>
>
> Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 
> AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 
> So, several improvements in this patch: 
> -- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
> -- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, I/O overhead from additional passes over B' should not register. Besides, distributed cache option is now provided to efficiently address that. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
> -- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
> -- provided broadcast option (--broadcast, -br) to enable/disable using DistributedCache for broadcasting B' in AB' job and R-hat during QR step.
> -- forcing --reduceTasks (-t) option as non-optional. I had at least two cases when people did not set it and it is wildly important for parallelism of these jobs.
> Miscellanea: 
> -- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
> -- Per Nathan's suggestion, p parameter is now optional, default is 15 (single task running time is proportional to it, so I want to be careful not to run it too high by default).
> Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

Posted by "Dmitriy Lyubimov (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Lyubimov updated MAHOUT-922:
------------------------------------

    Description: 
Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 

AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 

So, several improvements in this patch: 
-- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
-- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, additional passes over B' should not register. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
-- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 

Miscellanea: 
-- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
-- Per Nathan's suggestion, p parameter is now optional, default is 15 (computation time is cubic to it, so I want to be careful not to run it too high by default).

Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

  was:
Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 

AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 

So, several improvements in this patch: 
-- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
-- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, additional passes over B' should not register. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
-- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 

Miscellanea: 
Test run time: removed redundant tests and checks for SSVD. reduced test input size.

Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/ABt-tweaks 

    
> SSVD: ABt Job tweaks for extra sparse inputs
> --------------------------------------------
>
>                 Key: MAHOUT-922
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-922
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>
> Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 
> AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 
> So, several improvements in this patch: 
> -- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
> -- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, additional passes over B' should not register. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
> -- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
> Miscellanea: 
> -- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
> -- Per Nathan's suggestion, p parameter is now optional, default is 15 (computation time is cubic to it, so I want to be careful not to run it too high by default).
> Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

Posted by "Dmitriy Lyubimov (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Lyubimov updated MAHOUT-922:
------------------------------------

    Attachment: MAHOUT-922.patch

combing up the style and comments, removing stale code, etc.
                
> SSVD: ABt Job tweaks for extra sparse inputs
> --------------------------------------------
>
>                 Key: MAHOUT-922
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-922
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>         Attachments: MAHOUT-922.patch, MAHOUT-922.patch, MAHOUT-922.patch
>
>
> Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 
> AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 
> So, several improvements in this patch: 
> -- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
> -- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, additional passes over B' should not register. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
> -- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
> Miscellanea: 
> -- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
> -- Per Nathan's suggestion, p parameter is now optional, default is 15 (computation time is cubic to it, so I want to be careful not to run it too high by default).
> Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

Posted by "Dmitriy Lyubimov (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167907#comment-13167907 ] 

Dmitriy Lyubimov commented on MAHOUT-922:
-----------------------------------------

Another idea that may further relief cpu bound behavior a little bit is to use single precision arithmetics. As far as I understand, rounding errors in Y_i before orthonormalization do not matter much. It is Q_i*A' operation that matters. In Y_i we are just looking for a somewhat better basis than our original completely random basis.
                
> SSVD: ABt Job tweaks for extra sparse inputs
> --------------------------------------------
>
>                 Key: MAHOUT-922
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-922
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>         Attachments: MAHOUT-922.patch, MAHOUT-922.patch
>
>
> Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 
> AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 
> So, several improvements in this patch: 
> -- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
> -- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, additional passes over B' should not register. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
> -- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
> Miscellanea: 
> -- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
> -- Per Nathan's suggestion, p parameter is now optional, default is 15 (computation time is cubic to it, so I want to be careful not to run it too high by default).
> Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Issue Comment Edited] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

Posted by "Dmitriy Lyubimov (Issue Comment Edited) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174333#comment-13174333 ] 

Dmitriy Lyubimov edited comment on MAHOUT-922 at 12/21/11 7:20 PM:
-------------------------------------------------------------------

Resolved with all additions per description. Docs are updated to reflect CLI changes.
                
      was (Author: dlyubimov):
    Resolved with all addition per description. Docs are updated to reflect CLI changes.
                  
> SSVD: ABt Job tweaks for extra sparse inputs
> --------------------------------------------
>
>                 Key: MAHOUT-922
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-922
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>         Attachments: MAHOUT-922-2.patch, MAHOUT-922.patch, MAHOUT-922.patch, MAHOUT-922.patch
>
>
> Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 
> AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 
> So, several improvements in this patch: 
> -- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
> -- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, I/O overhead from additional passes over B' should not register. Besides, distributed cache option is now provided to efficiently address that. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
> -- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
> -- provided broadcast option (--broadcast, -br) to enable/disable using DistributedCache for broadcasting B' in AB' job and R-hat during QR step.
> -- forcing --reduceTasks (-t) option as non-optional. I had at least two cases when people did not set it and it is wildly important for parallelism of these jobs.
> Miscellanea: 
> -- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
> -- Per Nathan's suggestion, p parameter is now optional, default is 15 (single task running time is proportional to it, so I want to be careful not to run it too high by default).
> Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172589#comment-13172589 ] 

jiraposter@reviews.apache.org commented on MAHOUT-922:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3265/
-----------------------------------------------------------

Review request for mahout.


Summary
-------

MAHOUT-922-2: add DistributedCache broadcast to B' files for AB' job and R-hat files for B' job, on by default, governed by -br option. 

Notes: Performance: I did not notice the difference between using distributed cache vs. opening direct streams, which is understandable since jobs are cpu-bound.
I did have to add some functionality to multifile sequence file iterators to allow for specifying multiple files coming from distributed cache which is neither glob nor directory. I also added fixes for some corner case NPEs there.

Sorry eclipse reformatting for style is a bit different from original Sean's formatting in Intellij, it is hard to adjust it exactly. 


This addresses bug MAHOUT-922.
    https://issues.apache.org/jira/browse/MAHOUT-922


Diffs
-----


Diff: https://reviews.apache.org/r/3265/diff


Testing
-------


Thanks,

Dmitriy


                
> SSVD: ABt Job tweaks for extra sparse inputs
> --------------------------------------------
>
>                 Key: MAHOUT-922
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-922
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>         Attachments: MAHOUT-922.patch, MAHOUT-922.patch, MAHOUT-922.patch
>
>
> Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 
> AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 
> So, several improvements in this patch: 
> -- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
> -- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, additional passes over B' should not register. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
> -- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
> Miscellanea: 
> -- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
> -- Per Nathan's suggestion, p parameter is now optional, default is 15 (computation time is cubic to it, so I want to be careful not to run it too high by default).
> Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

Posted by "Dmitriy Lyubimov (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Lyubimov updated MAHOUT-922:
------------------------------------

    Attachment: MAHOUT-922-2.patch
    
> SSVD: ABt Job tweaks for extra sparse inputs
> --------------------------------------------
>
>                 Key: MAHOUT-922
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-922
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>         Attachments: MAHOUT-922-2.patch, MAHOUT-922.patch, MAHOUT-922.patch, MAHOUT-922.patch
>
>
> Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 
> AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 
> So, several improvements in this patch: 
> -- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
> -- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, additional passes over B' should not register. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
> -- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
> Miscellanea: 
> -- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
> -- Per Nathan's suggestion, p parameter is now optional, default is 15 (computation time is cubic to it, so I want to be careful not to run it too high by default).
> Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

Posted by "Dmitriy Lyubimov (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Lyubimov updated MAHOUT-922:
------------------------------------

    Attachment: MAHOUT-922-2.patch
    
> SSVD: ABt Job tweaks for extra sparse inputs
> --------------------------------------------
>
>                 Key: MAHOUT-922
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-922
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>         Attachments: MAHOUT-922-2.patch, MAHOUT-922.patch, MAHOUT-922.patch, MAHOUT-922.patch
>
>
> Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 
> AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 
> So, several improvements in this patch: 
> -- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
> -- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, additional passes over B' should not register. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
> -- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
> Miscellanea: 
> -- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
> -- Per Nathan's suggestion, p parameter is now optional, default is 15 (computation time is cubic to it, so I want to be careful not to run it too high by default).
> Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

Posted by "Dmitriy Lyubimov (Resolved) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Lyubimov resolved MAHOUT-922.
-------------------------------------

    Resolution: Fixed

Resolved with all addition per description. Docs are updated to reflect CLI changes.
                
> SSVD: ABt Job tweaks for extra sparse inputs
> --------------------------------------------
>
>                 Key: MAHOUT-922
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-922
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>         Attachments: MAHOUT-922-2.patch, MAHOUT-922.patch, MAHOUT-922.patch, MAHOUT-922.patch
>
>
> Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 
> AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 
> So, several improvements in this patch: 
> -- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
> -- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, I/O overhead from additional passes over B' should not register. Besides, distributed cache option is now provided to efficiently address that. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
> -- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
> -- provided broadcast option (--broadcast, -br) to enable/disable using DistributedCache for broadcasting B' in AB' job and R-hat during QR step.
> -- forcing --reduceTasks (-t) option as non-optional. I had at least two cases when people did not set it and it is wildly important for parallelism of these jobs.
> Miscellanea: 
> -- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
> -- Per Nathan's suggestion, p parameter is now optional, default is 15 (single task running time is proportional to it, so I want to be careful not to run it too high by default).
> Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

Posted by "Dmitriy Lyubimov (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Lyubimov updated MAHOUT-922:
------------------------------------

    Attachment: MAHOUT-922.patch
    
> SSVD: ABt Job tweaks for extra sparse inputs
> --------------------------------------------
>
>                 Key: MAHOUT-922
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-922
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>         Attachments: MAHOUT-922.patch
>
>
> Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 
> AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 
> So, several improvements in this patch: 
> -- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
> -- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, additional passes over B' should not register. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
> -- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
> Miscellanea: 
> -- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
> -- Per Nathan's suggestion, p parameter is now optional, default is 15 (computation time is cubic to it, so I want to be careful not to run it too high by default).
> Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173926#comment-13173926 ] 

Hudson commented on MAHOUT-922:
-------------------------------

Integrated in Mahout-Quality #1263 (See [https://builds.apache.org/job/Mahout-Quality/1263/])
    MAHOUT-922-2: SSVD broadcast option for B', R-hat matrices; making --reduce-tasks non-optional in SSVD CLI

dlyubimov : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1221603
Files : 
* /mahout/trunk/core/src/main/java/org/apache/mahout/common/iterator/sequencefile/SequenceFileDirIterator.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/common/iterator/sequencefile/SequenceFileDirValueIterator.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/ABtDenseOutJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/ABtJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/BtJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDCli.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDSolver.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverDenseTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverSparseSequentialTest.java

                
> SSVD: ABt Job tweaks for extra sparse inputs
> --------------------------------------------
>
>                 Key: MAHOUT-922
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-922
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>         Attachments: MAHOUT-922-2.patch, MAHOUT-922.patch, MAHOUT-922.patch, MAHOUT-922.patch
>
>
> Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 
> AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 
> So, several improvements in this patch: 
> -- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
> -- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, additional passes over B' should not register. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
> -- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
> Miscellanea: 
> -- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
> -- Per Nathan's suggestion, p parameter is now optional, default is 15 (computation time is cubic to it, so I want to be careful not to run it too high by default).
> Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

Posted by "Dmitriy Lyubimov (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Lyubimov updated MAHOUT-922:
------------------------------------

    Attachment: MAHOUT-922.patch

Updates from Sebastien and further memory tweaks to reduce GC thrashing in AB' job
                
> SSVD: ABt Job tweaks for extra sparse inputs
> --------------------------------------------
>
>                 Key: MAHOUT-922
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-922
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>         Attachments: MAHOUT-922.patch, MAHOUT-922.patch
>
>
> Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 
> AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 
> So, several improvements in this patch: 
> -- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
> -- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, additional passes over B' should not register. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
> -- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
> Miscellanea: 
> -- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
> -- Per Nathan's suggestion, p parameter is now optional, default is 15 (computation time is cubic to it, so I want to be careful not to run it too high by default).
> Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171799#comment-13171799 ] 

Hudson commented on MAHOUT-922:
-------------------------------

Integrated in Mahout-Quality #1258 (See [https://builds.apache.org/job/Mahout-Quality/1258/])
    MAHOUT-922:adding Distributed cache broadcast option for B'

dlyubimov : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220335
Files : 
* /mahout/trunk/core/src/main/java/org/apache/mahout/common/iterator/sequencefile/SequenceFileDirIterator.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/ABtDenseOutJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/ABtJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDCli.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDSolver.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverDenseTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverSparseSequentialTest.java

                
> SSVD: ABt Job tweaks for extra sparse inputs
> --------------------------------------------
>
>                 Key: MAHOUT-922
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-922
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>         Attachments: MAHOUT-922.patch, MAHOUT-922.patch, MAHOUT-922.patch
>
>
> Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 
> AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 
> So, several improvements in this patch: 
> -- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
> -- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, additional passes over B' should not register. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
> -- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
> Miscellanea: 
> -- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
> -- Per Nathan's suggestion, p parameter is now optional, default is 15 (computation time is cubic to it, so I want to be careful not to run it too high by default).
> Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

Posted by "Dmitriy Lyubimov (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Lyubimov updated MAHOUT-922:
------------------------------------

    Attachment:     (was: MAHOUT-922-2.patch)
    
> SSVD: ABt Job tweaks for extra sparse inputs
> --------------------------------------------
>
>                 Key: MAHOUT-922
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-922
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>         Attachments: MAHOUT-922.patch, MAHOUT-922.patch, MAHOUT-922.patch
>
>
> Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 
> AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 
> So, several improvements in this patch: 
> -- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
> -- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, additional passes over B' should not register. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
> -- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
> Miscellanea: 
> -- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
> -- Per Nathan's suggestion, p parameter is now optional, default is 15 (computation time is cubic to it, so I want to be careful not to run it too high by default).
> Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168754#comment-13168754 ] 

Hudson commented on MAHOUT-922:
-------------------------------

Integrated in Mahout-Quality #1251 (See [https://builds.apache.org/job/Mahout-Quality/1251/])
    MAHOUT-922:optional p, AB' tweaks, faster tests, style overhaul.

dlyubimov : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1213842
Files : 
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/ABtDenseOutJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/BBtJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/BtJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/DenseBlockWritable.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/QJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDCli.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDPrototype.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDSolver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SparseRowBlockWritable.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/UJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/UpperTriangular.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/VJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/YtYJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/qr/GivensThinSolver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/qr/GramSchmidt.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/qr/GrammSchmidt.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/qr/QRFirstStep.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/qr/QRLastStep.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverDenseTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverSparseSequentialTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDTestsHelper.java

                
> SSVD: ABt Job tweaks for extra sparse inputs
> --------------------------------------------
>
>                 Key: MAHOUT-922
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-922
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>         Attachments: MAHOUT-922.patch, MAHOUT-922.patch, MAHOUT-922.patch
>
>
> Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 
> AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 
> So, several improvements in this patch: 
> -- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
> -- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, additional passes over B' should not register. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
> -- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
> Miscellanea: 
> -- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
> -- Per Nathan's suggestion, p parameter is now optional, default is 15 (computation time is cubic to it, so I want to be careful not to run it too high by default).
> Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

Posted by "Dmitriy Lyubimov (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Lyubimov updated MAHOUT-922:
------------------------------------

    Description: 
Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 

AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 

So, several improvements in this patch: 
-- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
-- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, I/O overhead from additional passes over B' should not register. Besides, distributed cache option is now provided to efficiently address that. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
-- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
-- provided broadcast option (--broadcast, -br) to enable/disable using DistributedCache for broadcasting B' in AB' job and R-hat during QR step.
-- forcing --reduceTasks (-t) option as non-optional. I had at least two cases when people did not set it and it is wildly important for parallelism of these jobs.

Miscellanea: 
-- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
-- Per Nathan's suggestion, p parameter is now optional, default is 15 (computation time is cubic to it, so I want to be careful not to run it too high by default).

Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

  was:
Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 

AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 

So, several improvements in this patch: 
-- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
-- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, additional passes over B' should not register. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
-- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 

Miscellanea: 
-- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
-- Per Nathan's suggestion, p parameter is now optional, default is 15 (computation time is cubic to it, so I want to be careful not to run it too high by default).

Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

    
> SSVD: ABt Job tweaks for extra sparse inputs
> --------------------------------------------
>
>                 Key: MAHOUT-922
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-922
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>         Attachments: MAHOUT-922-2.patch, MAHOUT-922.patch, MAHOUT-922.patch, MAHOUT-922.patch
>
>
> Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 
> AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 
> So, several improvements in this patch: 
> -- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
> -- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, I/O overhead from additional passes over B' should not register. Besides, distributed cache option is now provided to efficiently address that. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
> -- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
> -- provided broadcast option (--broadcast, -br) to enable/disable using DistributedCache for broadcasting B' in AB' job and R-hat during QR step.
> -- forcing --reduceTasks (-t) option as non-optional. I had at least two cases when people did not set it and it is wildly important for parallelism of these jobs.
> Miscellanea: 
> -- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
> -- Per Nathan's suggestion, p parameter is now optional, default is 15 (computation time is cubic to it, so I want to be careful not to run it too high by default).
> Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Reopened] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

Posted by "Dmitriy Lyubimov (Reopened) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Lyubimov reopened MAHOUT-922:
-------------------------------------


reopening for additional patches. Patch #1: option to use distributed cache for B matrix in AB' job.
                
> SSVD: ABt Job tweaks for extra sparse inputs
> --------------------------------------------
>
>                 Key: MAHOUT-922
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-922
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>         Attachments: MAHOUT-922.patch, MAHOUT-922.patch, MAHOUT-922.patch
>
>
> Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 
> AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 
> So, several improvements in this patch: 
> -- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
> -- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, additional passes over B' should not register. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
> -- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
> Miscellanea: 
> -- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
> -- Per Nathan's suggestion, p parameter is now optional, default is 15 (computation time is cubic to it, so I want to be careful not to run it too high by default).
> Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172079#comment-13172079 ] 

Hudson commented on MAHOUT-922:
-------------------------------

Integrated in Mahout-Quality #1260 (See [https://builds.apache.org/job/Mahout-Quality/1260/])
    Revert "MAHOUT-922:adding Distributed cache broadcast option for B'"
Power iterations broken. Rolling back, more work needed.

This reverts commit 312b446690a39118879466931eab5174966ea17f.

dlyubimov : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220615
Files : 
* /mahout/trunk/core/src/main/java/org/apache/mahout/common/iterator/sequencefile/SequenceFileDirIterator.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/ABtDenseOutJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/ABtJob.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDCli.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDSolver.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverDenseTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/math/hadoop/stochasticsvd/LocalSSVDSolverSparseSequentialTest.java

                
> SSVD: ABt Job tweaks for extra sparse inputs
> --------------------------------------------
>
>                 Key: MAHOUT-922
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-922
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>         Attachments: MAHOUT-922.patch, MAHOUT-922.patch, MAHOUT-922.patch
>
>
> Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 
> AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 
> So, several improvements in this patch: 
> -- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
> -- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, additional passes over B' should not register. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
> -- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
> Miscellanea: 
> -- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
> -- Per Nathan's suggestion, p parameter is now optional, default is 15 (computation time is cubic to it, so I want to be careful not to run it too high by default).
> Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

Posted by "Dmitriy Lyubimov (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167329#comment-13167329 ] 

Dmitriy Lyubimov commented on MAHOUT-922:
-----------------------------------------

and oh yeah i use double[][] for blocks to ensure proper allocation and otherwise simplification. DenseMatrix contract does not guarantee specific memory allocations contracts for skinny vs. wide matrices that affect efficient memory use in this case greatly.
                
> SSVD: ABt Job tweaks for extra sparse inputs
> --------------------------------------------
>
>                 Key: MAHOUT-922
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-922
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>         Attachments: MAHOUT-922.patch
>
>
> Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 
> AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 
> So, several improvements in this patch: 
> -- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
> -- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, additional passes over B' should not register. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
> -- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
> Miscellanea: 
> -- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
> -- Per Nathan's suggestion, p parameter is now optional, default is 15 (computation time is cubic to it, so I want to be careful not to run it too high by default).
> Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

Posted by "Dmitriy Lyubimov (Resolved) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy Lyubimov resolved MAHOUT-922.
-------------------------------------

    Resolution: Fixed

pushed in after Sebastian's tests.
                
> SSVD: ABt Job tweaks for extra sparse inputs
> --------------------------------------------
>
>                 Key: MAHOUT-922
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-922
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>         Attachments: MAHOUT-922.patch, MAHOUT-922.patch, MAHOUT-922.patch
>
>
> Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 
> AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 
> So, several improvements in this patch: 
> -- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
> -- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, additional passes over B' should not register. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
> -- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
> Miscellanea: 
> -- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
> -- Per Nathan's suggestion, p parameter is now optional, default is 15 (computation time is cubic to it, so I want to be careful not to run it too high by default).
> Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172591#comment-13172591 ] 

jiraposter@reviews.apache.org commented on MAHOUT-922:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3265/
-----------------------------------------------------------

(Updated 2011-12-19 20:34:41.498102)


Review request for mahout.


Summary (updated)
-------

MAHOUT-922-2: add DistributedCache broadcast to B' files for AB' job and R-hat files for B' job, on by default, governed by -br option. 

Notes: Performance: I did not notice the difference between using distributed cache vs. opening direct streams, which is understandable since jobs are cpu-bound.
I did have to add some functionality to multifile sequence file iterators to allow for specifying multiple files coming from distributed cache which is neither glob nor directory. I also added fixes for some corner case NPEs there.

Sorry eclipse reformatting for style is a bit different from original Sean's formatting in Intellij, it is hard to adjust it exactly. 

Still cannot upload diff. tried git diff with and without --no-prefix, for parent directories /, /trunk, to no avail. Doesn't accept it. I guess i'll attach it to the jira then. 


This addresses bug MAHOUT-922.
    https://issues.apache.org/jira/browse/MAHOUT-922


Diffs
-----


Diff: https://reviews.apache.org/r/3265/diff


Testing
-------


Thanks,

Dmitriy


                
> SSVD: ABt Job tweaks for extra sparse inputs
> --------------------------------------------
>
>                 Key: MAHOUT-922
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-922
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>         Attachments: MAHOUT-922-2.patch, MAHOUT-922.patch, MAHOUT-922.patch, MAHOUT-922.patch
>
>
> Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 
> AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 
> So, several improvements in this patch: 
> -- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
> -- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, additional passes over B' should not register. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
> -- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
> Miscellanea: 
> -- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
> -- Per Nathan's suggestion, p parameter is now optional, default is 15 (computation time is cubic to it, so I want to be careful not to run it too high by default).
> Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Issue Comment Edited] (MAHOUT-922) SSVD: ABt Job tweaks for extra sparse inputs

Posted by "Dmitriy Lyubimov (Issue Comment Edited) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167907#comment-13167907 ] 

Dmitriy Lyubimov edited comment on MAHOUT-922 at 12/12/11 10:44 PM:
--------------------------------------------------------------------

Another idea that may further relieve the cpu bound behavior a little bit is to use single precision arithmetics. As far as I understand, rounding errors in Y_i before orthonormalization do not matter much. It is Q_i*A' operation that matters. In Y_i we are just looking for a somewhat better basis than our original completely random basis.
                
      was (Author: dlyubimov):
    Another idea that may further relief cpu bound behavior a little bit is to use single precision arithmetics. As far as I understand, rounding errors in Y_i before orthonormalization do not matter much. It is Q_i*A' operation that matters. In Y_i we are just looking for a somewhat better basis than our original completely random basis.
                  
> SSVD: ABt Job tweaks for extra sparse inputs
> --------------------------------------------
>
>                 Key: MAHOUT-922
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-922
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 0.6
>
>         Attachments: MAHOUT-922.patch, MAHOUT-922.patch
>
>
> Per tests on Sebastian's extremely sparse large inputs (4.5m x 4.5 m). 
> AB' performance is still a bottleneck if one uses power iterations. For sufficiently sparse inputs it may turn out that mappers cannot form the entire blocked product in memory for Y_i. the Y_i block is going to be of size s x (k+p) where s is number of A rows read in a given split. in cases when A is extra sparse, such blocks may actually take more space than the A input. When this happens, s is constrained by -oh parameter and combiners and reducers get flooded by partial oh x (k+p) outer products and seem to have hard time to sort and shuffle them (especially high pressure on combiners has been seen). 
> So, several improvements in this patch: 
> -- present Y_i blocks as dense (they are beleived to be dense anyway, so keeping them as sparse just eats up RAM by sparse encoding, so at least twice as high blocks can actually be formed);
> -- eliminate combining completely. instead of persisting and sorting and summing up partial product in combiner, sum up map-side. if block height is still insufficient and cannot be extended due RAM constraints (unlikely for Sebastien's 4.5 x 4.5 mln case) just perform additional passes over B'. Since computation is cpu bound, additional passes over B' should not register. However, elimination of combiner phase for high load cases is probably going to have a dramatic effect.
> -- set max block height for Q'A and AB' separately instead of single -oh option. Their scaling seems to be quite different in terms of OOM danger. in my experiments Q'A blocking enters red zone at ~150,000 already whereas AB' block height can freely roam over a million easily for the same RAM. I provide 200,000 (~160Mb for k+p=100) as a default for AB' blocks which should be enough for Sebastien's 4.5 x 4.5 mln sparse case without causing more than one block. 
> Miscellanea: 
> -- Test run time: removed redundant tests and checks for SSVD. reduced test input size.
> -- Per Nathan's suggestion, p parameter is now optional, default is 15 (computation time is cubic to it, so I want to be careful not to run it too high by default).
> Current patch branch work is here: https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-922

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira