Posted to dev@mahout.apache.org by Dmitriy Lyubimov <dl...@apache.org> on 2011/12/19 21:31:03 UTC

Review Request: MAHOUT-922-2: add DistributedCache broadcast to B' files for AB' job and R-hat files for B' job

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3265/
-----------------------------------------------------------

Review request for mahout.


Summary
-------

MAHOUT-922-2: add DistributedCache broadcast of B' files for the AB' job and of R-hat files for the B' job. Broadcast is on by default and is governed by the -br option.

Notes: Performance: I did not notice any difference between using the distributed cache and opening direct streams, which is understandable since the jobs are CPU-bound.
I did have to add some functionality to the multi-file sequence file iterators so that they accept an explicit list of files coming from the distributed cache, which is neither a glob nor a directory. I also fixed some corner-case NPEs there.
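
To illustrate the iterator change described above, here is a minimal sketch of iterating over an explicit list of files (neither a glob nor a directory) while guarding against null inputs. It is not the code from the patch: the class name ExplicitFileListIterator is hypothetical, and it reads plain text lines with java.nio as a stand-in for sequence file records so that it runs without Hadoop on the classpath.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Hypothetical illustration only (not Mahout's iterator): walks the lines of
 * an explicit list of files, one file after another.
 */
final class ExplicitFileListIterator implements Iterator<String>, AutoCloseable {
  private final Iterator<Path> files;
  private BufferedReader current; // reader for the file being consumed, or null
  private String nextLine;        // look-ahead record, or null if none buffered

  ExplicitFileListIterator(List<Path> paths) {
    // Treat a null list (or null entries) as "nothing to read" instead of
    // failing later with a NullPointerException.
    List<Path> safe = new ArrayList<>();
    if (paths != null) {
      for (Path p : paths) {
        if (p != null) {
          safe.add(p);
        }
      }
    }
    this.files = safe.iterator();
  }

  @Override
  public boolean hasNext() {
    if (nextLine != null) {
      return true;
    }
    try {
      while (true) {
        if (current == null) {
          if (!files.hasNext()) {
            return false; // all files exhausted
          }
          current = Files.newBufferedReader(files.next(), StandardCharsets.UTF_8);
        }
        nextLine = current.readLine();
        if (nextLine != null) {
          return true;
        }
        // End of this file: close it and move on to the next one.
        current.close();
        current = null;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  @Override
  public String next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    String line = nextLine;
    nextLine = null;
    return line;
  }

  @Override
  public void close() {
    try {
      if (current != null) {
        current.close();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      current = null;
    }
  }
}
```

In a real job the list of paths would presumably come from the distributed cache lookup on the task side; here the caller supplies it directly.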

Sorry, Eclipse's style reformatting differs a bit from Sean's original formatting in IntelliJ; it is hard to match it exactly.


This addresses bug MAHOUT-922.
    https://issues.apache.org/jira/browse/MAHOUT-922


Diffs
-----


Diff: https://reviews.apache.org/r/3265/diff


Testing
-------


Thanks,

Dmitriy


Re: Review Request: MAHOUT-922-2: add DistributedCache broadcast to B' files for AB' job and R-hat files for B' job

Posted by Dmitriy Lyubimov <dl...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3265/
-----------------------------------------------------------

(Updated 2011-12-19 20:34:41.498102)


Review request for mahout.


Summary (updated)
-------

MAHOUT-922-2: add DistributedCache broadcast of B' files for the AB' job and of R-hat files for the B' job. Broadcast is on by default and is governed by the -br option.

Notes: Performance: I did not notice any difference between using the distributed cache and opening direct streams, which is understandable since the jobs are CPU-bound.
I did have to add some functionality to the multi-file sequence file iterators so that they accept an explicit list of files coming from the distributed cache, which is neither a glob nor a directory. I also fixed some corner-case NPEs there.

Sorry, Eclipse's style reformatting differs a bit from Sean's original formatting in IntelliJ; it is hard to match it exactly.

I still cannot upload the diff. I tried git diff with and without --no-prefix, for parent directories / and /trunk, to no avail; the upload is not accepted. I guess I'll attach it to the JIRA issue then.


This addresses bug MAHOUT-922.
    https://issues.apache.org/jira/browse/MAHOUT-922


Diffs
-----


Diff: https://reviews.apache.org/r/3265/diff


Testing
-------


Thanks,

Dmitriy