Posted to mapreduce-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2013/07/31 17:21:50 UTC

[jira] [Commented] (MAPREDUCE-5395) Update Teragen algorithm

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725338#comment-13725338 ] 

Steve Loughran commented on MAPREDUCE-5395:
-------------------------------------------

If the Terasort stuff is being updated, could the various mappers, reducers, etc. all be made public so that you can glue them together in different ways? I recall finding this hard to do in some past work, and having to copy and paste them into my own source tree.
                
> Update Teragen algorithm
> ------------------------
>
>                 Key: MAPREDUCE-5395
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5395
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: examples
>    Affects Versions: 0.23.7
>            Reporter: Thomas Graves
>
> The Teragen algorithm is no longer up to date with the sortbenchmark.org gensort tool used for the official sort benchmark.  The new algorithm is supposed to generate data that isn't very compressible. 
> Also the new version of gensort can generate skewed data so we should add that option to teragen also.
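The description does not say how "not very compressible" would be checked. One hedged way to sanity-check a generator's output is to measure how much DEFLATE shrinks it. The sketch below compares a hypothetical low-entropy layout (a random 10-byte key followed by 90 constant filler bytes, matching the 100-byte sort benchmark record size) against uniformly random bytes. It uses only `java.util.zip.Deflater` and `java.util.Random`; the class and method names are illustrative, and this is not the gensort or Teragen generation algorithm.

```java
import java.util.Random;
import java.util.zip.Deflater;

public class CompressibilityCheck {

    // Returns compressed size / original size using DEFLATE at best compression.
    // Values near (or slightly above) 1.0 mean the data is essentially incompressible.
    public static double compressionRatio(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        byte[] buf = new byte[8192];
        long compressed = 0;
        while (!deflater.finished()) {
            compressed += deflater.deflate(buf);
        }
        deflater.end();
        return (double) compressed / data.length;
    }

    public static void main(String[] args) {
        int records = 10_000;
        int recordLen = 100; // 100-byte records: 10-byte key + 90-byte value
        Random rng = new Random(42);

        // Hypothetical low-entropy records: random key, constant filler value.
        byte[] lowEntropy = new byte[records * recordLen];
        byte[] key = new byte[10];
        for (int i = 0; i < records; i++) {
            rng.nextBytes(key);
            System.arraycopy(key, 0, lowEntropy, i * recordLen, key.length);
            for (int j = key.length; j < recordLen; j++) {
                lowEntropy[i * recordLen + j] = 'A';
            }
        }

        // Uniformly random bytes of the same total size.
        byte[] random = new byte[records * recordLen];
        rng.nextBytes(random);

        System.out.printf("low-entropy ratio: %.3f%n", compressionRatio(lowEntropy));
        System.out.printf("random ratio:      %.3f%n", compressionRatio(random));
    }
}
```

The low-entropy buffer compresses to a small fraction of its size, while the random buffer stays at roughly its original size; a generator aiming for hard-to-compress output would be expected to land near the second case.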
