Posted to mapreduce-issues@hadoop.apache.org by "Aaron T. Myers (JIRA)" <ji...@apache.org> on 2011/08/18 03:51:27 UTC
[jira] [Commented] (MAPREDUCE-2853) Add "teraread" example
[ https://issues.apache.org/jira/browse/MAPREDUCE-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086735#comment-13086735 ]
Aaron T. Myers commented on MAPREDUCE-2853:
-------------------------------------------
Patch looks pretty good, Todd. The only thing I notice is that it looks like there are a few unused imports in the file. +1 pending removal of those.
> Add "teraread" example
> ----------------------
>
> Key: MAPREDUCE-2853
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2853
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Components: benchmarks, examples
> Affects Versions: 0.23.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Fix For: 0.23.0
>
> Attachments: mapreduce-2853.txt
>
>
> Teragen is a good benchmark of raw DFS write throughput. Terasort is a good benchmark of the whole MR system (input, shuffle, output). I've added a simple "teraread" example which reads through the terasort input data without performing any processing. This acts as a good benchmark of a read-only workload (similar to real-life "find a needle in a haystack" MR jobs).
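The teraread idea is to scan the data while doing no per-record work, so elapsed time reflects read cost. A local Python sketch of that idea follows. It is not the code in mapreduce-2853.txt and does not use Hadoop. It assumes TeraSort's fixed 100-byte records (10-byte key, 90-byte value). The make_fake_records helper is hypothetical and stands in for real teragen output.

```python
import io
import time

RECORD_LEN = 100  # assumed TeraSort record size: 10-byte key + 90-byte value
KEY_LEN = 10


def scan_records(stream, record_len=RECORD_LEN):
    """Read fixed-size records and discard them, only counting what was read.

    No per-record processing is done, so the elapsed time approximates the
    raw cost of reading the input (the read-only benchmark idea).
    """
    records = 0
    total_bytes = 0
    start = time.perf_counter()
    while True:
        rec = stream.read(record_len)
        if not rec:
            break  # end of input
        if len(rec) != record_len:
            raise ValueError(f"truncated record: {len(rec)} bytes")
        records += 1
        total_bytes += len(rec)
    elapsed = time.perf_counter() - start
    return records, total_bytes, elapsed


def make_fake_records(n):
    """Hypothetical stand-in for teragen output (not the real teragen format)."""
    out = bytearray()
    for i in range(n):
        out += f"{i:010d}".encode("ascii")      # 10-byte zero-padded key
        out += b"x" * (RECORD_LEN - KEY_LEN)    # 90-byte filler value
    return bytes(out)


if __name__ == "__main__":
    data = make_fake_records(1000)
    records, nbytes, secs = scan_records(io.BytesIO(data))
    rate = nbytes / secs / 1e6 if secs > 0 else float("inf")
    print(records, nbytes)  # 1000 100000
    print(f"{rate:.1f} MB/s")
```

In a real teraread job the same "read and discard" shape would live in a mapper that emits nothing. The local version only shows why skipping all processing isolates read throughput.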
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira