Posted to mapreduce-dev@hadoop.apache.org by "Amar Kamat (JIRA)" <ji...@apache.org> on 2010/08/13 13:22:15 UTC
[jira] Created: (MAPREDUCE-2010) [Rumen] Parallelize TraceBuilder
[Rumen] Parallelize TraceBuilder
--------------------------------
Key: MAPREDUCE-2010
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2010
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: tools/rumen
Affects Versions: 0.22.0
Reporter: Amar Kamat
Assignee: Amar Kamat
Fix For: 0.22.0
Currently, Rumen's {{TraceBuilder}} processes jobs sequentially and emits them in sorted order (based on job-id). It performs the following steps:
# Read data from input files
# Parse and analyze the JobHistory data
# Write the data to the output file
Steps #1 and #2 can be done in parallel. Step #3 can remain sequential if the user needs sorted output; otherwise it can also be done in parallel.
I could achieve a ~50% speedup by simply parallelizing steps #1 and #2 (i.e., the output was still sorted by job-id).
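To illustrate the idea, here is a minimal sketch of parallelizing steps #1 and #2 while keeping step #3 sequential. It is not the actual {{TraceBuilder}} code: the class name {{ParallelTraceSketch}}, the method {{buildTrace}}, and the {{readAndParse}} stand-in are all hypothetical, and jobs are represented as plain strings. The sketch assumes the job-ids are already sorted; tasks are submitted in that order and their results are collected in the same order, so the output stays sorted even when a later job finishes parsing first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelTraceSketch {

    // Hypothetical stand-in for steps #1 and #2 (read the input, then parse
    // and analyze the JobHistory data). A real implementation would do I/O here.
    static String readAndParse(String jobId) {
        return "parsed:" + jobId;
    }

    // Runs readAndParse for every job on a fixed-size thread pool and returns
    // the results in the same order as sortedJobIds.
    static List<String> buildTrace(List<String> sortedJobIds,
                                   Function<String, String> readAndParse,
                                   int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Steps #1 and #2: submit one task per job, in sorted order.
            List<Future<String>> pending = new ArrayList<>();
            for (String jobId : sortedJobIds) {
                Callable<String> task = () -> readAndParse.apply(jobId);
                pending.add(pool.submit(task));
            }
            // Step #3: consume the futures in submission order. get() blocks
            // until that particular job is ready, so output order is preserved.
            List<String> output = new ArrayList<>();
            for (Future<String> result : pending) {
                output.add(result.get());
            }
            return output;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> jobs = Arrays.asList("job_001", "job_002", "job_003");
        for (String line : buildTrace(jobs, ParallelTraceSketch::readAndParse, 4)) {
            System.out.println(line); // stand-in for writing to the output file
        }
    }
}
```

If sorted output is not required, step #3 could instead write each result as soon as it completes (for example via {{java.util.concurrent.ExecutorCompletionService}}), at the cost of losing the job-id ordering.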