Posted to mapreduce-issues@hadoop.apache.org by "Aaron Kimball (JIRA)" <ji...@apache.org> on 2010/01/09 01:16:54 UTC

[jira] Updated: (MAPREDUCE-1367) LocalJobRunner should support parallel mapper execution

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated MAPREDUCE-1367:
-------------------------------------

    Attachment: MAPREDUCE-1367.patch

Attaching a patch that implements this improvement. This patch includes a test case which launches 6 mappers concurrently; these mappers run on a variety of schedules (some are faster, some are slower) in an attempt to suss out any race conditions that might develop.

The level of parallelism is controlled by a new parameter: {{mapreduce.local.map.tasks.maximum}}. This defaults to 1, so behavior is unchanged when the parameter is not set.
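To illustrate the idea (this is not the code in the patch), here is a minimal sketch of how a configured maximum could bound a pool of concurrently running map tasks. The class {{ParallelMapSketch}} and its methods are hypothetical names, and the "map tasks" are stand-ins that sleep for different lengths of time, loosely mirroring the test's mix of faster and slower mappers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; not taken from MAPREDUCE-1367.patch.
public class ParallelMapSketch {

    // Clamp the configured maximum to [1, numTasks] so the pool is never
    // empty and never larger than the amount of work available.
    static int effectiveThreads(int configuredMax, int numTasks) {
        return Math.max(1, Math.min(configuredMax, numTasks));
    }

    // Run numTasks stand-in map tasks, at most maxParallel at a time,
    // and return each task's result in submission order.
    static List<Integer> runMapTasks(int numTasks, int maxParallel) throws Exception {
        ExecutorService pool =
            Executors.newFixedThreadPool(effectiveThreads(maxParallel, numTasks));
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < numTasks; i++) {
                final int taskId = i;
                futures.add(pool.submit(() -> {
                    // Vary the run time so tasks finish out of order.
                    Thread.sleep(10L * (taskId % 3));
                    return taskId;
                }));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get()); // blocks until that task completes
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Six tasks, two at a time; a maximum of 1 would run them serially.
        System.out.println(runMapTasks(6, 2));
    }
}
```

With a maximum of 1 the pool has a single thread, so the tasks run one after another, which matches the default behavior described above.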

I also tested this by running the 'pi' example from the command line:
{code}bin/hadoop jar hadoop-mapred-examples-0.22.0-SNAPSHOT.jar pi -D mapreduce.jobtracker.address=local -D mapreduce.local.map.tasks.maximum=2 20 5000000
{code}

With {{mapreduce.local.map.tasks.maximum}} set to 1, this takes 13.5 seconds on my machine. With it set to 2 or above (I have two cores), the runtime drops to 8.5 seconds.

> LocalJobRunner should support parallel mapper execution
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-1367
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1367
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-1367.patch
>
>
> The LocalJobRunner currently supports only a single execution thread. Given the prevalence of multi-core CPUs, it makes sense to allow users to run multiple tasks in parallel for improved performance on small (local-only) jobs.
