Posted to common-dev@hadoop.apache.org by "Pete Wyckoff (JIRA)" <ji...@apache.org> on 2008/09/19 19:52:44 UTC

[jira] Created: (HADOOP-4223) Ability to throttle DFS/MR so as not to overwhelm colo to colo switches

Ability to throttle DFS/MR so as not to overwhelm colo to colo switches
-----------------------------------------------------------------------

                 Key: HADOOP-4223
                 URL: https://issues.apache.org/jira/browse/HADOOP-4223
             Project: Hadoop Core
          Issue Type: Improvement
          Components: dfs, mapred
            Reporter: Pete Wyckoff


Motivation:

This would allow people to keep less frequently used data in a non-co-located HDFS instance and pull it from that cluster when needed.
This is useful in the context of Hive, where a Metastore tells the runtime system where the data is located, either as a full URI or through symbolic links.

The problem:

This will not work right now because the cross-cluster reads may overwhelm the colo-to-colo switches between the two instances.

Workaround:

Make the files unsplittable or choose a block size such that you only get 2-3 mappers.
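
To make the block-size workaround concrete, here is a rough sketch of the arithmetic, assuming one map task per block-sized split of a single file. SplitEstimate and its methods are hypothetical names for illustration, not Hadoop APIs, and real split counts also depend on the input format.

```java
// Hypothetical helper, not a Hadoop API: estimates how many map tasks a
// single file yields, assuming one map task per block-sized split.
final class SplitEstimate {
    private SplitEstimate() {}

    // Number of block-sized splits for a file (ceiling division).
    static long mappersFor(long fileBytes, long blockBytes) {
        if (fileBytes < 0 || blockBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    // Smallest block size that keeps a file at or below maxMappers splits.
    static long blockSizeFor(long fileBytes, int maxMappers) {
        if (fileBytes < 0 || maxMappers <= 0) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        return (fileBytes + maxMappers - 1) / maxMappers;
    }
}
```

Under these assumptions a 10 GB file with 64 MB blocks gives 160 mappers, so getting down to 3 mappers needs a block size of roughly 3.4 GB. That shows why this is only a workaround.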

Possible solution:

Throttle parallelism in the scheduler by allowing a job to run only X mappers at a time, no matter how many slots are free (making some assumptions about the reliability of the JobTracker's failure detector).
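
A minimal sketch of what such a per-job cap could look like is given here. It is not JobTracker code: MapperThrottle, setCap, tryAcquire, release and runningCount are hypothetical names, and the sketch assumes the scheduler calls release whenever a mapper finishes or is declared failed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not JobTracker code: caps the number of concurrently
// running map tasks per job, regardless of how many slots are free.
final class MapperThrottle {
    private final Map<String, Integer> caps = new HashMap<>();
    private final Map<String, Integer> running = new HashMap<>();

    // Set the cap "X" for a job; jobs without a cap are not throttled.
    synchronized void setCap(String jobId, int maxConcurrentMappers) {
        if (maxConcurrentMappers < 1) {
            throw new IllegalArgumentException("cap must be >= 1");
        }
        caps.put(jobId, maxConcurrentMappers);
    }

    // Called when a slot is free; returns true if the job may start
    // another mapper, and counts that mapper as running.
    synchronized boolean tryAcquire(String jobId) {
        int active = running.getOrDefault(jobId, 0);
        Integer cap = caps.get(jobId);
        if (cap != null && active >= cap) {
            return false; // leave the slot idle for this job
        }
        running.put(jobId, active + 1);
        return true;
    }

    // Called when a mapper finishes or is declared failed.
    synchronized void release(String jobId) {
        running.computeIfPresent(jobId, (id, n) -> n > 1 ? n - 1 : null);
    }

    synchronized int runningCount(String jobId) {
        return running.getOrDefault(jobId, 0);
    }
}
```

In this sketch a mapper that has died but has not yet been declared failed still counts against the cap. That is one place where the reliability of the failure detector matters.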




