Posted to common-dev@hadoop.apache.org by "Pete Wyckoff (JIRA)" <ji...@apache.org> on 2008/09/19 19:52:44 UTC
[jira] Created: (HADOOP-4223) Ability to throttle DFS/MR so as not
to overwhelm colo to colo switches
Ability to throttle DFS/MR so as not to overwhelm colo to colo switches
-----------------------------------------------------------------------
Key: HADOOP-4223
URL: https://issues.apache.org/jira/browse/HADOOP-4223
Project: Hadoop Core
Issue Type: Improvement
Components: dfs, mapred
Reporter: Pete Wyckoff
Motivation:
This would allow people to put data that is not used as often in a non-co-located HDFS instance and pull it from the other cluster when needed.
This is useful in the context of Hive, where a Metastore tells the runtime system where the data is located (the full URI), or with symbolic links.
The problem:
This will not work right now because it may overwhelm switches between the two instances.
Workaround:
Make the files unsplittable, or set your block size so that you only get 2-3 mappers.
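To make the block-size workaround concrete, here is a rough sketch of the arithmetic. It assumes one mapper per input split and splits of exactly one block each, which real input formats only approximate. The function names are hypothetical and not part of Hadoop.

```python
def min_split_size(total_bytes: int, max_mappers: int) -> int:
    """Smallest split size in bytes that yields at most max_mappers splits.

    Assumes one mapper per split (an illustrative simplification).
    """
    if total_bytes < 0 or max_mappers < 1:
        raise ValueError("total_bytes must be >= 0 and max_mappers >= 1")
    # Integer ceiling division avoids float rounding on large files.
    return max(1, -(-total_bytes // max_mappers))


def num_splits(total_bytes: int, split_bytes: int) -> int:
    """Number of splits (and so mappers) for a given split size."""
    return -(-total_bytes // split_bytes)


if __name__ == "__main__":
    ten_gib = 10 * 1024**3
    print(num_splits(ten_gib, 64 * 1024**2))   # mappers with 64 MiB blocks
    print(min_split_size(ten_gib, 3))          # split size needed for 3 mappers
```

Under these assumptions, a 10 GiB input with 64 MiB blocks gives 160 mappers, and getting down to 3 mappers needs splits of roughly 3.3 GiB, which shows how blunt the workaround is.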
Possible solution:
Throttle parallelism in the scheduler by specifying that only X mappers may run for a job, no matter how many slots are free (making some assumptions about the reliability of the JobTracker's failure detector).
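A minimal sketch of what such a cap could look like, written as a standalone simulation rather than against the real JobTracker or scheduler code. The `Job` class, the `max_mappers` field, and the `assign`/`complete` functions are all hypothetical names used only for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    pending: deque                 # map task ids waiting to run
    max_mappers: int               # hypothetical per-job cap (the "X" above)
    running: set = field(default_factory=set)


def assign(job: Job, free_slots: int) -> list:
    """Launch pending map tasks, stopping at the job's cap even if slots remain."""
    launched = []
    while job.pending and free_slots > 0 and len(job.running) < job.max_mappers:
        task = job.pending.popleft()
        job.running.add(task)
        launched.append(task)
        free_slots -= 1
    return launched


def complete(job: Job, task) -> None:
    """Record that a task finished (or was declared failed), freeing cap room."""
    job.running.discard(task)


if __name__ == "__main__":
    job = Job("cross-colo-copy", deque(range(10)), max_mappers=3)
    print(assign(job, free_slots=8))   # only 3 tasks start despite 8 free slots
    complete(job, 0)
    print(assign(job, free_slots=8))   # one more task starts
```

In this sketch a task that hangs without ever being reported as finished or failed stays in `running` and keeps consuming the cap, which may be the kind of dependence on failure detection the proposal alludes to.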