Posted to common-dev@hadoop.apache.org by "Runping Qi (JIRA)" <ji...@apache.org> on 2007/10/23 19:41:50 UTC
[jira] Created: (HADOOP-2093) DFS should provide partition
information for blocks, and map/reduce should avoid scheduling
mappers with splits on the same file system partition at the same time
DFS should provide partition information for blocks, and map/reduce should avoid scheduling mappers with splits on the same file system partition at the same time
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Key: HADOOP-2093
URL: https://issues.apache.org/jira/browse/HADOOP-2093
Project: Hadoop
Issue Type: New Feature
Reporter: Runping Qi
The summary is a bit long, but the basic idea is to make better use of multiple file system partitions.
For example, suppose a map/reduce job has 100 splits local to a node, spread across 4 file system
partitions, and we allow 4 mappers to run concurrently. It is better for each mapper to work on
splits from a different file system partition. In the worst case, all the mappers work on
splits from the same file system partition, and the other three partitions are not utilized at all.
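A rough sketch of the scheduling idea, not actual Hadoop code. It assumes each split already carries the id of the local partition holding its block, which is the information this issue asks DFS to provide. The names Split, pick_next_split and schedule are hypothetical and exist only for illustration.

```python
from collections import namedtuple

# Hypothetical record: a split plus the id of the local partition holding its block.
Split = namedtuple("Split", ["split_id", "partition"])

def pick_next_split(pending, busy_partitions):
    """Return the first pending split on an idle partition, else the first pending split."""
    for split in pending:
        if split.partition not in busy_partitions:
            return split
    return pending[0] if pending else None

def schedule(pending, slots):
    """Fill up to `slots` mapper slots, spreading them over partitions where possible."""
    pending = list(pending)
    running = []
    while pending and len(running) < slots:
        busy = {s.partition for s in running}
        chosen = pick_next_split(pending, busy)
        pending.remove(chosen)
        running.append(chosen)
    return running

# 100 local splits laid out so that each of the 4 partitions holds 25 of them.
splits = [Split(i, i // 25) for i in range(100)]
running = schedule(splits, slots=4)
print(sorted(s.partition for s in running))  # prints [0, 1, 2, 3]
```

With 4 slots the sketch starts one mapper per partition. When every partition is already busy, it falls back to the first pending split, so no slot is left idle just to preserve the spread.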
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2093) DFS should provide partition
information for blocks, and map/reduce should avoid scheduling
mappers with splits on the same file system partition at the same time
Posted by "eric baldeschwieler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539753 ]
eric baldeschwieler commented on HADOOP-2093:
---------------------------------------------
An easier solution might simply be to schedule more blocks to be read at once. This will saturate the disk system with less complexity...
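A back-of-the-envelope way to look at this suggestion, under an assumption that is not stated in the thread: each concurrently read block sits on one of the node's partitions uniformly at random, independently of the others. Under that assumption the expected number of partitions with at least one active read is P * (1 - (1 - 1/P)^k) for P partitions and k concurrent reads. The function name below is hypothetical.

```python
def expected_busy_partitions(partitions, concurrent_reads):
    """Expected number of partitions with at least one read, assuming uniform random placement."""
    idle_probability = (1 - 1 / partitions) ** concurrent_reads
    return partitions * (1 - idle_probability)

# Compare a few levels of read concurrency on a node with 4 partitions.
for reads in (4, 8, 16):
    print(reads, round(expected_busy_partitions(4, reads), 2))
```

Under this model, 4 concurrent reads keep about 2.73 of the 4 partitions busy on average, 8 reads about 3.6, and 16 reads about 3.96. This says nothing about seek contention when several reads land on the same partition.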
[jira] Updated: (HADOOP-2093) DFS should provide partition
information for blocks, and map/reduce should avoid scheduling
mappers with splits on the same file system partition at the same time
Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Runping Qi updated HADOOP-2093:
-------------------------------
Component/s: mapred, dfs
Description:
The summary is a bit long, but the basic idea is to make better use of multiple file system partitions.
For example, suppose a map/reduce job has 100 splits local to a node, spread across 4 file system
partitions, and we allow 4 mappers to run concurrently. It is better for each mapper to work on
splits from a different file system partition. In the worst case, all the mappers work on
splits from the same file system partition, and the other three partitions are not utilized at all.