Posted to common-dev@hadoop.apache.org by "Runping Qi (JIRA)" <ji...@apache.org> on 2008/05/08 07:50:55 UTC
[jira] Updated: (HADOOP-3332) improving the logging during shuffling
[ https://issues.apache.org/jira/browse/HADOOP-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Runping Qi updated HADOOP-3332:
-------------------------------
Description:
Below is an excerpt from the log file of a reducer.
The same set of messages about fetch scheduling is logged every second.
Yet, the critical information --- which hosts were slow --- was not there.
2008-05-01 00:33:13,215 INFO org.apache.hadoop.mapred.ReduceTask: task_200804302255_0002_r_000720_0 Need another 3 map output(s) where 1 is already in progress
2008-05-01 00:33:14,216 INFO org.apache.hadoop.mapred.ReduceTask: task_200804302255_0002_r_000720_0: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
2008-05-01 00:33:14,216 INFO org.apache.hadoop.mapred.ReduceTask: task_200804302255_0002_r_000720_0 Got 2 known map output location(s); scheduling...
2008-05-01 00:33:14,216 INFO org.apache.hadoop.mapred.ReduceTask: task_200804302255_0002_r_000720_0 Scheduled 0 of 2 known outputs (2 slow hosts and 0 dup hosts)
2008-05-01 00:33:14,216 INFO org.apache.hadoop.mapred.ReduceTask: task_200804302255_0002_r_000720_0 Need another 3 map output(s) where 1 is already in progress
2008-05-01 00:33:15,217 INFO org.apache.hadoop.mapred.ReduceTask: task_200804302255_0002_r_000720_0: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
2008-05-01 00:33:15,217 INFO org.apache.hadoop.mapred.ReduceTask: task_200804302255_0002_r_000720_0 Got 2 known map output location(s); scheduling...
2008-05-01 00:33:15,217 INFO org.apache.hadoop.mapred.ReduceTask: task_200804302255_0002_r_000720_0 Scheduled 0 of 2 known outputs (2 slow hosts and 0 dup hosts)
2008-05-01 00:33:15,217 INFO org.apache.hadoop.mapred.ReduceTask: task_200804302255_0002_r_000720_0 Need another 3 map output(s) where 1 is already in progress
2008-05-01 00:33:16,218 INFO org.apache.hadoop.mapred.ReduceTask: task_200804302255_0002_r_000720_0: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
Priority: Critical (was: Major)
It turns out that this problem is much more severe than it first appeared.
For any reasonably large job whose shuffle takes some time, the userlog/syslog file of each reducer task may
grow unreasonably large (0.5GB, say). This puts a heavy burden on HOD when it harvests the log files while deallocating
a cluster. Also, if those log files are archived on a DFS (as HOD does now), the space required on the DFS
will be quite significant.
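
One possible direction, sketched below, is to emit the scheduling status only when the set of slow hosts changes or a minimum interval has passed, and to name the slow hosts in the message. This is a minimal illustration and not the actual ReduceTask code: the class ThrottledShuffleLog, its report() method, the message wording, and the interval parameter are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of Hadoop's ReduceTask.
final class ThrottledShuffleLog {
    private final long minIntervalMs;              // assumed minimum gap between identical messages
    private final List<String> lines = new ArrayList<>();
    private Set<String> lastSlowHosts = null;      // null until the first report
    private long lastEmitMs = 0L;
    private int suppressed = 0;

    ThrottledShuffleLog(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    // Called once per scheduling pass. Returns true if a line was emitted.
    boolean report(long nowMs, int scheduled, int known, Set<String> slowHosts) {
        Set<String> sorted = new TreeSet<>(slowHosts);   // stable order for comparison and output
        boolean changed = !sorted.equals(lastSlowHosts);
        boolean due = nowMs - lastEmitMs >= minIntervalMs;
        if (!changed && !due) {
            suppressed++;                                // same state, too soon: stay quiet
            return false;
        }
        String line = "Scheduled " + scheduled + " of " + known
                + " known outputs; slow hosts: " + sorted
                + " (" + suppressed + " identical message(s) suppressed)";
        lines.add(line);
        System.out.println(line);
        lastSlowHosts = sorted;
        lastEmitMs = nowMs;
        suppressed = 0;
        return true;
    }

    // Lines emitted so far, exposed so the behaviour can be checked.
    List<String> lines() {
        return Collections.unmodifiableList(lines);
    }
}
```

With an interval of 60 seconds, repeated passes that report the same slow hosts would collapse to one line per minute, and that line would say which hosts were slow.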
> improving the logging during shuffling
> --------------------------------------
>
> Key: HADOOP-3332
> URL: https://issues.apache.org/jira/browse/HADOOP-3332
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Runping Qi
> Priority: Critical
>
> Below is an excerpt from the log file of a reducer.
> The same set of messages about fetch scheduling is logged every second.
> Yet, the critical information --- which hosts were slow --- was not there.
>
> 2008-05-01 00:33:13,215 INFO org.apache.hadoop.mapred.ReduceTask: task_200804302255_0002_r_000720_0 Need another 3 map output(s) where 1 is already in progress
> 2008-05-01 00:33:14,216 INFO org.apache.hadoop.mapred.ReduceTask: task_200804302255_0002_r_000720_0: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-05-01 00:33:14,216 INFO org.apache.hadoop.mapred.ReduceTask: task_200804302255_0002_r_000720_0 Got 2 known map output location(s); scheduling...
> 2008-05-01 00:33:14,216 INFO org.apache.hadoop.mapred.ReduceTask: task_200804302255_0002_r_000720_0 Scheduled 0 of 2 known outputs (2 slow hosts and 0 dup hosts)
> 2008-05-01 00:33:14,216 INFO org.apache.hadoop.mapred.ReduceTask: task_200804302255_0002_r_000720_0 Need another 3 map output(s) where 1 is already in progress
> 2008-05-01 00:33:15,217 INFO org.apache.hadoop.mapred.ReduceTask: task_200804302255_0002_r_000720_0: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-05-01 00:33:15,217 INFO org.apache.hadoop.mapred.ReduceTask: task_200804302255_0002_r_000720_0 Got 2 known map output location(s); scheduling...
> 2008-05-01 00:33:15,217 INFO org.apache.hadoop.mapred.ReduceTask: task_200804302255_0002_r_000720_0 Scheduled 0 of 2 known outputs (2 slow hosts and 0 dup hosts)
> 2008-05-01 00:33:15,217 INFO org.apache.hadoop.mapred.ReduceTask: task_200804302255_0002_r_000720_0 Need another 3 map output(s) where 1 is already in progress
> 2008-05-01 00:33:16,218 INFO org.apache.hadoop.mapred.ReduceTask: task_200804302255_0002_r_000720_0: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.