Posted to common-dev@hadoop.apache.org by "Eric Yang (JIRA)" <ji...@apache.org> on 2009/05/20 19:29:45 UTC

[jira] Created: (HADOOP-5876) Shuffling information logged to userlogs/attempt_####_###_r_###_#/syslogs

Shuffling information logged to userlogs/attempt_####_###_r_###_#/syslogs
-------------------------------------------------------------------------

                 Key: HADOOP-5876
                 URL: https://issues.apache.org/jira/browse/HADOOP-5876
             Project: Hadoop Core
          Issue Type: Improvement
          Components: mapred
    Affects Versions: 0.20.0
         Environment: Redhat EL 5.1, Java 6
            Reporter: Eric Yang
            Priority: Critical


Shuffling information is currently logged to userlogs. It would be ideal to have this information consolidated in the TaskTracker log or the job history log file so that the downstream log collection and analysis program (Chukwa) can pick it up.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (HADOOP-5876) Shuffling information logged to userlogs/attempt_####_###_r_###_#/syslogs

Posted by "Jiaqi Tan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711249#action_12711249 ] 

Jiaqi Tan edited comment on HADOOP-5876 at 5/20/09 10:36 AM:
-------------------------------------------------------------

Updated to include older releases, which also log shuffle and other related Map- and Reduce-specific system information to userlogs/attempt_###_r_###/syslog. If the logging of this information could be moved to the main TaskTracker log, Chukwa could pick it up more easily.

      was (Author: tanjiaqi):
    Updated to include older releases which also have shuffles and other related Map- and Reduce-specific system information that are currently being logged to userlogs/attempt_###_r_###/syslog.
  
> Shuffling information logged to userlogs/attempt_####_###_r_###_#/syslogs
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-5876
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5876
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.18.0, 0.18.1, 0.18.2, 0.18.3, 0.19.0, 0.19.1, 0.20.0
>         Environment: Redhat EL 5.1, Java 6
>            Reporter: Eric Yang
>            Priority: Critical
>
> Shuffling information is currently logged to userlogs, it would be ideal to have this information consolidated in task tracker log or job history log file for down stream log collection and analysis program (Chukwa) to pickup.



[jira] Commented: (HADOOP-5876) Shuffling information logged to userlogs/attempt_####_###_r_###_#/syslogs

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711277#action_12711277 ] 

Arun C Murthy commented on HADOOP-5876:
---------------------------------------

What kind of shuffle information are you looking for? 

Overall, it doesn't make sense to start pushing task logs to the TaskTracker logs; it's just easier to collect userlogs. It makes even less sense to push them to job history. It seems to me that we should be using some metrics at best.



[jira] Updated: (HADOOP-5876) Shuffling information logged to userlogs/attempt_####_###_r_###_#/syslogs

Posted by "Jiaqi Tan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaqi Tan updated HADOOP-5876:
------------------------------

    Affects Version/s: 0.18.0
                       0.18.1
                       0.18.2
                       0.18.3
                       0.19.0
                       0.19.1

Updated to include older releases, which also log shuffle and other related Map- and Reduce-specific system information to userlogs/attempt_###_r_###/syslog.



[jira] Commented: (HADOOP-5876) Shuffling information logged to userlogs/attempt_####_###_r_###_#/syslogs

Posted by "Eric Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711384#action_12711384 ] 

Eric Yang commented on HADOOP-5876:
-----------------------------------

Although shuffling information is available in Hadoop metrics form, it doesn't contain the full information. For example:

Timestamp : 1242815580000
[jobId] :job_200905200055_0930
[jobName] :Chukwa-Demux_20090520_10_32
[recordName] :shuffleInput
[sessionId] :
[shuffle_failed_fetches] :0
[shuffle_fetchers_busy_percent] :0
[shuffle_input_bytes] :10
[shuffle_success_fetches] :5
[taskId] :attempt_200905200055_0930_r_000006_0

The task attempt ID doesn't indicate the source of the shuffle, so it is difficult to match the corresponding shuffle input and output. In addition, the start time and end time of the shuffle would also be useful. What is the easiest way to get this information into Chukwa?
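To illustrate the gap, here is a minimal sketch that parses the metrics record shown above and splits the task attempt ID into its parts. The function names (parse_record, parse_attempt_id) are hypothetical and are not part of Hadoop or Chukwa; the parsing assumes the "key :value" layout of the dump above and the usual attempt_<jobtracker start>_<job number>_<m|r>_<task number>_<attempt number> naming.

```python
import re

# The metrics record quoted in the comment above, copied verbatim.
SAMPLE = """\
Timestamp : 1242815580000
[jobId] :job_200905200055_0930
[jobName] :Chukwa-Demux_20090520_10_32
[recordName] :shuffleInput
[sessionId] :
[shuffle_failed_fetches] :0
[shuffle_fetchers_busy_percent] :0
[shuffle_input_bytes] :10
[shuffle_success_fetches] :5
[taskId] :attempt_200905200055_0930_r_000006_0
"""

# Assumed attempt ID layout:
# attempt_<jobtracker start>_<job number>_<m|r>_<task number>_<attempt number>
ATTEMPT_RE = re.compile(r"attempt_(\d+)_(\d+)_([mr])_(\d+)_(\d+)$")


def parse_record(text):
    """Turn 'key :value' lines into a dict (hypothetical helper)."""
    record = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        # Drop surrounding whitespace and the [brackets] around tag names.
        record[key.strip().strip("[]")] = value.strip()
    return record


def parse_attempt_id(attempt_id):
    """Split a task attempt ID into its components (hypothetical helper)."""
    match = ATTEMPT_RE.match(attempt_id)
    if match is None:
        raise ValueError("unrecognized attempt id: " + attempt_id)
    jt_start, job_number, kind, task_number, attempt_number = match.groups()
    return {
        "job_id": "job_%s_%s" % (jt_start, job_number),
        "task_type": "reduce" if kind == "r" else "map",
        "task_number": int(task_number),
        "attempt_number": int(attempt_number),
    }


record = parse_record(SAMPLE)
attempt = parse_attempt_id(record["taskId"])
print(sorted(record))
print(attempt)
```

Every field in the parsed record describes the reduce attempt itself. There is no field naming the map task or host that each fetch came from, and no per-shuffle start or end time, which is the information being asked for here.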



[jira] Resolved: (HADOOP-5876) Shuffling information logged to userlogs/attempt_####_###_r_###_#/syslogs

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved HADOOP-5876.
-----------------------------------

    Resolution: Won't Fix

I don't see any way to push the logging to the task tracker. The processing is done by the task. You can grab the shuffle bytes from the counters.
