Posted to common-dev@hadoop.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2009/05/14 18:38:45 UTC

[jira] Created: (HADOOP-5834) Job History log file format is not friendly for external tools.

Job History log file format is not friendly for external tools.
---------------------------------------------------------------

                 Key: HADOOP-5834
                 URL: https://issues.apache.org/jira/browse/HADOOP-5834
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred
            Reporter: Owen O'Malley


Currently, parsing the job history logs with external tools is very difficult because of the format. The most critical problem is that newlines aren't escaped in the strings. That makes using tools like grep, sed, and awk very tricky.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5834) Job History log file format is not friendly for external tools.

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711429#action_12711429 ] 

Philip Zeyliger commented on HADOOP-5834:
-----------------------------------------

I'm +1 on JSON.  It might make sense to introduce a hook to have multiple sinks for this sort of log data.  I'd like to siphon off some of the data to a database, and I could also see some folks wishing to compress these logs.



[jira] Assigned: (HADOOP-5834) Job History log file format is not friendly for external tools.

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley reassigned HADOOP-5834:
-------------------------------------

    Assignee: Amar Kamat



[jira] Issue Comment Edited: (HADOOP-5834) Job History log file format is not friendly for external tools.

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711411#action_12711411 ] 

Owen O'Malley edited comment on HADOOP-5834 at 5/20/09 4:23 PM:
----------------------------------------------------------------

It would look like:

{code}
{"KIND":"MapAttempt",
 "TASK_TYPE":"MAP",
 "TASKID":"task_200904210931_0001_m_180280",
 "TASK_ATTEMPT_ID":"attempt_200904210931_0001_m_180280_0",
 "TASK_STATUS":"SUCCESS",
 "FINISH_TIME":"1240321545820",
 "HOSTNAME":"/rack1/node1.purple.ygrid.yahoo.com",
 "STATE_STRING":"",
 "COUNTERS":[{"ID":"org.apache.hadoop.mapred.Task$Counter", "NAME":"Map-Reduce Framework",
                           "LIST":[{"ID":"COMBINE_OUTPUT_RECORDS", "NAME":"Combine output records", "VALUE":0},
                                    {"ID":"MAP_INPUT_RECORDS","NAME":"Map input records", "VALUE":3363235}]}]
}
{code}

To support grep, we shouldn't add any newlines within a record.
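
A minimal Python sketch of the one-record-per-line idea, assuming the standard json module. The record is abbreviated from the example above, the STATE_STRING value is made up to show a newline inside a string, and to_history_line is a hypothetical helper name, not part of Hadoop.

```python
import json

# Abbreviated from the example record above; STATE_STRING is an illustrative value.
record = {
    "KIND": "MapAttempt",
    "TASK_TYPE": "MAP",
    "TASK_STATUS": "SUCCESS",
    "STATE_STRING": "line one\nline two",  # a value that contains a newline
    "COUNTERS": [{
        "ID": "org.apache.hadoop.mapred.Task$Counter",
        "NAME": "Map-Reduce Framework",
        "LIST": [
            {"ID": "MAP_INPUT_RECORDS", "NAME": "Map input records", "VALUE": 3363235},
        ],
    }],
}


def to_history_line(event):
    """Serialize one event as a single line of JSON (hypothetical helper)."""
    # json.dumps escapes newlines inside strings and, without indent,
    # emits no line breaks of its own.
    return json.dumps(event, separators=(",", ":"))


line = to_history_line(record)
lines = [line]

# Line-oriented matching, in the spirit of grep, now sees whole records.
successes = [l for l in lines if '"TASK_STATUS":"SUCCESS"' in l]
```

Because each record occupies exactly one line, a substring match on a line selects a complete event rather than a fragment of one.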

      was (Author: owen.omalley):
    It would look like:

{code}
{"KIND":"MapAttempt",
 "TASK_TYPE":"MAP",
 "TASKID":"task_200904210931_0001_m_180280",
 "TASK_ATTEMPT_ID":"attempt_200904210931_0001_m_180280_0",
 "TASK_STATUS":"SUCCESS",
 "FINISH_TIME":"1240321545820",
 "HOSTNAME":"/rack1/node1.purple.ygrid.yahoo.com",
 "STATE_STRING":"",
 "COUNTERS":[{"ID":"org.apache.hadoop.mapred.Task$Counter", "NAME":"Map-Reduce Framework"
                           LIST:[{"ID":"COMBINE_OUTPUT_RECORDS", "NAME":"Combine output records", "VALUE":0},
                                    {"ID":"MAP_INPUT_RECORDS","NAME":"Map input records", "VALUE":3363235}]}]
}
{code}
  


[jira] Commented: (HADOOP-5834) Job History log file format is not friendly for external tools.

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711489#action_12711489 ] 

Owen O'Malley commented on HADOOP-5834:
---------------------------------------

Would adding a log4j logger that was only used for these events work for you? That is certainly something I'd be interested in seeing. Because these event logs are used for JobTracker restart, I don't think making it pluggable is a good plan. 



[jira] Commented: (HADOOP-5834) Job History log file format is not friendly for external tools.

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12709681#action_12709681 ] 

Owen O'Malley commented on HADOOP-5834:
---------------------------------------

The current format quotes the character set ["\.]. The format is 

key kind1="val1" kind2="val2"\n

with newlines in the values converted to '\n' and quotes converted to '\"'. We should also consider how to represent counters better. The current format is pretty painful.
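
A rough Python sketch of the escaping described above, not Hadoop's actual JobHistory code. It assumes backslash is the escape character and that the backslash itself is also escaped (the comment does not say so, but a round trip needs it); it does not treat '.' specially. All function names are hypothetical.

```python
import re


def escape_value(value):
    # Escape the backslash first so the later replacements stay unambiguous.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


_UNESCAPES = {"n": "\n", '"': '"', "\\": "\\"}


def unescape_value(text):
    # Map each backslash pair back to the character it stands for.
    return re.sub(r"\\(.)", lambda m: _UNESCAPES.get(m.group(1), m.group(1)), text)


def format_line(kind, fields):
    """Build: kind key1="val1" key2="val2" with no raw newlines."""
    pairs = " ".join('%s="%s"' % (k, escape_value(v)) for k, v in fields.items())
    return "%s %s" % (kind, pairs)


# A quoted value is any run of non-quote, non-backslash characters or backslash pairs.
_PAIR = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_line(line):
    kind, _, rest = line.partition(" ")
    return kind, {k: unescape_value(v) for k, v in _PAIR.findall(rest)}


fields = {"TASKID": "task_1", "ERROR": 'java.io.IOException: "bad"\n\tat Foo.bar(C:\\tmp)'}
line = format_line("Task", fields)
```

With newlines and quotes escaped, every event stays on one physical line and the values can still be recovered exactly.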



[jira] Commented: (HADOOP-5834) Job History log file format is not friendly for external tools.

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711477#action_12711477 ] 

Todd Lipcon commented on HADOOP-5834:
-------------------------------------

Along the lines of what Philip said, I'd like to propose using HADOOP-5640 to expose the hooks for this job-history logging system. This would allow people to log in whichever format they prefer (e.g., JDBC to SQL, JSON, Thrift to something custom, etc.) from an external JAR, without modifying the mapred source.



[jira] Commented: (HADOOP-5834) Job History log file format is not friendly for external tools.

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12709755#action_12709755 ] 

Amar Kamat commented on HADOOP-5834:
------------------------------------

I think for now we can do something like this 
{code}
# encoding step
1. escape all ';' in values
2. replace '\n' with ';'

# decoding step
1. replace all unescaped ';' with \n
2. unescape all ';'
{code}

For now this should work. Externally, a multi-line error message would look like ';'-separated values, and each line in the job history would contain exactly one event. Thoughts?
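
The encoding and decoding steps above could be sketched in Python roughly as follows. The comment does not name an escape character, so this assumes backslash, and it also escapes the backslash itself so that decoding is unambiguous; both choices and the function names are assumptions, not part of the proposal.

```python
def encode_value(value):
    # Encoding step 1: escape the escape character (an added assumption), then ';'.
    escaped = value.replace("\\", "\\\\").replace(";", "\\;")
    # Encoding step 2: replace each newline with a bare ';'.
    return escaped.replace("\n", ";")


def decode_value(text):
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            # Decoding step 2: an escaped character is taken literally.
            out.append(next(chars, ""))
        elif ch == ";":
            # Decoding step 1: an unescaped ';' stands for a newline.
            out.append("\n")
        else:
            out.append(ch)
    return "".join(out)


original = "first; line\nsecond \\ line\n"
encoded = encode_value(original)
```

Decoding in a single pass avoids the ordering problem of doing the two decoding steps as separate global replacements.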



[jira] Commented: (HADOOP-5834) Job History log file format is not friendly for external tools.

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711369#action_12711369 ] 

Owen O'Malley commented on HADOOP-5834:
---------------------------------------

Doug suggests using the Jackson library for JSON parsing and generation. Its URL is http://jackson.codehaus.org/.



[jira] Commented: (HADOOP-5834) Job History log file format is not friendly for external tools.

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711296#action_12711296 ] 

Owen O'Malley commented on HADOOP-5834:
---------------------------------------

I think we should completely redesign the format. I'd propose using JSON so that it is trivial to parse in Python, Perl, and Java. If we only put in newlines between records, all of the needs are met using a standard layout. Furthermore, we can encode counters simply and directly rather than with complicated nested encoding schemes.
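
As an illustration of how simple parsing could become, here is a small Python sketch that totals counters from a stream holding one JSON record per line. The field names (COUNTERS, LIST, ID, VALUE) follow the example record posted elsewhere in this thread; the sample data and the total_counters name are made up for the sketch.

```python
import io
import json
from collections import Counter


def total_counters(stream):
    """Sum counter values across a stream with one JSON event per line."""
    totals = Counter()
    for line in stream:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        event = json.loads(line)
        for group in event.get("COUNTERS", []):
            for counter in group.get("LIST", []):
                totals[counter["ID"]] += counter["VALUE"]
    return totals


# Illustrative input: two map attempts, one record per line.
sample = io.StringIO(
    '{"KIND":"MapAttempt","COUNTERS":[{"ID":"g","LIST":[{"ID":"MAP_INPUT_RECORDS","VALUE":3}]}]}\n'
    '{"KIND":"MapAttempt","COUNTERS":[{"ID":"g","LIST":[{"ID":"MAP_INPUT_RECORDS","VALUE":4}]}]}\n'
)
totals = total_counters(sample)
```

Counters are plain nested lists and objects here, so no custom decoding is needed beyond the JSON parser.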



[jira] Commented: (HADOOP-5834) Job History log file format is not friendly for external tools.

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711752#action_12711752 ] 

Philip Zeyliger commented on HADOOP-5834:
-----------------------------------------

Ah, right, JobTracker.RecoveryManager seems to use the filenames for the logs to restart jobs.

-- Philip



[jira] Commented: (HADOOP-5834) Job History log file format is not friendly for external tools.

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711737#action_12711737 ] 

Philip Zeyliger commented on HADOOP-5834:
-----------------------------------------

What do you mean by "event logs are used for JobTracker restart"?

I'd like to get at these objects in a structured way; i.e., I don't want to parse them from a String.  Will a log4j logger let me do that?

-- Philip



[jira] Commented: (HADOOP-5834) Job History log file format is not friendly for external tools.

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711411#action_12711411 ] 

Owen O'Malley commented on HADOOP-5834:
---------------------------------------

It would look like:

{code}
{"KIND":"MapAttempt",
 "TASK_TYPE":"MAP",
 "TASKID":"task_200904210931_0001_m_180280",
 "TASK_ATTEMPT_ID":"attempt_200904210931_0001_m_180280_0",
 "TASK_STATUS":"SUCCESS",
 "FINISH_TIME":"1240321545820",
 "HOSTNAME":"/rack1/node1.purple.ygrid.yahoo.com",
 "STATE_STRING":"",
 "COUNTERS":[{"ID":"org.apache.hadoop.mapred.Task$Counter", "NAME":"Map-Reduce Framework"
                           LIST:[{"ID":"COMBINE_OUTPUT_RECORDS", "NAME":"Combine output records", "VALUE":0},
                                    {"ID":"MAP_INPUT_RECORDS","NAME":"Map input records", "VALUE":3363235}]}]
}
{code}
