Posted to dev@chukwa.apache.org by "Eric Yang (JIRA)" <ji...@apache.org> on 2009/04/14 19:29:15 UTC

[jira] Created: (CHUKWA-132) Handle multiline output in Job History file

Handle multiline output in Job History file
-------------------------------------------

                 Key: CHUKWA-132
                 URL: https://issues.apache.org/jira/browse/CHUKWA-132
             Project: Hadoop Chukwa
          Issue Type: Bug
          Components: Data Processors
         Environment: Redhat EL 5.1, Java 6
            Reporter: Eric Yang
            Priority: Critical


When the Job History file contains multi-line output, the parser fails with an exception like this:

MapAttempt TASK_TYPE="MAP" TASKID="task_200904060626_2141_m_000108" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000108_1" START_TIME="1239190934835" TRACKER_NAME="tracker_kry50024\.inktomisearch\.com:localhost/127\.0\.0\.1:39507" HTTP_PORT="50060" .
MapAttempt TASK_TYPE="MAP" TASKID="task_200904060626_2141_m_000108" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000108_1" TASK_STATUS="FAILED" FINISH_TIME="1239190949062" HOSTNAME="kry50024\.inktomisearch\.com" ERROR="java\.io\.IOException: MROutput/MRErrThread failed:java\.lang\.ArrayIndexOutOfBoundsException: -1
	at org\.apache\.hadoop\.mapred\.lib\.KeyFieldBasedPartitioner\.hashCode(KeyFieldBasedPartitioner\.java:95)
	at org\.apache\.hadoop\.mapred\.lib\.KeyFieldBasedPartitioner\.getPartition(KeyFieldBasedPartitioner\.java:87)
	at org\.apache\.hadoop\.mapred\.MapTask$MapOutputBuffer\.collect(MapTask\.java:801)
	at org\.apache\.hadoop\.streaming\.PipeMapRed$MROutputThread\.run(PipeMapRed\.java:378)

	at org\.apache\.hadoop\.streaming\.PipeMapper\.map(PipeMapper\.java:87)
	at org\.apache\.hadoop\.mapred\.MapRunner\.run(MapRunner\.java:50)
	at org\.apache\.hadoop\.streaming\.PipeMapRunner\.run(PipeMapRunner\.java:36)
	at org\.apache\.hadoop\.mapred\.MapTask\.runOldMapper(MapTask\.java:356)
	at org\.apache\.hadoop\.mapred\.MapTask\.run(MapTask\.java:305)
	at org\.apache\.hadoop\.mapred\.Child\.main(Child\.java:156)
" .
MapAttempt TASK_TYPE="CLEANUP" TASKID="task_200904060626_2141_m_000197" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000197_0" START_TIME="1239190961843" TRACKER_NAME="tracker_kry3083\.inktomisearch\.com:localhost/127\.0\.0\.1:60970" HTTP_PORT="50060" .
MapAttempt TASK_TYPE="CLEANUP" TASKID="task_200904060626_2141_m_000197" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000197_0" TASK_STATUS="SUCCESS" FINISH_TIME="1239190963602" HOSTNAME="/74\.6\.135\.128/kry3083\.inktomisearch\.com" STATE_STRING="cleanup" COUNTERS="{(org\.apache\.hadoop\.mapred\.Task$Counter)(Map-Reduce Framework)[(SPILLED_RECORDS)(Spilled Records)(0)]}" .
Task TASKID="task_200904060626_2141_m_000197" TASK_TYPE="CLEANUP" TASK_STATUS="SUCCESS" FINISH_TIME="1239190963509" COUNTERS="{(org\.apache\.hadoop\.mapred\.Task$Counter)(Map-Reduce Framework)[(SPILLED_RECORDS)(Spilled Records)(0)]}" .
Job JOBID="job_200904060626_2141" FINISH_TIME="1239190963510" JOB_STATUS="FAILED" FINISHED_MAPS="0" FINISHED_REDUCES="0" .

[cchunkException] :java.lang.StringIndexOutOfBoundsException: String index out of range: -1
	at java.lang.String.substring(String.java:1938)
	at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.JobLog$JobLogLine.<init>(JobLog.java:114)
	at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.JobLog.parse(JobLog.java:39)
	at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.AbstractProcessor.process(AbstractProcessor.java:90)
	at org.apache.hadoop.chukwa.extraction.demux.Demux$MapClass.map(Demux.java:94)
	at org.apache.hadoop.chukwa.extraction.demux.Demux$MapClass.map(Demux.java:60)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2210)

[csource] :kry-jt1.red.ygrid.yahoo.com
[ctags] :cluster="kryptonitered"



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CHUKWA-132) Handle multiline output in Job History file

Posted by "Cheng (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CHUKWA-132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698963#action_12698963 ] 

Cheng commented on CHUKWA-132:
------------------------------

The current Hadoop job history parser uses the same algorithm to read multi-line log entries. For details, please refer to JobHistory.parseHistoryFromFS.

> Handle multiline output in Job History file
> -------------------------------------------
>
>                 Key: CHUKWA-132
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-132
>             Project: Hadoop Chukwa
>          Issue Type: Bug
>          Components: Data Processors
>         Environment: Redhat EL 5.1, Java 6
>            Reporter: Eric Yang
>            Assignee: Cheng
>            Priority: Blocker
>         Attachments: chukwa-132.patch
>
>
> When the Job History file contains multi-line output, the parser fails with an exception like this:
> MapAttempt TASK_TYPE="MAP" TASKID="task_200904060626_2141_m_000108" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000108_1" START_TIME="1239190934835" TRACKER_NAME="tracker_kry50024\.inktomisearch\.com:localhost/127\.0\.0\.1:39507" HTTP_PORT="50060" .
> MapAttempt TASK_TYPE="MAP" TASKID="task_200904060626_2141_m_000108" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000108_1" TASK_STATUS="FAILED" FINISH_TIME="1239190949062" HOSTNAME="kry50024\.inktomisearch\.com" ERROR="java\.io\.IOException: MROutput/MRErrThread failed:java\.lang\.ArrayIndexOutOfBoundsException: -1
> 	at org\.apache\.hadoop\.mapred\.lib\.KeyFieldBasedPartitioner\.hashCode(KeyFieldBasedPartitioner\.java:95)
> 	at org\.apache\.hadoop\.mapred\.lib\.KeyFieldBasedPartitioner\.getPartition(KeyFieldBasedPartitioner\.java:87)
> 	at org\.apache\.hadoop\.mapred\.MapTask$MapOutputBuffer\.collect(MapTask\.java:801)
> 	at org\.apache\.hadoop\.streaming\.PipeMapRed$MROutputThread\.run(PipeMapRed\.java:378)
> 	at org\.apache\.hadoop\.streaming\.PipeMapper\.map(PipeMapper\.java:87)
> 	at org\.apache\.hadoop\.mapred\.MapRunner\.run(MapRunner\.java:50)
> 	at org\.apache\.hadoop\.streaming\.PipeMapRunner\.run(PipeMapRunner\.java:36)
> 	at org\.apache\.hadoop\.mapred\.MapTask\.runOldMapper(MapTask\.java:356)
> 	at org\.apache\.hadoop\.mapred\.MapTask\.run(MapTask\.java:305)
> 	at org\.apache\.hadoop\.mapred\.Child\.main(Child\.java:156)
> " .
> MapAttempt TASK_TYPE="CLEANUP" TASKID="task_200904060626_2141_m_000197" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000197_0" START_TIME="1239190961843" TRACKER_NAME="tracker_kry3083\.inktomisearch\.com:localhost/127\.0\.0\.1:60970" HTTP_PORT="50060" .
> MapAttempt TASK_TYPE="CLEANUP" TASKID="task_200904060626_2141_m_000197" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000197_0" TASK_STATUS="SUCCESS" FINISH_TIME="1239190963602" HOSTNAME="/74\.6\.135\.128/kry3083\.inktomisearch\.com" STATE_STRING="cleanup" COUNTERS="{(org\.apache\.hadoop\.mapred\.Task$Counter)(Map-Reduce Framework)[(SPILLED_RECORDS)(Spilled Records)(0)]}" .
> Task TASKID="task_200904060626_2141_m_000197" TASK_TYPE="CLEANUP" TASK_STATUS="SUCCESS" FINISH_TIME="1239190963509" COUNTERS="{(org\.apache\.hadoop\.mapred\.Task$Counter)(Map-Reduce Framework)[(SPILLED_RECORDS)(Spilled Records)(0)]}" .
> Job JOBID="job_200904060626_2141" FINISH_TIME="1239190963510" JOB_STATUS="FAILED" FINISHED_MAPS="0" FINISHED_REDUCES="0" .
> [cchunkException] :java.lang.StringIndexOutOfBoundsException: String index out of range: -1
> 	at java.lang.String.substring(String.java:1938)
> 	at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.JobLog$JobLogLine.<init>(JobLog.java:114)
> 	at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.JobLog.parse(JobLog.java:39)
> 	at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.AbstractProcessor.process(AbstractProcessor.java:90)
> 	at org.apache.hadoop.chukwa.extraction.demux.Demux$MapClass.map(Demux.java:94)
> 	at org.apache.hadoop.chukwa.extraction.demux.Demux$MapClass.map(Demux.java:60)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2210)
> [csource] :host.example.com
> [ctags] :cluster="demo"



[jira] Commented: (CHUKWA-132) Handle multiline output in Job History file

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CHUKWA-132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698973#action_12698973 ] 

Ari Rabkin commented on CHUKWA-132:
-----------------------------------

It should be feasible to do this in a custom adaptor class; we already have a small coterie of subclasses of FileTailingAdaptor, precisely to allow different policies about where to break chunks.
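
As a rough illustration of a "break chunks only at record boundaries" policy, here is a minimal Java sketch. It is not the FileTailingAdaptor API: the class RecordBoundary and the method completeLength are hypothetical names made up for this example. It also assumes, based only on the sample above, that every Job History record ends with a quote, a space, a dot and a newline, and that this byte sequence never occurs inside an error message.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Chukwa: decides how many bytes of a
// tailed buffer can be shipped without cutting a record in half.
final class RecordBoundary {
    // Assumed record terminator, taken from the sample log: '" .' + newline.
    private static final byte[] TERMINATOR =
            "\" .\n".getBytes(StandardCharsets.US_ASCII);

    // Returns the number of leading bytes of buf[0..len) that form whole
    // records, or 0 if the buffer holds no complete record yet.
    static int completeLength(byte[] buf, int len) {
        // Scan backwards so the last terminator in the buffer wins.
        for (int end = len; end >= TERMINATOR.length; end--) {
            if (endsWithTerminator(buf, end)) {
                return end;
            }
        }
        return 0;
    }

    // True if the bytes just before position 'end' equal TERMINATOR.
    private static boolean endsWithTerminator(byte[] buf, int end) {
        int start = end - TERMINATOR.length;
        for (int i = 0; i < TERMINATOR.length; i++) {
            if (buf[start + i] != TERMINATOR[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] data = "A X=\"1\" .\nB ERROR=\"boom\n\tat Foo"
                .getBytes(StandardCharsets.US_ASCII);
        int n = completeLength(data, data.length);
        System.out.println("bytes safe to emit as a chunk: " + n);
    }
}
```

With a policy like this the bytes after the returned offset would stay in the file until the next read, so the demux parser would never see half of a record.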




[jira] Assigned: (CHUKWA-132) Handle multiline output in Job History file

Posted by "Eric Yang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CHUKWA-132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Yang reassigned CHUKWA-132:
--------------------------------

    Assignee: Cheng

> Handle multiline output in Job History file
> -------------------------------------------
>
>                 Key: CHUKWA-132
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-132
>             Project: Hadoop Chukwa
>          Issue Type: Bug
>          Components: Data Processors
>         Environment: Redhat EL 5.1, Java 6
>            Reporter: Eric Yang
>            Assignee: Cheng
>            Priority: Critical



[jira] Updated: (CHUKWA-132) Handle multiline output in Job History file

Posted by "Cheng (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CHUKWA-132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheng updated CHUKWA-132:
-------------------------

    Status: Patch Available  (was: Open)

A job log record can be split across multiple lines. The new code monitors input lines: if the input recordEntry doesn't end with '"' or '" .', it saves the log and waits for the next one; otherwise it processes the full log.
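
The rule described above could look roughly like the following Java sketch. This is not the code in chukwa-132.patch; the class JobHistoryRecordAssembler and its methods are hypothetical names used only to illustrate the "save until the closing line arrives" idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; this is NOT the code in chukwa-132.patch.
final class JobHistoryRecordAssembler {
    // Physical lines of the record currently being assembled.
    private final List<String> pending = new ArrayList<String>();

    // Rule taken from the comment above: a line that ends with '"' or
    // '" .' closes the record.
    static boolean closesRecord(String line) {
        return line.endsWith("\"") || line.endsWith("\" .");
    }

    // Feeds one physical line. Returns the whole record once its closing
    // line arrives, or null while more lines are still expected.
    String offer(String line) {
        pending.add(line);
        if (!closesRecord(line)) {
            return null;
        }
        String record = String.join("\n", pending);
        pending.clear();
        return record;
    }

    // Convenience wrapper: turns physical lines into logical records.
    static List<String> assemble(List<String> lines) {
        JobHistoryRecordAssembler assembler = new JobHistoryRecordAssembler();
        List<String> records = new ArrayList<String>();
        for (String line : lines) {
            String record = assembler.offer(line);
            if (record != null) {
                records.add(record);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> records = assemble(Arrays.asList(
                "Task TASKID=\"t1\" ERROR=\"java.io.IOException: boom",
                "\tat Foo.bar(Foo.java:1)",
                "\" .",
                "Job JOBID=\"j1\" JOB_STATUS=\"FAILED\" ."));
        System.out.println(records.size() + " logical records");
    }
}
```

Two limits of this sketch are worth keeping in mind: the pending buffer is unbounded, and a continuation line that happens to end with a quote would close the record early.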





[jira] Commented: (CHUKWA-132) Handle multiline output in Job History file

Posted by "Jerome Boulon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CHUKWA-132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698939#action_12698939 ] 

Jerome Boulon commented on CHUKWA-132:
--------------------------------------

Neither saving the log nor processing the full log is a valid option (logs can be more than 100MB).
Hadoop contains a parser for JobHistory; what does it do? Similar code could be applied on the Chukwa side, at the adaptor level.





[jira] Updated: (CHUKWA-132) Handle multiline output in Job History file

Posted by "Eric Yang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CHUKWA-132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Yang updated CHUKWA-132:
-----------------------------

    Description: 
When the Job History file contains multi-line output, the parser fails with an exception like this:

MapAttempt TASK_TYPE="MAP" TASKID="task_200904060626_2141_m_000108" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000108_1" START_TIME="1239190934835" TRACKER_NAME="tracker_kry50024\.inktomisearch\.com:localhost/127\.0\.0\.1:39507" HTTP_PORT="50060" .
MapAttempt TASK_TYPE="MAP" TASKID="task_200904060626_2141_m_000108" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000108_1" TASK_STATUS="FAILED" FINISH_TIME="1239190949062" HOSTNAME="kry50024\.inktomisearch\.com" ERROR="java\.io\.IOException: MROutput/MRErrThread failed:java\.lang\.ArrayIndexOutOfBoundsException: -1
	at org\.apache\.hadoop\.mapred\.lib\.KeyFieldBasedPartitioner\.hashCode(KeyFieldBasedPartitioner\.java:95)
	at org\.apache\.hadoop\.mapred\.lib\.KeyFieldBasedPartitioner\.getPartition(KeyFieldBasedPartitioner\.java:87)
	at org\.apache\.hadoop\.mapred\.MapTask$MapOutputBuffer\.collect(MapTask\.java:801)
	at org\.apache\.hadoop\.streaming\.PipeMapRed$MROutputThread\.run(PipeMapRed\.java:378)

	at org\.apache\.hadoop\.streaming\.PipeMapper\.map(PipeMapper\.java:87)
	at org\.apache\.hadoop\.mapred\.MapRunner\.run(MapRunner\.java:50)
	at org\.apache\.hadoop\.streaming\.PipeMapRunner\.run(PipeMapRunner\.java:36)
	at org\.apache\.hadoop\.mapred\.MapTask\.runOldMapper(MapTask\.java:356)
	at org\.apache\.hadoop\.mapred\.MapTask\.run(MapTask\.java:305)
	at org\.apache\.hadoop\.mapred\.Child\.main(Child\.java:156)
" .
MapAttempt TASK_TYPE="CLEANUP" TASKID="task_200904060626_2141_m_000197" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000197_0" START_TIME="1239190961843" TRACKER_NAME="tracker_kry3083\.inktomisearch\.com:localhost/127\.0\.0\.1:60970" HTTP_PORT="50060" .
MapAttempt TASK_TYPE="CLEANUP" TASKID="task_200904060626_2141_m_000197" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000197_0" TASK_STATUS="SUCCESS" FINISH_TIME="1239190963602" HOSTNAME="/74\.6\.135\.128/kry3083\.inktomisearch\.com" STATE_STRING="cleanup" COUNTERS="{(org\.apache\.hadoop\.mapred\.Task$Counter)(Map-Reduce Framework)[(SPILLED_RECORDS)(Spilled Records)(0)]}" .
Task TASKID="task_200904060626_2141_m_000197" TASK_TYPE="CLEANUP" TASK_STATUS="SUCCESS" FINISH_TIME="1239190963509" COUNTERS="{(org\.apache\.hadoop\.mapred\.Task$Counter)(Map-Reduce Framework)[(SPILLED_RECORDS)(Spilled Records)(0)]}" .
Job JOBID="job_200904060626_2141" FINISH_TIME="1239190963510" JOB_STATUS="FAILED" FINISHED_MAPS="0" FINISHED_REDUCES="0" .

[cchunkException] :java.lang.StringIndexOutOfBoundsException: String index out of range: -1
	at java.lang.String.substring(String.java:1938)
	at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.JobLog$JobLogLine.<init>(JobLog.java:114)
	at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.JobLog.parse(JobLog.java:39)
	at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.AbstractProcessor.process(AbstractProcessor.java:90)
	at org.apache.hadoop.chukwa.extraction.demux.Demux$MapClass.map(Demux.java:94)
	at org.apache.hadoop.chukwa.extraction.demux.Demux$MapClass.map(Demux.java:60)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2210)

[csource] :host.example.com
[ctags] :cluster="demo"



  was:
When the Job History file contains multi-line output, the parser fails with an exception like this:

MapAttempt TASK_TYPE="MAP" TASKID="task_200904060626_2141_m_000108" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000108_1" START_TIME="1239190934835" TRACKER_NAME="tracker_kry50024\.inktomisearch\.com:localhost/127\.0\.0\.1:39507" HTTP_PORT="50060" .
MapAttempt TASK_TYPE="MAP" TASKID="task_200904060626_2141_m_000108" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000108_1" TASK_STATUS="FAILED" FINISH_TIME="1239190949062" HOSTNAME="kry50024\.inktomisearch\.com" ERROR="java\.io\.IOException: MROutput/MRErrThread failed:java\.lang\.ArrayIndexOutOfBoundsException: -1
	at org\.apache\.hadoop\.mapred\.lib\.KeyFieldBasedPartitioner\.hashCode(KeyFieldBasedPartitioner\.java:95)
	at org\.apache\.hadoop\.mapred\.lib\.KeyFieldBasedPartitioner\.getPartition(KeyFieldBasedPartitioner\.java:87)
	at org\.apache\.hadoop\.mapred\.MapTask$MapOutputBuffer\.collect(MapTask\.java:801)
	at org\.apache\.hadoop\.streaming\.PipeMapRed$MROutputThread\.run(PipeMapRed\.java:378)

	at org\.apache\.hadoop\.streaming\.PipeMapper\.map(PipeMapper\.java:87)
	at org\.apache\.hadoop\.mapred\.MapRunner\.run(MapRunner\.java:50)
	at org\.apache\.hadoop\.streaming\.PipeMapRunner\.run(PipeMapRunner\.java:36)
	at org\.apache\.hadoop\.mapred\.MapTask\.runOldMapper(MapTask\.java:356)
	at org\.apache\.hadoop\.mapred\.MapTask\.run(MapTask\.java:305)
	at org\.apache\.hadoop\.mapred\.Child\.main(Child\.java:156)
" .
MapAttempt TASK_TYPE="CLEANUP" TASKID="task_200904060626_2141_m_000197" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000197_0" START_TIME="1239190961843" TRACKER_NAME="tracker_kry3083\.inktomisearch\.com:localhost/127\.0\.0\.1:60970" HTTP_PORT="50060" .
MapAttempt TASK_TYPE="CLEANUP" TASKID="task_200904060626_2141_m_000197" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000197_0" TASK_STATUS="SUCCESS" FINISH_TIME="1239190963602" HOSTNAME="/74\.6\.135\.128/kry3083\.inktomisearch\.com" STATE_STRING="cleanup" COUNTERS="{(org\.apache\.hadoop\.mapred\.Task$Counter)(Map-Reduce Framework)[(SPILLED_RECORDS)(Spilled Records)(0)]}" .
Task TASKID="task_200904060626_2141_m_000197" TASK_TYPE="CLEANUP" TASK_STATUS="SUCCESS" FINISH_TIME="1239190963509" COUNTERS="{(org\.apache\.hadoop\.mapred\.Task$Counter)(Map-Reduce Framework)[(SPILLED_RECORDS)(Spilled Records)(0)]}" .
Job JOBID="job_200904060626_2141" FINISH_TIME="1239190963510" JOB_STATUS="FAILED" FINISHED_MAPS="0" FINISHED_REDUCES="0" .

[cchunkException] :java.lang.StringIndexOutOfBoundsException: String index out of range: -1
	at java.lang.String.substring(String.java:1938)
	at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.JobLog$JobLogLine.<init>(JobLog.java:114)
	at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.JobLog.parse(JobLog.java:39)
	at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.AbstractProcessor.process(AbstractProcessor.java:90)
	at org.apache.hadoop.chukwa.extraction.demux.Demux$MapClass.map(Demux.java:94)
	at org.apache.hadoop.chukwa.extraction.demux.Demux$MapClass.map(Demux.java:60)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2210)

[csource] :kry-jt1.red.ygrid.yahoo.com
[ctags] :cluster="kryptonitered"




> Handle multiline output in Job History file
> -------------------------------------------
>
>                 Key: CHUKWA-132
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-132
>             Project: Hadoop Chukwa
>          Issue Type: Bug
>          Components: Data Processors
>         Environment: Redhat EL 5.1, Java 6
>            Reporter: Eric Yang
>            Assignee: Cheng
>            Priority: Blocker
>         Attachments: chukwa-132.patch
>
>
> When there is multi-line output in the Job History file, the parser fails with an exception like this:
> MapAttempt TASK_TYPE="MAP" TASKID="task_200904060626_2141_m_000108" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000108_1" START_TIME="1239190934835" TRACKER_NAME="tracker_kry50024\.inktomisearch\.com:localhost/127\.0\.0\.1:39507" HTTP_PORT="50060" .
> MapAttempt TASK_TYPE="MAP" TASKID="task_200904060626_2141_m_000108" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000108_1" TASK_STATUS="FAILED" FINISH_TIME="1239190949062" HOSTNAME="kry50024\.inktomisearch\.com" ERROR="java\.io\.IOException: MROutput/MRErrThread failed:java\.lang\.ArrayIndexOutOfBoundsException: -1
> 	at org\.apache\.hadoop\.mapred\.lib\.KeyFieldBasedPartitioner\.hashCode(KeyFieldBasedPartitioner\.java:95)
> 	at org\.apache\.hadoop\.mapred\.lib\.KeyFieldBasedPartitioner\.getPartition(KeyFieldBasedPartitioner\.java:87)
> 	at org\.apache\.hadoop\.mapred\.MapTask$MapOutputBuffer\.collect(MapTask\.java:801)
> 	at org\.apache\.hadoop\.streaming\.PipeMapRed$MROutputThread\.run(PipeMapRed\.java:378)
> 	at org\.apache\.hadoop\.streaming\.PipeMapper\.map(PipeMapper\.java:87)
> 	at org\.apache\.hadoop\.mapred\.MapRunner\.run(MapRunner\.java:50)
> 	at org\.apache\.hadoop\.streaming\.PipeMapRunner\.run(PipeMapRunner\.java:36)
> 	at org\.apache\.hadoop\.mapred\.MapTask\.runOldMapper(MapTask\.java:356)
> 	at org\.apache\.hadoop\.mapred\.MapTask\.run(MapTask\.java:305)
> 	at org\.apache\.hadoop\.mapred\.Child\.main(Child\.java:156)
> " .
> MapAttempt TASK_TYPE="CLEANUP" TASKID="task_200904060626_2141_m_000197" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000197_0" START_TIME="1239190961843" TRACKER_NAME="tracker_kry3083\.inktomisearch\.com:localhost/127\.0\.0\.1:60970" HTTP_PORT="50060" .
> MapAttempt TASK_TYPE="CLEANUP" TASKID="task_200904060626_2141_m_000197" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000197_0" TASK_STATUS="SUCCESS" FINISH_TIME="1239190963602" HOSTNAME="/74\.6\.135\.128/kry3083\.inktomisearch\.com" STATE_STRING="cleanup" COUNTERS="{(org\.apache\.hadoop\.mapred\.Task$Counter)(Map-Reduce Framework)[(SPILLED_RECORDS)(Spilled Records)(0)]}" .
> Task TASKID="task_200904060626_2141_m_000197" TASK_TYPE="CLEANUP" TASK_STATUS="SUCCESS" FINISH_TIME="1239190963509" COUNTERS="{(org\.apache\.hadoop\.mapred\.Task$Counter)(Map-Reduce Framework)[(SPILLED_RECORDS)(Spilled Records)(0)]}" .
> Job JOBID="job_200904060626_2141" FINISH_TIME="1239190963510" JOB_STATUS="FAILED" FINISHED_MAPS="0" FINISHED_REDUCES="0" .
> [cchunkException] :java.lang.StringIndexOutOfBoundsException: String index out of range: -1
> 	at java.lang.String.substring(String.java:1938)
> 	at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.JobLog$JobLogLine.<init>(JobLog.java:114)
> 	at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.JobLog.parse(JobLog.java:39)
> 	at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.AbstractProcessor.process(AbstractProcessor.java:90)
> 	at org.apache.hadoop.chukwa.extraction.demux.Demux$MapClass.map(Demux.java:94)
> 	at org.apache.hadoop.chukwa.extraction.demux.Demux$MapClass.map(Demux.java:60)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2210)
> [csource] :host.example.com
> [ctags] :cluster="demo"
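
The failure above comes from parsing each physical line as a complete record, although the ERROR value spans several lines. Below is a minimal sketch of one way to join such lines into logical records before parsing. It is not the content of chukwa-132.patch. The class and method names (JobHistoryRecordJoiner, joinRecords, endsInsideQuotes) are hypothetical, and the sketch assumes that a record is complete once every double quote has been closed and that a backslash escapes the character after it, as the escaped dots in the sample suggest.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Chukwa: joins physical lines of a
// job history file into logical records.
final class JobHistoryRecordJoiner {

    // Scans one line and returns whether a quoted value is still open at
    // its end. Assumption: a backslash escapes the following character.
    static boolean endsInsideQuotes(String line, boolean open) {
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character
            } else if (c == '"') {
                open = !open;
            }
        }
        return open;
    }

    // Groups lines so that each returned string holds one whole record,
    // including any line breaks inside a quoted value.
    static List<String> joinRecords(List<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        boolean open = false;
        for (String line : lines) {
            // Blank lines between records carry no data; blank lines
            // inside an open quoted value are kept.
            if (pending.length() == 0 && line.trim().isEmpty()) {
                continue;
            }
            if (pending.length() > 0) {
                pending.append('\n');
            }
            pending.append(line);
            open = endsInsideQuotes(line, open);
            if (!open) {
                records.add(pending.toString());
                pending.setLength(0);
            }
        }
        if (pending.length() > 0) {
            records.add(pending.toString()); // unterminated tail, kept as-is
        }
        return records;
    }
}
```

With input like the sample above, the failed MapAttempt entry and its stack trace come back as a single record ending in `" .`, so a parser that expects one record per string sees the whole entry at once.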

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (CHUKWA-132) Handle multiline output in Job History file

Posted by "Eric Yang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CHUKWA-132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Yang updated CHUKWA-132:
-----------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Cheng.




[jira] Updated: (CHUKWA-132) Handle multiline output in Job History file

Posted by "Mac Yang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CHUKWA-132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mac Yang updated CHUKWA-132:
----------------------------

    Priority: Blocker  (was: Critical)

This is a blocker for 0.1.2.

> Handle multiline output in Job History file
> -------------------------------------------
>
>                 Key: CHUKWA-132
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-132
>             Project: Hadoop Chukwa
>          Issue Type: Bug
>          Components: Data Processors
>         Environment: Redhat EL 5.1, Java 6
>            Reporter: Eric Yang
>            Assignee: Cheng
>            Priority: Blocker
>
> When there is multi-line output in the Job History file, the parser fails with an exception like this:
> MapAttempt TASK_TYPE="MAP" TASKID="task_200904060626_2141_m_000108" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000108_1" START_TIME="1239190934835" TRACKER_NAME="tracker_kry50024\.inktomisearch\.com:localhost/127\.0\.0\.1:39507" HTTP_PORT="50060" .
> MapAttempt TASK_TYPE="MAP" TASKID="task_200904060626_2141_m_000108" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000108_1" TASK_STATUS="FAILED" FINISH_TIME="1239190949062" HOSTNAME="kry50024\.inktomisearch\.com" ERROR="java\.io\.IOException: MROutput/MRErrThread failed:java\.lang\.ArrayIndexOutOfBoundsException: -1
> 	at org\.apache\.hadoop\.mapred\.lib\.KeyFieldBasedPartitioner\.hashCode(KeyFieldBasedPartitioner\.java:95)
> 	at org\.apache\.hadoop\.mapred\.lib\.KeyFieldBasedPartitioner\.getPartition(KeyFieldBasedPartitioner\.java:87)
> 	at org\.apache\.hadoop\.mapred\.MapTask$MapOutputBuffer\.collect(MapTask\.java:801)
> 	at org\.apache\.hadoop\.streaming\.PipeMapRed$MROutputThread\.run(PipeMapRed\.java:378)
> 	at org\.apache\.hadoop\.streaming\.PipeMapper\.map(PipeMapper\.java:87)
> 	at org\.apache\.hadoop\.mapred\.MapRunner\.run(MapRunner\.java:50)
> 	at org\.apache\.hadoop\.streaming\.PipeMapRunner\.run(PipeMapRunner\.java:36)
> 	at org\.apache\.hadoop\.mapred\.MapTask\.runOldMapper(MapTask\.java:356)
> 	at org\.apache\.hadoop\.mapred\.MapTask\.run(MapTask\.java:305)
> 	at org\.apache\.hadoop\.mapred\.Child\.main(Child\.java:156)
> " .
> MapAttempt TASK_TYPE="CLEANUP" TASKID="task_200904060626_2141_m_000197" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000197_0" START_TIME="1239190961843" TRACKER_NAME="tracker_kry3083\.inktomisearch\.com:localhost/127\.0\.0\.1:60970" HTTP_PORT="50060" .
> MapAttempt TASK_TYPE="CLEANUP" TASKID="task_200904060626_2141_m_000197" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000197_0" TASK_STATUS="SUCCESS" FINISH_TIME="1239190963602" HOSTNAME="/74\.6\.135\.128/kry3083\.inktomisearch\.com" STATE_STRING="cleanup" COUNTERS="{(org\.apache\.hadoop\.mapred\.Task$Counter)(Map-Reduce Framework)[(SPILLED_RECORDS)(Spilled Records)(0)]}" .
> Task TASKID="task_200904060626_2141_m_000197" TASK_TYPE="CLEANUP" TASK_STATUS="SUCCESS" FINISH_TIME="1239190963509" COUNTERS="{(org\.apache\.hadoop\.mapred\.Task$Counter)(Map-Reduce Framework)[(SPILLED_RECORDS)(Spilled Records)(0)]}" .
> Job JOBID="job_200904060626_2141" FINISH_TIME="1239190963510" JOB_STATUS="FAILED" FINISHED_MAPS="0" FINISHED_REDUCES="0" .
> [cchunkException] :java.lang.StringIndexOutOfBoundsException: String index out of range: -1
> 	at java.lang.String.substring(String.java:1938)
> 	at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.JobLog$JobLogLine.<init>(JobLog.java:114)
> 	at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.JobLog.parse(JobLog.java:39)
> 	at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.AbstractProcessor.process(AbstractProcessor.java:90)
> 	at org.apache.hadoop.chukwa.extraction.demux.Demux$MapClass.map(Demux.java:94)
> 	at org.apache.hadoop.chukwa.extraction.demux.Demux$MapClass.map(Demux.java:60)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2210)
> [csource] :kry-jt1.red.ygrid.yahoo.com
> [ctags] :cluster="kryptonitered"



[jira] Updated: (CHUKWA-132) Handle multiline output in Job History file

Posted by "Cheng (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CHUKWA-132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheng updated CHUKWA-132:
-------------------------

    Attachment: chukwa-132.patch

> Handle multiline output in Job History file
> -------------------------------------------
>
>                 Key: CHUKWA-132
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-132
>             Project: Hadoop Chukwa
>          Issue Type: Bug
>          Components: Data Processors
>         Environment: Redhat EL 5.1, Java 6
>            Reporter: Eric Yang
>            Assignee: Cheng
>            Priority: Blocker
>         Attachments: chukwa-132.patch
>
>
> When there is multi-line output in the Job History file, the parser fails with an exception like this:
> MapAttempt TASK_TYPE="MAP" TASKID="task_200904060626_2141_m_000108" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000108_1" START_TIME="1239190934835" TRACKER_NAME="tracker_kry50024\.inktomisearch\.com:localhost/127\.0\.0\.1:39507" HTTP_PORT="50060" .
> MapAttempt TASK_TYPE="MAP" TASKID="task_200904060626_2141_m_000108" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000108_1" TASK_STATUS="FAILED" FINISH_TIME="1239190949062" HOSTNAME="kry50024\.inktomisearch\.com" ERROR="java\.io\.IOException: MROutput/MRErrThread failed:java\.lang\.ArrayIndexOutOfBoundsException: -1
> 	at org\.apache\.hadoop\.mapred\.lib\.KeyFieldBasedPartitioner\.hashCode(KeyFieldBasedPartitioner\.java:95)
> 	at org\.apache\.hadoop\.mapred\.lib\.KeyFieldBasedPartitioner\.getPartition(KeyFieldBasedPartitioner\.java:87)
> 	at org\.apache\.hadoop\.mapred\.MapTask$MapOutputBuffer\.collect(MapTask\.java:801)
> 	at org\.apache\.hadoop\.streaming\.PipeMapRed$MROutputThread\.run(PipeMapRed\.java:378)
> 	at org\.apache\.hadoop\.streaming\.PipeMapper\.map(PipeMapper\.java:87)
> 	at org\.apache\.hadoop\.mapred\.MapRunner\.run(MapRunner\.java:50)
> 	at org\.apache\.hadoop\.streaming\.PipeMapRunner\.run(PipeMapRunner\.java:36)
> 	at org\.apache\.hadoop\.mapred\.MapTask\.runOldMapper(MapTask\.java:356)
> 	at org\.apache\.hadoop\.mapred\.MapTask\.run(MapTask\.java:305)
> 	at org\.apache\.hadoop\.mapred\.Child\.main(Child\.java:156)
> " .
> MapAttempt TASK_TYPE="CLEANUP" TASKID="task_200904060626_2141_m_000197" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000197_0" START_TIME="1239190961843" TRACKER_NAME="tracker_kry3083\.inktomisearch\.com:localhost/127\.0\.0\.1:60970" HTTP_PORT="50060" .
> MapAttempt TASK_TYPE="CLEANUP" TASKID="task_200904060626_2141_m_000197" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000197_0" TASK_STATUS="SUCCESS" FINISH_TIME="1239190963602" HOSTNAME="/74\.6\.135\.128/kry3083\.inktomisearch\.com" STATE_STRING="cleanup" COUNTERS="{(org\.apache\.hadoop\.mapred\.Task$Counter)(Map-Reduce Framework)[(SPILLED_RECORDS)(Spilled Records)(0)]}" .
> Task TASKID="task_200904060626_2141_m_000197" TASK_TYPE="CLEANUP" TASK_STATUS="SUCCESS" FINISH_TIME="1239190963509" COUNTERS="{(org\.apache\.hadoop\.mapred\.Task$Counter)(Map-Reduce Framework)[(SPILLED_RECORDS)(Spilled Records)(0)]}" .
> Job JOBID="job_200904060626_2141" FINISH_TIME="1239190963510" JOB_STATUS="FAILED" FINISHED_MAPS="0" FINISHED_REDUCES="0" .
> [cchunkException] :java.lang.StringIndexOutOfBoundsException: String index out of range: -1
> 	at java.lang.String.substring(String.java:1938)
> 	at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.JobLog$JobLogLine.<init>(JobLog.java:114)
> 	at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.JobLog.parse(JobLog.java:39)
> 	at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.AbstractProcessor.process(AbstractProcessor.java:90)
> 	at org.apache.hadoop.chukwa.extraction.demux.Demux$MapClass.map(Demux.java:94)
> 	at org.apache.hadoop.chukwa.extraction.demux.Demux$MapClass.map(Demux.java:60)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2210)
> [csource] :kry-jt1.red.ygrid.yahoo.com
> [ctags] :cluster="kryptonitered"
