Posted to common-dev@hadoop.apache.org by "Peeyush Bishnoi (JIRA)" <ji...@apache.org> on 2008/10/07 11:21:44 UTC

[jira] Created: (HADOOP-4362) Hadoop Streaming failed with large number of input files

Hadoop Streaming failed with large number of input files
--------------------------------------------------------

                 Key: HADOOP-4362
                 URL: https://issues.apache.org/jira/browse/HADOOP-4362
             Project: Hadoop Core
          Issue Type: Bug
          Components: contrib/streaming
    Affects Versions: 0.18.1
            Reporter: Peeyush Bishnoi
            Priority: Critical
             Fix For: 0.18.2


A simple job fails with "java.lang.ArrayIndexOutOfBoundsException" when the mapper is /bin/cat and the number of input files is large.

$  hadoop jar $HADOOP_HOME/hadoop-streaming.jar -input in_data -output op_data -mapper /bin/cat -reducer NONE
additionalConfSpec_:null
null=@@@userJobConfProps_.get(stream.shipped.hadoopstreaming
packageJobJar: [/tmp/hadoop-unjar49637/] []
/tmp/streamjob49638.jar tmpDir=/tmp
08/10/07 07:03:09 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/07 07:03:11 INFO mapred.FileInputFormat: Total input paths to process : 16365
08/10/07 07:03:12 INFO mapred.FileInputFormat: Total input paths to process : 16365
08/10/07 07:03:15 ERROR streaming.StreamJob: Error Launching job : java.io.IOException:
java.lang.ArrayIndexOutOfBoundsException

Streaming Job Failed!


But when the number of input files is small, the job does not fail.

$ hadoop  jar $HADOOP_HOME/hadoop-streaming.jar -input inp_data1 -output op_data1 -mapper /bin/cat -reducer NONE
additionalConfSpec_:null
null=@@@userJobConfProps_.get(stream.shipped.hadoopstreaming
packageJobJar: [/tmp/hadoop-unjar3725/] []
/tmp/streamjob3726.jar tmpDir=/tmp
08/10/07 07:06:37 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/07 07:06:39 INFO mapred.FileInputFormat: Total input paths to process : 16
08/10/07 07:06:39 INFO mapred.FileInputFormat: Total input paths to process : 16
08/10/07 07:06:42 INFO streaming.StreamJob: getLocalDirs(): [/var/mapred/local]
08/10/07 07:06:42 INFO streaming.StreamJob: Running job: job_200810070645_0006
08/10/07 07:06:42 INFO streaming.StreamJob: To kill this job, run:
08/10/07 07:06:42 INFO streaming.StreamJob: hadoop job -Dmapred.job.tracker=login1:51981 -kill job_200810070645_0006
08/10/07 07:06:42 INFO streaming.StreamJob: Tracking URL: http://login1:52941/jobdetails.jsp?jobid=job_200810070645_0006
08/10/07 07:06:43 INFO streaming.StreamJob:  map 0%  reduce 0%
08/10/07 07:06:46 INFO streaming.StreamJob:  map 44%  reduce 0%
08/10/07 07:06:47 INFO streaming.StreamJob:  map 75%  reduce 0%
08/10/07 07:06:48 INFO streaming.StreamJob:  map 88%  reduce 0%
08/10/07 07:06:49 INFO streaming.StreamJob:  map 100%  reduce 100%
08/10/07 07:06:49 INFO streaming.StreamJob: Job complete: job_200810070645_0006
08/10/07 07:06:49 INFO streaming.StreamJob: Output: op_data2
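A common mitigation for jobs with tens of thousands of tiny inputs is to concatenate them into fewer, larger files before submission, so FileInputFormat has far fewer paths to process. The sketch below is a hypothetical local illustration only; the directory names and group size are invented, not part of this report:

```python
# Hypothetical workaround sketch, not from this report: concatenate many
# small input files into fewer, larger ones before uploading to HDFS,
# so the job submits far fewer input paths.
import pathlib

def merge_small_files(src_dir, dst_dir, group_size=1000):
    """Concatenate files from src_dir into dst_dir/part-N, group_size files per part."""
    dst = pathlib.Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    parts = 0
    out = None
    for i, f in enumerate(sorted(pathlib.Path(src_dir).iterdir())):
        if i % group_size == 0:          # start a new part file every group_size inputs
            if out is not None:
                out.close()
            parts += 1
            out = (dst / f"part-{parts}").open("wb")
        out.write(f.read_bytes())        # append this small file's bytes
    if out is not None:
        out.close()
    return parts
```

At the default group size of 1000, the 16,365 inputs above would collapse into 17 part files.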




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4362) Hadoop Streaming failed with large number of input files

Posted by "Peeyush Bishnoi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peeyush Bishnoi updated HADOOP-4362:
------------------------------------

    Description: (same as the report above, with the final log line corrected to read "Output: op_data1")

  was: (the original text, whose final log line read "Output: op_data2")




[jira] Resolved: (HADOOP-4362) Hadoop Streaming failed with large number of input files

Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler resolved HADOOP-4362.
-------------------------------------

    Resolution: Duplicate

There seems to be general agreement that this issue is the same underlying DFS problem (likely HADOOP-4351), hence the Duplicate resolution.




[jira] Updated: (HADOOP-4362) Hadoop Streaming failed with large number of input files

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-4362:
--------------------------------

    Assignee: Robert Chansler  (was: Devaraj Das)

Rob, could you please take care of this issue? Thanks!




[jira] Updated: (HADOOP-4362) Hadoop Streaming failed with large number of input files

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-4362:
--------------------------------

         Priority: Blocker  (was: Critical)
    Fix Version/s: 0.19.0
         Assignee: Devaraj Das

Marking it a blocker until we get to the root cause.




[jira] Commented: (HADOOP-4362) Hadoop Streaming failed with large number of input files

Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638008#action_12638008 ] 

Robert Chansler commented on HADOOP-4362:
-----------------------------------------

Yes! The good news this morning is that Hairong now has a good understanding
of the issue.









[jira] Updated: (HADOOP-4362) Hadoop Streaming failed with large number of input files

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-4362:
--------------------------------

    Component/s: dfs  (was: contrib/streaming)

Peeyush got this stack trace offline when he tried to run wordcount. It looks like a DFS issue, possibly HADOOP-4351. If the DFS folks agree, please mark this bug as a duplicate.

hadoop jar hadoop-examples.jar wordcount -m 8 -r 2 /foo/bar/part-* /user/foo/testop1
08/10/07 16:27:14 INFO mapred.FileInputFormat: Total input paths to process : 16349
08/10/07 16:27:15 INFO mapred.FileInputFormat: Total input paths to process : 16349
org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException
        at org.apache.hadoop.ipc.Client.call(Client.java:715)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at org.apache.hadoop.dfs.$Proxy0.getBlockLocations(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at org.apache.hadoop.dfs.$Proxy0.getBlockLocations(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient.callGetBlockLocations(DFSClient.java:297)
        at org.apache.hadoop.dfs.DFSClient.getBlockLocations(DFSClient.java:318)
        at org.apache.hadoop.dfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:137)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:241)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026)
        at org.apache.hadoop.examples.WordCount.run(WordCount.java:149)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.examples.WordCount.main(WordCount.java:155)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:53)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
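The RetryInvocationHandler frames in the trace show the DFS client's retry proxy wrapping each RPC: errors it classifies as retriable get retried, while anything else, here the server-side ArrayIndexOutOfBoundsException, propagates at once and aborts split computation. A minimal Python analogy of that behavior, hypothetical and not Hadoop code:

```python
# Hypothetical analogy, not Hadoop code: a retry wrapper that retries only
# errors it classifies as retriable; any other error propagates immediately,
# the way the ArrayIndexOutOfBoundsException above aborted job submission.
import functools

def with_retries(fn, attempts=3, retriable=(ConnectionError,)):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        for attempt in range(attempts):
            try:
                return fn(*args, **kwargs)
            except retriable:
                if attempt == attempts - 1:
                    raise  # retriable, but retries exhausted
    return wrapper

calls = {"count": 0}

def get_block_locations(path):
    # Stand-in for the failing namenode RPC: always raises a
    # non-retriable, IndexError-style fault.
    calls["count"] += 1
    raise IndexError(f"bad block index while locating {path}")

guarded = with_retries(get_block_locations)
```

The point of the sketch: a single non-retriable fault from one metadata lookup is enough to fail the whole submission, which matches how one bad getBlockLocations call kills the job before any task runs.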

