Posted to mapreduce-issues@hadoop.apache.org by "Devaraj K (JIRA)" <ji...@apache.org> on 2011/02/03 16:58:29 UTC

[jira] Created: (MAPREDUCE-2297) All map reduce tasks are failing if we give invalid path jar file for Job

All map reduce tasks are failing if we give invalid path jar file for Job
-------------------------------------------------------------------------

                 Key: MAPREDUCE-2297
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2297
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: tasktracker
    Affects Versions: 0.20.2
            Reporter: Devaraj K
            Priority: Minor


This can be reproduced by specifying an invalid jar file path for the job, or it can be reproduced from Hive.


In hive-default.xml

<property>
<name>hive.aux.jars.path</name>
<value></value>
<description>Provided for adding auxillaryjarsPath</description>
</property>

If we configure an invalid path for the jar file, all map reduce tasks fail, even for jobs that do not depend on this jar file, and the exception below is thrown.
{code:xml} 
hive> select * from a join b on(a.b=b.c);
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
java.io.FileNotFoundException: File does not exist: /user/root/grade.jar
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:495)
    at org.apache.hadoop.filecache.DistributedCache.getTimestamp(DistributedCache.java:509)
    at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:651)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:783)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:752)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:698)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:64)

{code} 

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2297) All map reduce tasks are failing if we give invalid path jar file for Job

Posted by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038433#comment-13038433 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-2297:
----------------------------------------------------

I too contest this. We don't have a concept of optional artifacts today.

Also, I am a little confused about your report. The exception trace is showing that job-submission itself is failing. But you mention 'all tasks are failing'.

It looks like hive.aux.jars.path should really be a job-specific parameter. Otherwise, if a new query (?) needs a new jar, it would mean a change in a system config? That seems weird. Can you shed some more light on how this configuration is intended to be used?

If the suggestion is to just accept the job after removing all invalid entries, but still keep the list of valid jars consistent through the rest of the system, that should be okay, I guess.
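To make the last suggestion concrete, here is a minimal sketch in Python. It is not the attached patch and not Hadoop code: the local filesystem stands in for HDFS, and the function names `split_valid_entries` and `prepare_cache_entries` are hypothetical. The idea is that invalid entries are dropped once, at submission time, and every later stage works from the same filtered list.

```python
import os


def split_valid_entries(jar_paths):
    """Partition the configured jar paths into (existing, missing), preserving order."""
    valid, missing = [], []
    for path in jar_paths:
        # os.path.isfile is a local-filesystem stand-in for an HDFS existence check.
        (valid if os.path.isfile(path) else missing).append(path)
    return valid, missing


def prepare_cache_entries(jar_paths):
    """Drop missing jars once, up front, and return the list all later stages should use."""
    valid, missing = split_valid_entries(jar_paths)
    for path in missing:
        # Report what was skipped so the omission is at least visible to the user.
        print("WARN: skipping missing jar: " + path)
    return valid
```

The filtering has to happen in exactly one place. If the client and the task side each decided on their own which entries to skip, their views of the jar list could diverge.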

[jira] [Updated] (MAPREDUCE-2297) All map reduce tasks are failing if we give invalid path jar file for Job

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated MAPREDUCE-2297:
-----------------------------------

    Status: Open  (was: Patch Available)

Cancelling patch for discussion.

[jira] [Commented] (MAPREDUCE-2297) All map reduce tasks are failing if we give invalid path jar file for Job

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039228#comment-13039228 ] 

Todd Lipcon commented on MAPREDUCE-2297:
----------------------------------------

hive.aux.jars.path can be set per-query with hive using the "set" command from the hive shell. I would not consider it system-level... or, if it's system level, it should be pointing to jars that are guaranteed to exist.

[jira] [Commented] (MAPREDUCE-2297) All map reduce tasks are failing if we give invalid path jar file for Job

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037013#comment-13037013 ] 

Todd Lipcon commented on MAPREDUCE-2297:
----------------------------------------

> Consider the case of Hive, if we configure any invalid path for the property "hive.aux.jars.path" all the jobs will fail which is not using that jar also.

I would consider it a broken config if you configure hive.aux.jars.path to point to a jar which doesn't exist.

With your patch, if you accidentally make a typo in your DistributedCache entries, you'll see NoClassDefFoundError or other much scarier errors later on. I think it's better to fail with the "File does not exist" error during localization.
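The fail-fast behaviour argued for here can be sketched as follows. This is a minimal illustration in Python, not Hadoop's implementation: the local filesystem stands in for HDFS and `check_cache_entries` is a hypothetical name. A mistyped entry is rejected up front with a message naming the offending path, instead of surfacing later as a class-loading error inside a task.

```python
import os


def check_cache_entries(jar_paths):
    """Raise on the first missing jar, before any work is scheduled."""
    for path in jar_paths:
        # os.path.isfile is a local-filesystem stand-in for an HDFS status lookup.
        if not os.path.isfile(path):
            raise FileNotFoundError("File does not exist: " + path)
    # Every entry exists; hand back a copy for the caller to use.
    return list(jar_paths)
```

The trade-off is the one discussed in this thread: a stale entry blocks jobs that never needed the jar, but the error points directly at the bad path.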
