Posted to common-dev@hadoop.apache.org by "Rajiv Chittajallu (JIRA)" <ji...@apache.org> on 2009/03/04 21:29:56 UTC
[jira] Created: (HADOOP-5400) JT restart recovery: Exclude jobs which failed during SUBMIT_JOB (due to acl)
JT restart recovery: Exclude jobs which failed during SUBMIT_JOB (due to acl)
-------------------------------------------------------------------------------
Key: HADOOP-5400
URL: https://issues.apache.org/jira/browse/HADOOP-5400
Project: Hadoop Core
Issue Type: Bug
Components: mapred
Environment: Hadoop 0.20 + 0.20.0 + HADOOP-5225 + HADOOP-5332
Reporter: Rajiv Chittajallu
Priority: Blocker
mapred.jobtracker.restart.recover is set to true in mapred-site.xml.
This is a job that failed during job submission because of an invalid ACL:
2009-03-04 18:31:25,970 INFO org.apache.hadoop.ipc.Server: IPC Server handler 14 on 50300, call submitJob(job_200903041223_0259) from 192.168.10.1:41306: error: org.apache.hadoop.security.AccessControlException: User rajive cannot perform operation SUBMIT_JOB on queue default
When the JobTracker was restarted some time later, the failed job was picked up for recovery/restart:
2009-03-04 19:13:30,544 INFO org.apache.hadoop.mapred.JobTracker: Found an incomplete job directory job_200903041852_0040. Deleting it!!
2009-03-04 19:13:30,613 INFO org.apache.hadoop.mapred.FairScheduler: Successfully configured FairScheduler
2009-03-04 19:13:30,614 INFO org.apache.hadoop.mapred.JobTracker: Trying to recover job job_200903041223_0259
2009-03-04 18:53:17,147 INFO org.apache.hadoop.mapred.JobTracker: JobTracker failed to recover job job_200903041223_0259. Ignoring it.
java.io.FileNotFoundException: File file:/grid/0/hadoop/var/log/history/axonitegold-jt1.gold.ygrid.yahoo.com_1236192735577_job_200903041223_0259_rajive_word+count does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:360)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:336)
    at org.apache.hadoop.mapred.JobHistory.parseHistoryFromFS(JobHistory.java:245)
    at org.apache.hadoop.mapred.JobTracker$RecoveryManager.recover(JobTracker.java:1144)
    at org.apache.hadoop.mapred.JobTracker.offerService(JobTracker.java:1603)
    at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:3326)
2009-03-04 18:53:17,147 INFO org.apache.hadoop.mapred.JobTracker: Restart count for job job_200903041223_0259 is 0
2009-03-04 18:53:18,626 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_200903041223_0259 = 4664646202464
2009-03-04 18:53:18,626 INFO org.apache.hadoop.mapred.JobInProgress: Split info for job:job_200903041223_0259 with 34640 splits:
Jobs that failed during job submission shouldn't be considered for recovery.
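
The requested behavior could be sketched roughly as follows. This is an illustration only, not the JobTracker's actual RecoveryManager code: the class RecoveryFilterSketch, the method jobsToRecover, and the idea of a pre-collected set of rejected job IDs are all hypothetical, and the job ID ending in 0258 is made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these names exist in Hadoop.
class RecoveryFilterSketch {

    /**
     * Returns the candidate job IDs that should still be recovered,
     * preserving their order and dropping every ID that was rejected
     * at submission time (for example by a queue ACL check).
     */
    static List<String> jobsToRecover(List<String> candidates,
                                      Set<String> rejectedAtSubmit) {
        List<String> result = new ArrayList<>();
        for (String jobId : candidates) {
            if (!rejectedAtSubmit.contains(jobId)) {
                result.add(jobId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // job_200903041223_0259 is the job from this report;
        // job_200903041223_0258 is an invented neighbor that was accepted.
        List<String> candidates = Arrays.asList(
                "job_200903041223_0258", "job_200903041223_0259");
        Set<String> rejected = new HashSet<>(
                Arrays.asList("job_200903041223_0259"));

        // Prints [job_200903041223_0258]
        System.out.println(jobsToRecover(candidates, rejected));
    }
}
```

The point of the sketch is only that the check happens before any recovery work starts, so a job rejected at SUBMIT_JOB never reaches history parsing or split computation.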
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-5400) JT restart recovery: Exclude jobs which failed during SUBMIT_JOB (due to acl)
Posted by "Rajiv Chittajallu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rajiv Chittajallu updated HADOOP-5400:
--------------------------------------
Description:
mapred.jobtracker.restart.recover is set to true in mapred-site.xml.
This is a job that failed during job submission because of an invalid ACL:
2009-03-04 18:31:25,970 INFO org.apache.hadoop.ipc.Server: IPC Server handler 14 on 50300, call submitJob(job_200903041223_0259) from 192.168.10.1:41306: error: org.apache.hadoop.security.AccessControlException: User rajive cannot perform operation SUBMIT_JOB on queue default
When the JobTracker was restarted some time later, the failed job was picked up for recovery/restart:
2009-03-04 19:13:30,544 INFO org.apache.hadoop.mapred.JobTracker: Found an incomplete job directory job_200903041852_0040. Deleting it!!
2009-03-04 19:13:30,613 INFO org.apache.hadoop.mapred.FairScheduler: Successfully configured FairScheduler
2009-03-04 19:13:30,614 INFO org.apache.hadoop.mapred.JobTracker: Trying to recover job job_200903041223_0259
2009-03-04 18:53:17,147 INFO org.apache.hadoop.mapred.JobTracker: JobTracker failed to recover job job_200903041223_0259. Ignoring it.
java.io.FileNotFoundException: File file:/var/log/hadoop//history/jobtracker1.foo.com_1236192735577_job_200903041223_0259_rajive_word+count does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:360)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:336)
    at org.apache.hadoop.mapred.JobHistory.parseHistoryFromFS(JobHistory.java:245)
    at org.apache.hadoop.mapred.JobTracker$RecoveryManager.recover(JobTracker.java:1144)
    at org.apache.hadoop.mapred.JobTracker.offerService(JobTracker.java:1603)
    at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:3326)
2009-03-04 18:53:17,147 INFO org.apache.hadoop.mapred.JobTracker: Restart count for job job_200903041223_0259 is 0
2009-03-04 18:53:18,626 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_200903041223_0259 = 4664646202464
2009-03-04 18:53:18,626 INFO org.apache.hadoop.mapred.JobInProgress: Split info for job:job_200903041223_0259 with 34640 splits:
Jobs that failed during job submission shouldn't be considered for recovery.
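
For anyone checking whether their own cluster is affected, a small log scan along these lines may help find the job IDs that were rejected at submit time. It is a sketch under assumptions: the class and method names are hypothetical, and the regular expression is derived only from the single IPC Server log line quoted above, so other Hadoop versions may format the message differently.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Hadoop.
class SubmitRejectionScanSketch {

    // Matches "call submitJob(<job id>)" followed later on the same line
    // by "AccessControlException", and captures the job ID.
    private static final Pattern REJECTED_SUBMIT = Pattern.compile(
            "call submitJob\\((job_\\d+_\\d+)\\).*AccessControlException");

    /** Collects, in first-seen order, the IDs of jobs whose submitJob call was rejected. */
    static Set<String> rejectedJobIds(Iterable<String> logLines) {
        Set<String> ids = new LinkedHashSet<>();
        for (String line : logLines) {
            Matcher m = REJECTED_SUBMIT.matcher(line);
            if (m.find()) {
                ids.add(m.group(1));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Both lines are copied from the log excerpts in this report.
        Iterable<String> lines = Arrays.asList(
                "2009-03-04 18:31:25,970 INFO org.apache.hadoop.ipc.Server: IPC Server handler 14 on 50300, "
                        + "call submitJob(job_200903041223_0259) from 192.168.10.1:41306: error: "
                        + "org.apache.hadoop.security.AccessControlException: User rajive cannot perform "
                        + "operation SUBMIT_JOB on queue default",
                "2009-03-04 19:13:30,614 INFO org.apache.hadoop.mapred.JobTracker: "
                        + "Trying to recover job job_200903041223_0259");

        // Prints [job_200903041223_0259]
        System.out.println(rejectedJobIds(lines));
    }
}
```

Any ID reported by the scan that later shows up in a "Trying to recover job" line is an instance of this issue.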
[jira] Resolved: (HADOOP-5400) JT restart recovery: Exclude jobs which failed during SUBMIT_JOB (due to acl)
Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hemanth Yamijala resolved HADOOP-5400.
--------------------------------------
Resolution: Duplicate
This is a duplicate of HADOOP-5327.