You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@oozie.apache.org by "Clemens Valiente (JIRA)" <ji...@apache.org> on 2018/09/17 08:25:00 UTC

[jira] [Commented] (OOZIE-2536) Hadoop's cleanup of local directory in uber mode causing failures

    [ https://issues.apache.org/jira/browse/OOZIE-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617206#comment-16617206 ] 

Clemens Valiente commented on OOZIE-2536:
-----------------------------------------

[~satishsaley] thanks for this patch, we were running into the same issue and backporting it to our oozie sharelib fixed the issue.

we're running into another issue that looks very similar though:
{code:java}
2018-09-15 00:31:09,416 [AsyncDispatcher event handler] INFO  org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl  - Num completed Tasks: 1
2018-09-15 00:31:09,417 [AsyncDispatcher event handler] INFO  org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl  - job_1535972033593_259806Job Transitioned from RUNNING to COMMITTING
2018-09-15 00:31:09,419 [CommitterEvent Processor #1] INFO  org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler  - Processing the event EventType: JOB_COMMIT
2018-09-15 00:31:09,455 [uber-SubtaskRunner] WARN  org.apache.hadoop.mapred.LocalContainerLauncher  - Unable to delete unexpected local file/dir .action.xml.crc: insufficient permissions?
2018-09-15 00:31:09,455 [uber-SubtaskRunner] WARN  org.apache.hadoop.mapred.LocalContainerLauncher  - Unable to delete unexpected local file/dir .action.xml.crc: insufficient permissions?
2018-09-15 00:31:09,459 [CommitterEvent Processor #1] FATAL org.apache.hadoop.conf.Configuration  - error parsing conf sqoop-site.xml
java.io.FileNotFoundException: /appdata/hdfs/v7/yarn/nm/usercache/SEM/appcache/application_1535972033593_259806/container_e100_1535972033593_259806_01_000001/sqoop-site.xml (No such file or directory)
	at java.io.FileInputStream.open(Native Method)
	at java.io.FileInputStream.<init>(FileInputStream.java:146)
	at java.io.FileInputStream.<init>(FileInputStream.java:101)
	at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
	at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
	at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2483)
	at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2554)
	at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2507)
	at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2413)
	at org.apache.hadoop.conf.Configuration.get(Configuration.java:984)
	at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1034)
	at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1254)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:804)
	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.touchz(CommitterEventHandler.java:268)
	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:282)
	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745){code}
looks like the same race condition can possibly happen with the sqoop-site.xml. I will investigate if the same approach as your patch would also fix this.

> Hadoop's cleanup of local directory in uber mode causing failures
> -----------------------------------------------------------------
>
>                 Key: OOZIE-2536
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2536
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Satish Subhashrao Saley
>            Assignee: Satish Subhashrao Saley
>            Priority: Blocker
>             Fix For: 4.3.0
>
>         Attachments: OOZIE-2536-1.patch
>
>
> In out environment, we faced an issue where uberized Shell action was getting stuck even though the shell action got completed with status 0. Please refer the attached syslog and stdout if launcher job, here I point out partially
> stdout :
> {quote}
> >>> Invoking Shell command line now >>
> Stdoutput myshellType=qmyshellUpdate
> Exit code of the Shell command 0
> <<< Invocation of Shell command completed <<<
> <<< Invocation of Main class completed <<<
> {quote} 
> syslog
> {quote}
> 2016-05-23 11:15:52,587 WARN [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: Unable to delete unexpected local file/dir .action.xml.crc: insufficient permissions?
> 2016-05-23 11:15:52,588 FATAL [AsyncDispatcher event handler] org.apache.hadoop.conf.Configuration: error parsing conf propagation-conf.xml
> java.io.FileNotFoundException: /tmp/yarn-local/usercache/saley/appcache/application_1234_123/container_e01_1234_123_01_000001/propagation-conf.xml (No such file or directory)
>     at java.io.FileInputStream.open0(Native Method)
>     at java.io.FileInputStream.open(FileInputStream.java:195)
>     at java.io.FileInputStream.<init>(FileInputStream.java:138)
>     at java.io.FileInputStream.<init>(FileInputStream.java:93)
>     at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
>     at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
>     at java.net.URL.openStream(URL.java:1038)
>     at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2468)
>     at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2539)
>     at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
>     at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
>     at org.apache.hadoop.conf.Configuration.get(Configuration.java:981)
>     at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1031)
>     at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1251)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.getMemoryRequired(TaskAttemptImpl.java:568)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.updateMillisCounters(TaskAttemptImpl.java:1295)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.createJobCounterUpdateEventTASucceeded(TaskAttemptImpl.java:1323)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.access$3500(TaskAttemptImpl.java:147)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$SucceededTransition.transition(TaskAttemptImpl.java:1710)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$SucceededTransition.transition(TaskAttemptImpl.java:1701)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1085)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1394)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1386)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
>     at java.lang.Thread.run(Thread.java:745)
> 2016-05-23 11:15:52,590 FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
> java.lang.RuntimeException: java.io.FileNotFoundException: /grid/5/tmp/yarn-local/usercache/saley/appcache/application_1234_123/container_e01_1234_123_01_000001/propagation-conf.xml (No such file or directory)
>     at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2639)
>     at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
>     at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
>     at org.apache.hadoop.conf.Configuration.get(Configuration.java:981)
>     at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1031)
>     at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1251)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.getMemoryRequired(TaskAttemptImpl.java:568)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.updateMillisCounters(TaskAttemptImpl.java:1295)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.createJobCounterUpdateEventTASucceeded(TaskAttemptImpl.java:1323)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.access$3500(TaskAttemptImpl.java:147)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$SucceededTransition.transition(TaskAttemptImpl.java:1710)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$SucceededTransition.transition(TaskAttemptImpl.java:1701)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1085)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1394)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1386)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.FileNotFoundException: /tmp/yarn-local/usercache/saley/appcache/application_1234_123/container_e01_1234_123_01_000001/propagation-conf.xml (No such file or directory)
>     at java.io.FileInputStream.open0(Native Method)
>     at java.io.FileInputStream.open(FileInputStream.java:195)
>     at java.io.FileInputStream.<init>(FileInputStream.java:138)
>     at java.io.FileInputStream.<init>(FileInputStream.java:93)
>     at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
>     at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
>     at java.net.URL.openStream(URL.java:1038)
>     at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2468)
>     at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2539)
>     ... 22 more
> 2016-05-23 11:15:52,591 INFO [AsyncDispatcher ShutDown handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
> 2016-05-23 11:15:52,591 ERROR [AsyncDispatcher ShutDown handler] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[AsyncDispatcher ShutDown handler,5,main] threw an Exception.
> java.lang.SecurityException: Intercepted System.exit(-1)
>     at org.apache.oozie.action.hadoop.LauncherSecurityManager.checkExit(LauncherMapper.java:637)
>     at java.lang.Runtime.exit(Runtime.java:107)
>     at java.lang.System.exit(System.java:971)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$2.run(AsyncDispatcher.java:294)
>     at java.lang.Thread.run(Thread.java:745)
> 2016-05-23 11:16:44,589 WARN [LeaseRenewer:saley@namenode.com:8020] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.tmp.dir;  Ignoring.
> 2016-05-23 11:20:53,677 INFO [Socket Reader #2 for port 50500] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for saley (auth:SIMPLE)
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)