Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2015/03/03 00:47:05 UTC

[jira] [Commented] (PIG-4412) Race condition in writing multiple outputs from STREAM op

    [ https://issues.apache.org/jira/browse/PIG-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344031#comment-14344031 ] 

Rohini Palaniswamy commented on PIG-4412:
-----------------------------------------

TestStreamingLocal.testSimpleMapSideStreaming failed with the error below in one run and passed on the next. I believe this patch introduced the failure.

{code}
WARN  org.apache.hadoop.mapred.LocalJobRunner  - job_local1751597657_0011
java.lang.Exception: org.apache.pig.backend.executionengine.ExecException: ERROR 2083: Error while trying to get next result in POStream.
	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2083: Error while trying to get next result in POStream.
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream.getNextTuple(POStream.java:251)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:279)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.cleanup(PigGenericMapBase.java:123)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:148)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.NullPointerException
	at org.apache.pig.impl.streaming.ExecutableManager.close(ExecutableManager.java:132)
	at org.apache.pig.backend.hadoop.streaming.HadoopExecutableManager.close(HadoopExecutableManager.java:131)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream.finish(POStream.java:363)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream.getNextTuple(POStream.java:172)
{code}
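The innermost frames show the failing path. POStream.getNextTuple calls POStream.finish, which calls HadoopExecutableManager.close, which calls ExecutableManager.close, and that last call throws a NullPointerException. The trace does not say which field was null. One common defensive pattern is to make close() idempotent and null-safe, so that reaching it twice, or on partially torn-down state, does not crash. That is an assumption here, not something taken from the patch. The sketch below uses a hypothetical SketchExecutableManager, not Pig's class. It only stops a repeated or premature close() from throwing. It does not by itself fix the ordering race the issue describes.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class IdempotentCloseSketch {
    // Hypothetical stand-in for a streaming executable manager; NOT Pig's ExecutableManager.
    static class SketchExecutableManager {
        private final AtomicBoolean closed = new AtomicBoolean(false);
        private Process process; // may be null, e.g. if the child process never started
        int teardownCount = 0;   // how many times the real teardown actually ran

        SketchExecutableManager(Process process) {
            this.process = process;
        }

        void close() {
            // compareAndSet lets exactly one caller run the teardown, even if
            // close() is reached from two code paths (or two threads).
            if (!closed.compareAndSet(false, true)) {
                return;
            }
            teardownCount++;
            if (process != null) { // null check instead of dereferencing blindly
                process.destroy();
                process = null;
            }
        }
    }

    public static void main(String[] args) {
        SketchExecutableManager manager = new SketchExecutableManager(null);
        manager.close();
        manager.close(); // second call is a no-op rather than an NPE
        System.out.println("teardown ran " + manager.teardownCount + " time(s)");
    }
}
```

A guard like this can hide a real ordering bug if it is the only change. It belongs alongside a fix that decides when close() is called, not in place of one.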

> Race condition in writing multiple outputs from STREAM op
> ---------------------------------------------------------
>
>                 Key: PIG-4412
>                 URL: https://issues.apache.org/jira/browse/PIG-4412
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>             Fix For: 0.15.0
>
>         Attachments: PIG-4412.patch
>
>
> Basically copying the issue described here:
> http://stackoverflow.com/questions/28327044/pig-streaming-some-output-files-are-missing
> Roughly, I believe there is a race condition between the code in HadoopExecutableManager that moves a script's multiple output files into HDFS and the MapReduce task, which shuts down after writing the last records of the STREAM task's "main" output. Pig needs to make sure that the ExecutableManager is closed (and therefore that the files have been moved from the local directory to HDFS) before it returns the end-of-stream tuple that signals the stream is finished.
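The ordering the description asks for can be sketched as follows: the operator finishes its side outputs before it returns the end-of-stream result. This is a minimal illustration under stated assumptions, not Pig's implementation. SideOutputManager, StreamOperator, Status.END_OF_STREAM and the in-memory finalStore list, which stands in for HDFS, are all hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class CloseBeforeEosSketch {
    // Hypothetical status codes; not Pig's POStatus.
    enum Status { OK, END_OF_STREAM }

    static final class Result {
        final Status status;
        final String tuple;
        Result(Status status, String tuple) { this.status = status; this.tuple = tuple; }
    }

    // Hypothetical manager: close() "moves" side-output files to the final store.
    static final class SideOutputManager {
        final List<String> localFiles = new ArrayList<>();
        final List<String> finalStore = new ArrayList<>(); // stand-in for HDFS
        boolean closed = false;

        void close() {
            if (closed) return;            // idempotent: safe to call more than once
            finalStore.addAll(localFiles); // stand-in for copying local files to HDFS
            localFiles.clear();
            closed = true;
        }
    }

    static final class StreamOperator {
        private final Deque<String> mainOutput;
        private final SideOutputManager manager;

        StreamOperator(Deque<String> mainOutput, SideOutputManager manager) {
            this.mainOutput = mainOutput;
            this.manager = manager;
        }

        Result getNext() {
            String next = mainOutput.poll();
            if (next != null) {
                return new Result(Status.OK, next);
            }
            // Finish side outputs BEFORE signalling end-of-stream, so the caller
            // cannot start shutting the task down while files are still local.
            manager.close();
            return new Result(Status.END_OF_STREAM, null);
        }
    }

    public static void main(String[] args) {
        SideOutputManager manager = new SideOutputManager();
        manager.localFiles.add("part-side-0");
        StreamOperator op = new StreamOperator(new ArrayDeque<>(Arrays.asList("a", "b")), manager);
        Result r;
        while ((r = op.getNext()).status == Status.OK) {
            System.out.println("tuple " + r.tuple);
        }
        // By the time END_OF_STREAM is observed, the side files are already moved.
        System.out.println("side files moved: " + manager.finalStore);
    }
}
```

The design choice is where the durable work happens. It completes inside getNext(), before end-of-stream is reported, rather than in a later cleanup hook that may race with task shutdown.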



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)