You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by David Olsen <da...@gmail.com> on 2016/05/25 15:58:58 UTC

InProcessPipelineRunner seems to hang with log printed 'Finalizing write operation'

I try word count with InProcessPipelineRunner and it basically works. But I
am not sure if I use the correct way to stop the pipeline running. My code
is at http://paste.debian.net/702925

The execution prints the following messages and then seems to hang forever
with thread not being terminated.

2016-05-25 22:37:12 INFO  Write:183 - Opening writer for write operation
org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@1b1a53a
2016-05-25 22:37:12 INFO  Write:183 - Opening writer for write operation
org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@40b01b
2016-05-25 22:37:16 INFO  Write:230 - Finalizing write operation
org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@40b01b
2016-05-25 22:37:16 INFO  Write:230 - Finalizing write operation
org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@1e9d9fe
2016-05-25 22:37:17 INFO  Write:230 - Finalizing write operation
org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@1b1a53a
2016-05-25 22:37:17 INFO  Write:230 - Finalizing write operation
org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@ff69ad

Following code snippet is used to terminate the execution, but I don't know
whether it's correct usage or not.

final InProcessPipelineResult result = (InProcessPipelineResult) p.run();
result.awaitCompletion();

Also calling 'p.run();' only without obtaining result then
awaitCompletion() has the same issue.

beam version is git commit 78c8c528e23dba4f41e6ffc98b4c0d78bcea5c08
scala 2.11.x
sbt 0.13.x

I appreciate any suggestions. Thanks.

Re: InProcessPipelineRunner seems to hang with log printed 'Finalizing write operation'

Posted by Thomas Groh <tg...@google.com>.
I've run the example code (using two Shakespeare plays as input) and
everything worked as expected (the pipeline produced wordcounts and then
shut down). Can you attach a debugger to
ExecutorServiceParallelExecutor:431 and see where the runner appears to
hang?

The log statement printed at that line is followed by a call to
ExecutorService#shutdown(), which should cause the worker threads to be
terminated (relatively quickly, as no additional work will be accepted
after that call, and all previous work has completed), which should allow
the JVM to shut down once it's run through #awaitCompletion() (this is
automatically invoked based on the default isBlockOnRun value); an explicit
call to `awaitCompletion()` should return immediately if this is the case.
There is admittedly a bit of additional defensive work that could be added
to ExecutorServiceParallelExecutor, which I've published in github PR 399

The fact that you're getting output files (and the write operations are
finalized) suggests that everything is being at least invoked as expected.
What are you using as input files?

https://github.com/apache/incubator-beam/pull/399

On Thu, May 26, 2016 at 8:33 AM, David Olsen <da...@gmail.com>
wrote:

> Yes, the behavior is the same even without result.awaitCompletion().
>
> After turning debug on, looks like it's cleaning the temp files
> (result-0000?-of-00004-*) produced. And within the output folder only 4
> files are left.
>
> result-00000-of-00004  result-00002-of-00004
> result-00001-of-00004  result-00003-of-00004
>
> But only 4 lines are written to these files (each file only contains 1
> line)
>
> $cat result-0000* | wc -l
> 4
>
> Below is the end of messages after the debug level is turned on:
>
> /tmp/out/result-00001-of-00004-temp-e5c20f19-f6dc-49f7-afe5-dfd168dd6353
> 2016-05-26 23:20:28 DEBUG FileBasedSink$LocalFileOperations:654 - Removing
> file
> /tmp/out/result-00001-of-00004-temp-d2b6bd5f-ca22-402c-8fbb-9445059cb626
> 2016-05-26 23:20:28 DEBUG FileBasedSink$LocalFileOperations:654 - Removing
> file
> /tmp/out/result-00001-of-00004-temp-1b256ecc-e0f9-43fa-ad37-c0920db1aaf7
> 2016-05-26 23:20:28 DEBUG FileBasedSink$LocalFileOperations:654 - Removing
> file
> /tmp/out/result-00001-of-00004-temp-27a03df8-50c8-44e2-ac7b-c4c9b424ddb1
> 2016-05-26 23:20:28 DEBUG FileBasedSink$LocalFileOperations:654 - Removing
> file
> /tmp/out/result-00001-of-00004-temp-70857308-2faf-4e38-98d1-4dc0c0b14b2a
> 2016-05-26 23:20:28 DEBUG FileBasedSink$LocalFileOperations:654 - Removing
> file
> /tmp/out/result-00001-of-00004-temp-ae226ef3-a7b3-4aa6-9caf-dff60a2e3f1e
> 2016-05-26 23:20:28 DEBUG FileBasedSink$LocalFileOperations:654 - Removing
> file
> /tmp/out/result-00001-of-00004-temp-4b19e860-2505-4aa9-b779-a2ae12b5c781
> 2016-05-26 23:20:28 DEBUG FileBasedSink$LocalFileOperations:654 - Removing
> file
> /tmp/out/result-00001-of-00004-temp-c9ac3c39-2812-4fc7-92f6-804717ab9dbd
> 2016-05-26 23:20:28 DEBUG FileBasedSink$LocalFileOperations:654 - Removing
> file
> /tmp/out/result-00001-of-00004-temp-68041765-c800-4116-82eb-2f3874b6d0c7
> 2016-05-26 23:20:28 DEBUG Write:234 - Done finalizing write operation
> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@ac61f9
> 2016-05-26 23:20:28 DEBUG ExecutorServiceParallelExecutor:431 - Pipeline
> is finished. Shutting down. {}
>
> Although the console prints 'Shutting down', it still hangs indefinitely
> without back to the command prompt.
>
> Thanks for help.
>
>
> On 26 May 2016 at 04:53, Thomas Groh <tg...@google.com> wrote:
>
>> Additionally, if you set the logging to Debug (with the same beam
>> version) there should be two additional log lines printed for each Write
>> operation, which would help significantly.
>>
>> On Wed, May 25, 2016 at 10:52 AM, Thomas Groh <tg...@google.com> wrote:
>>
>>> If you inspect the output result (at /tmp/out/result-...-of-00004),
>>> once the pipeline appears to be hung, has the pipeline produced output? The
>>> logs that claim that the write has been finalized suggest that the pipeline
>>> is complete, and the failure is occurring between the underlying executor
>>> and the InProcessPipelineResult, rather than the Pipeline execution. My
>>> initial expectation is that this may be due to an issue between the
>>> InProcessPipelineResult and the underlying executor rather than the
>>> executor proper.
>>>
>>> When the call to `result.awaitCompletion()` is removed, you see the same
>>> behavior, correct?
>>>
>>> On Wed, May 25, 2016 at 8:58 AM, David Olsen <da...@gmail.com>
>>> wrote:
>>>
>>>> I try word count with InProcessPipelineRunner and it basically works.
>>>> But I am not sure if I use the correct way to stop the pipeline running. My
>>>> code is at http://paste.debian.net/702925
>>>>
>>>> The execution prints the following messages and then seems to hang
>>>> forever with thread not being terminated.
>>>>
>>>> 2016-05-25 22:37:12 INFO  Write:183 - Opening writer for write
>>>> operation org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@1b1a53a
>>>> 2016-05-25 22:37:12 INFO  Write:183 - Opening writer for write
>>>> operation org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@40b01b
>>>> 2016-05-25 22:37:16 INFO  Write:230 - Finalizing write operation
>>>> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@40b01b
>>>> 2016-05-25 22:37:16 INFO  Write:230 - Finalizing write operation
>>>> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@1e9d9fe
>>>> 2016-05-25 22:37:17 INFO  Write:230 - Finalizing write operation
>>>> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@1b1a53a
>>>> 2016-05-25 22:37:17 INFO  Write:230 - Finalizing write operation
>>>> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@ff69ad
>>>>
>>>> Following code snippet is used to terminate the execution, but I don't
>>>> know whether it's correct usage or not.
>>>>
>>>> final InProcessPipelineResult result = (InProcessPipelineResult)
>>>> p.run();
>>>> result.awaitCompletion();
>>>>
>>>> Also calling 'p.run();' only without obtaining result then
>>>> awaitCompletion() has the same issue.
>>>>
>>>> beam version is git commit 78c8c528e23dba4f41e6ffc98b4c0d78bcea5c08
>>>> scala 2.11.x
>>>> sbt 0.13.x
>>>>
>>>> I appreciate any suggestions. Thanks.
>>>>
>>>
>>>
>>
>

Re: InProcessPipelineRunner seems to hang with log printed 'Finalizing write operation'

Posted by David Olsen <da...@gmail.com>.
Yes, the behavior is the same even without result.awaitCompletion().

After turning debug on, looks like it's cleaning the temp files
(result-0000?-of-00004-*) produced. And within the output folder only 4
files are left.

result-00000-of-00004  result-00002-of-00004
result-00001-of-00004  result-00003-of-00004

But only 4 lines are written to these files (each file only contains 1 line)

$cat result-0000* | wc -l
4

Below is the end of messages after the debug level is turned on:

/tmp/out/result-00001-of-00004-temp-e5c20f19-f6dc-49f7-afe5-dfd168dd6353
2016-05-26 23:20:28 DEBUG FileBasedSink$LocalFileOperations:654 - Removing
file
/tmp/out/result-00001-of-00004-temp-d2b6bd5f-ca22-402c-8fbb-9445059cb626
2016-05-26 23:20:28 DEBUG FileBasedSink$LocalFileOperations:654 - Removing
file
/tmp/out/result-00001-of-00004-temp-1b256ecc-e0f9-43fa-ad37-c0920db1aaf7
2016-05-26 23:20:28 DEBUG FileBasedSink$LocalFileOperations:654 - Removing
file
/tmp/out/result-00001-of-00004-temp-27a03df8-50c8-44e2-ac7b-c4c9b424ddb1
2016-05-26 23:20:28 DEBUG FileBasedSink$LocalFileOperations:654 - Removing
file
/tmp/out/result-00001-of-00004-temp-70857308-2faf-4e38-98d1-4dc0c0b14b2a
2016-05-26 23:20:28 DEBUG FileBasedSink$LocalFileOperations:654 - Removing
file
/tmp/out/result-00001-of-00004-temp-ae226ef3-a7b3-4aa6-9caf-dff60a2e3f1e
2016-05-26 23:20:28 DEBUG FileBasedSink$LocalFileOperations:654 - Removing
file
/tmp/out/result-00001-of-00004-temp-4b19e860-2505-4aa9-b779-a2ae12b5c781
2016-05-26 23:20:28 DEBUG FileBasedSink$LocalFileOperations:654 - Removing
file
/tmp/out/result-00001-of-00004-temp-c9ac3c39-2812-4fc7-92f6-804717ab9dbd
2016-05-26 23:20:28 DEBUG FileBasedSink$LocalFileOperations:654 - Removing
file
/tmp/out/result-00001-of-00004-temp-68041765-c800-4116-82eb-2f3874b6d0c7
2016-05-26 23:20:28 DEBUG Write:234 - Done finalizing write operation
org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@ac61f9
2016-05-26 23:20:28 DEBUG ExecutorServiceParallelExecutor:431 - Pipeline is
finished. Shutting down. {}

Although the console prints 'Shutting down', it still hangs indefinitely
without back to the command prompt.

Thanks for help.


On 26 May 2016 at 04:53, Thomas Groh <tg...@google.com> wrote:

> Additionally, if you set the logging to Debug (with the same beam version)
> there should be two additional log lines printed for each Write operation,
> which would help significantly.
>
> On Wed, May 25, 2016 at 10:52 AM, Thomas Groh <tg...@google.com> wrote:
>
>> If you inspect the output result (at /tmp/out/result-...-of-00004), once
>> the pipeline appears to be hung, has the pipeline produced output? The logs
>> that claim that the write has been finalized suggest that the pipeline is
>> complete, and the failure is occurring between the underlying executor and
>> the InProcessPipelineResult, rather than the Pipeline execution. My initial
>> expectation is that this may be due to an issue between the
>> InProcessPipelineResult and the underlying executor rather than the
>> executor proper.
>>
>> When the call to `result.awaitCompletion()` is removed, you see the same
>> behavior, correct?
>>
>> On Wed, May 25, 2016 at 8:58 AM, David Olsen <da...@gmail.com>
>> wrote:
>>
>>> I try word count with InProcessPipelineRunner and it basically works.
>>> But I am not sure if I use the correct way to stop the pipeline running. My
>>> code is at http://paste.debian.net/702925
>>>
>>> The execution prints the following messages and then seems to hang
>>> forever with thread not being terminated.
>>>
>>> 2016-05-25 22:37:12 INFO  Write:183 - Opening writer for write operation
>>> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@1b1a53a
>>> 2016-05-25 22:37:12 INFO  Write:183 - Opening writer for write operation
>>> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@40b01b
>>> 2016-05-25 22:37:16 INFO  Write:230 - Finalizing write operation
>>> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@40b01b
>>> 2016-05-25 22:37:16 INFO  Write:230 - Finalizing write operation
>>> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@1e9d9fe
>>> 2016-05-25 22:37:17 INFO  Write:230 - Finalizing write operation
>>> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@1b1a53a
>>> 2016-05-25 22:37:17 INFO  Write:230 - Finalizing write operation
>>> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@ff69ad
>>>
>>> Following code snippet is used to terminate the execution, but I don't
>>> know whether it's correct usage or not.
>>>
>>> final InProcessPipelineResult result = (InProcessPipelineResult)
>>> p.run();
>>> result.awaitCompletion();
>>>
>>> Also calling 'p.run();' only without obtaining result then
>>> awaitCompletion() has the same issue.
>>>
>>> beam version is git commit 78c8c528e23dba4f41e6ffc98b4c0d78bcea5c08
>>> scala 2.11.x
>>> sbt 0.13.x
>>>
>>> I appreciate any suggestions. Thanks.
>>>
>>
>>
>

Re: InProcessPipelineRunner seems to hang with log printed 'Finalizing write operation'

Posted by Thomas Groh <tg...@google.com>.
Additionally, if you set the logging to Debug (with the same beam version)
there should be two additional log lines printed for each Write operation,
which would help significantly.

On Wed, May 25, 2016 at 10:52 AM, Thomas Groh <tg...@google.com> wrote:

> If you inspect the output result (at /tmp/out/result-...-of-00004), once
> the pipeline appears to be hung, has the pipeline produced output? The logs
> that claim that the write has been finalized suggest that the pipeline is
> complete, and the failure is occurring between the underlying executor and
> the InProcessPipelineResult, rather than the Pipeline execution. My initial
> expectation is that this may be due to an issue between the
> InProcessPipelineResult and the underlying executor rather than the
> executor proper.
>
> When the call to `result.awaitCompletion()` is removed, you see the same
> behavior, correct?
>
> On Wed, May 25, 2016 at 8:58 AM, David Olsen <da...@gmail.com>
> wrote:
>
>> I try word count with InProcessPipelineRunner and it basically works. But
>> I am not sure if I use the correct way to stop the pipeline running. My
>> code is at http://paste.debian.net/702925
>>
>> The execution prints the following messages and then seems to hang
>> forever with thread not being terminated.
>>
>> 2016-05-25 22:37:12 INFO  Write:183 - Opening writer for write operation
>> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@1b1a53a
>> 2016-05-25 22:37:12 INFO  Write:183 - Opening writer for write operation
>> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@40b01b
>> 2016-05-25 22:37:16 INFO  Write:230 - Finalizing write operation
>> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@40b01b
>> 2016-05-25 22:37:16 INFO  Write:230 - Finalizing write operation
>> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@1e9d9fe
>> 2016-05-25 22:37:17 INFO  Write:230 - Finalizing write operation
>> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@1b1a53a
>> 2016-05-25 22:37:17 INFO  Write:230 - Finalizing write operation
>> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@ff69ad
>>
>> Following code snippet is used to terminate the execution, but I don't
>> know whether it's correct usage or not.
>>
>> final InProcessPipelineResult result = (InProcessPipelineResult) p.run();
>> result.awaitCompletion();
>>
>> Also calling 'p.run();' only without obtaining result then
>> awaitCompletion() has the same issue.
>>
>> beam version is git commit 78c8c528e23dba4f41e6ffc98b4c0d78bcea5c08
>> scala 2.11.x
>> sbt 0.13.x
>>
>> I appreciate any suggestions. Thanks.
>>
>
>

Re: InProcessPipelineRunner seems to hang with log printed 'Finalizing write operation'

Posted by Thomas Groh <tg...@google.com>.
If you inspect the output result (at /tmp/out/result-...-of-00004), once
the pipeline appears to be hung, has the pipeline produced output? The logs
that claim that the write has been finalized suggest that the pipeline is
complete, and the failure is occurring between the underlying executor and
the InProcessPipelineResult, rather than the Pipeline execution. My initial
expectation is that this may be due to an issue between the
InProcessPipelineResult and the underlying executor rather than the
executor proper.

When the call to `result.awaitCompletion()` is removed, you see the same
behavior, correct?

On Wed, May 25, 2016 at 8:58 AM, David Olsen <da...@gmail.com>
wrote:

> I try word count with InProcessPipelineRunner and it basically works. But
> I am not sure if I use the correct way to stop the pipeline running. My
> code is at http://paste.debian.net/702925
>
> The execution prints the following messages and then seems to hang forever
> with thread not being terminated.
>
> 2016-05-25 22:37:12 INFO  Write:183 - Opening writer for write operation
> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@1b1a53a
> 2016-05-25 22:37:12 INFO  Write:183 - Opening writer for write operation
> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@40b01b
> 2016-05-25 22:37:16 INFO  Write:230 - Finalizing write operation
> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@40b01b
> 2016-05-25 22:37:16 INFO  Write:230 - Finalizing write operation
> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@1e9d9fe
> 2016-05-25 22:37:17 INFO  Write:230 - Finalizing write operation
> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@1b1a53a
> 2016-05-25 22:37:17 INFO  Write:230 - Finalizing write operation
> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@ff69ad
>
> Following code snippet is used to terminate the execution, but I don't
> know whether it's correct usage or not.
>
> final InProcessPipelineResult result = (InProcessPipelineResult) p.run();
> result.awaitCompletion();
>
> Also calling 'p.run();' only without obtaining result then
> awaitCompletion() has the same issue.
>
> beam version is git commit 78c8c528e23dba4f41e6ffc98b4c0d78bcea5c08
> scala 2.11.x
> sbt 0.13.x
>
> I appreciate any suggestions. Thanks.
>