You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Maximilian Michels (JIRA)" <ji...@apache.org> on 2018/11/01 13:14:00 UTC

[jira] [Reopened] (BEAM-5026) Portable flink wordcount fails sometimes due to non-existent source path in FileBasedSink._check_state_for_finalize_write

     [ https://issues.apache.org/jira/browse/BEAM-5026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maximilian Michels reopened BEAM-5026:
--------------------------------------
      Assignee: Ahmet Altay  (was: Ankur Goenka)

I see this consistently failing with the latest master, but only in batch mode. Reopening.

> Portable flink wordcount fails sometimes due to non-existent source path in FileBasedSink._check_state_for_finalize_write
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-5026
>                 URL: https://issues.apache.org/jira/browse/BEAM-5026
>             Project: Beam
>          Issue Type: Bug
>          Components: examples-python, runner-flink, sdk-py-core
>    Affects Versions: 2.6.0
>            Reporter: Ryan Williams
>            Assignee: Ahmet Altay
>            Priority: Minor
>             Fix For: 2.7.0
>
>
> Running portable flink wordcount locally:
> In one terminal:
> {code:java}
> ./gradlew :beam-runners-flink_2.11-job-server:runShadow{code}
> In another:
> {code:java}
> python -m apache_beam.examples.wordcount --harness_docker_image <image> --input /etc/profile --output /tmp/py-wordcount-direct --experiments=beam_fn_api --runner=PortableRunner --job_endpoint=localhost:8099 --sdk_location=container{code}
> Typically, the first time I run this for a given job-server instance, I see a failure like this ([full output|https://gist.github.com/ryan-williams/a96bf259898b6260cd4f00b8a232057c#file-gistfile1-txt-L3460]):
> {code:java}
> File "apache_beam/runners/common.py", line 661, in apache_beam.runners.common._OutputProcessor.process_outputs
> def process_outputs(self, windowed_input_element, results):
> File "apache_beam/runners/common.py", line 676, in apache_beam.runners.common._OutputProcessor.process_outputs
> for result in results:
> File "/usr/local/lib/python2.7/site-packages/apache_beam/io/iobase.py", line 1074, in <genexpr>
> return (window.TimestampedValue(v, window.MAX_TIMESTAMP) for v in outputs)
> File "/usr/local/lib/python2.7/site-packages/apache_beam/io/filebasedsink.py", line 271, in finalize_write
> self._check_state_for_finalize_write(writer_results, num_shards))
> File "/usr/local/lib/python2.7/site-packages/apache_beam/io/filebasedsink.py", line 249, in _check_state_for_finalize_write
> src, dst))
> BeamIOError: src and dst files do not exist. src: /tmp/beam-temp-py-wordcount-direct-6a0d8862908c11e88de8025000000001/5cfa9f22-9246-41fb-adef-ca04d5a5fe50.py-wordcount-direct, dst: /tmp/py-wordcount-direct-00000-of-00001 with exceptions None [while running 'write/Write/WriteImpl/FinalizeWrite'] with exceptions None
> {code}
> This is after a fix to [a slightly earlier failure in {{FileBasedSink}} documented on BEAM-4742|https://issues.apache.org/jira/browse/BEAM-4742?focusedCommentId=16545622&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16545622] which I've been working on in [#5903|https://github.com/apache/beam/pull/5903].
> It typically occurs only on the first run of wordcount against a given job-server instance.
> I'm curious whether others see this, whether it's some race condition in the FileBasedSink, LocalFileSystem, my macbook's disk, or somewhere else, or whether some temporary directory is getting created on the first run (for each job-server) that explains why subsequent wordcount runs succeed, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)