You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Valentyn Tymofieiev (Jira)" <ji...@apache.org> on 2019/09/11 22:52:00 UTC

[jira] [Commented] (BEAM-8216) GCS IO fails with uninformative 'Broken pipe' errors while attempting to write to a GCS bucket without proper permissions.

    [ https://issues.apache.org/jira/browse/BEAM-8216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928073#comment-16928073 ] 

Valentyn Tymofieiev commented on BEAM-8216:
-------------------------------------------

Over to [~chamikara] to triage.

> GCS IO fails with uninformative 'Broken pipe' errors while attempting to write to a GCS bucket without proper permissions.
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-8216
>                 URL: https://issues.apache.org/jira/browse/BEAM-8216
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-gcp
>            Reporter: Valentyn Tymofieiev
>            Assignee: Chamikara Jayalath
>            Priority: Major
>
> Obvserved while executing a wordcount IT pipeline:
> {noformat}
>  ./gradlew :sdks:python:test-suites:dataflow:py36:integrationTest \
> -Dtests=apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it \
> -Dattr=IT -DpipelineOptions="--project=some_project_different_from_apache_beam_testing \
> --staging_location=gs://some_bucket/ \
> --temp_location=gs://some_bucket/ \
> --input=gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.0000* \
> --output=gs://temp-storage-for-end-to-end-tests/py-it-cloud/output  \
> --expect_checksum=ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710 \
> --num_workers=10 \
> --autoscaling_algorithm=NONE \
> --runner=TestDataflowRunner \
> --sdk_location=/full/path/to/beam/sdks/python/dist/apache-beam-2.16.0.dev0.tar.gz" \
> --info  
> {noformat}
> gs://temp-storage-for-end-to-end-tests/py-it-cloud/output lives in a different project than was running the pipeline.
> This caused a bunch of Broken pipe errors. Console logs:
> {noformat}
> root: INFO: 2019-09-11T19:06:23.055Z: JOB_MESSAGE_BASIC: Finished operation read/Read+split+pair_with_one+group/Reify+group/Write
> root: INFO: 2019-09-11T19:06:23.157Z: JOB_MESSAGE_BASIC: Executing operation group/Close
> root: INFO: 2019-09-11T19:06:23.208Z: JOB_MESSAGE_BASIC: Finished operation group/Close
> root: INFO: 2019-09-11T19:06:23.263Z: JOB_MESSAGE_BASIC: Executing operation group/Read+group/GroupByWindow+count+format+write/Write/WriteImpl/WriteBundles/WriteBundles+write/Write/WriteImpl/Pair+write/Write/WriteImpl/WindowInto(WindowIntoFn)+write/Write/WriteImpl/GroupByKey/Reify+write/Write/WriteImpl/GroupByKey/Write
> root: INFO: 2019-09-11T19:06:25.571Z: JOB_MESSAGE_ERROR: Traceback (most recent call last):
>   File "apache_beam/runners/common.py", line 782, in apache_beam.runners.common.DoFnRunner.process
>   File "apache_beam/runners/common.py", line 594, in apache_beam.runners.common.PerWindowInvoker.invoke_process
>   File "apache_beam/runners/common.py", line 666, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
>   File "/usr/local/lib/python3.6/site-packages/apache_beam/io/iobase.py", line 1042, in process
>     self.writer.write(element)
>   File "/usr/local/lib/python3.6/site-packages/apache_beam/io/filebasedsink.py", line 393, in write
>     self.sink.write_record(self.temp_handle, value)
>   File "/usr/local/lib/python3.6/site-packages/apache_beam/io/filebasedsink.py", line 137, in write_record
>     self.write_encoded_record(file_handle, self.coder.encode(value))
>   File "/usr/local/lib/python3.6/site-packages/apache_beam/io/textio.py", line 407, in write_encoded_record
>     file_handle.write(encoded_value)
>   File "/usr/local/lib/python3.6/site-packages/apache_beam/io/filesystemio.py", line 202, in write
>     self._uploader.put(b)
>   File "/usr/local/lib/python3.6/site-packages/apache_beam/io/gcp/gcsio.py", line 594, in put
>     self._conn.send_bytes(data.tobytes())
>   File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
>     self._send_bytes(m[offset:offset + size])
>   File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 397, in _send_bytes
>     self._send(header)
>   File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 368, in _send
>     n = write(self._handle, buf)
> BrokenPipeError: [Errno 32] Broken pipe
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
> ...
> root: INFO: 2019-09-11T19:06:33.027Z: JOB_MESSAGE_DEBUG: Executing failure step failure25
> root: INFO: 2019-09-11T19:06:33.066Z: JOB_MESSAGE_ERROR: Workflow failed. Causes: S08:group/Read+group/GroupByWindow+count+format+write/Write/WriteImpl/WriteBundles/WriteBundles+write/Write/WriteImpl/Pair+write/Write/WriteImpl/WindowInto(WindowIntoFn)+write/Write/WriteImpl/GroupByKey/Reify+write/Write/WriteImpl/GroupByKey/Write failed., The job failed because a work item has failed 4 times. Look in previous log entries for the cause of each one of the 4 failures. For more information, see https://cloud.google.com/dataflow/docs/guides/common-errors. The work item was attempted on these workers:
>   beamapp-valentyn-09111855-09111155-pj3z-harness-5g6h
>       Root cause: Work item failed.,
>   beamapp-valentyn-09111855-09111155-pj3z-harness-6ccc
>       Root cause: Work item failed.,
>   beamapp-valentyn-09111855-09111155-pj3z-harness-45pp
>       Root cause: Work item failed.,
>   beamapp-valentyn-09111855-09111155-pj3z-ha
> {noformat}
> Errors were gone after I changed the bucket to a bucket in the project where I ran the pipeline.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)