Posted to issues@beam.apache.org by "Valentyn Tymofieiev (Jira)" <ji...@apache.org> on 2019/09/11 22:51:00 UTC
[jira] [Created] (BEAM-8216) GCS IO fails with uninformative
'Broken pipe' errors while attempting to write to a GCS bucket without
proper permissions.
Valentyn Tymofieiev created BEAM-8216:
-----------------------------------------
Summary: GCS IO fails with uninformative 'Broken pipe' errors while attempting to write to a GCS bucket without proper permissions.
Key: BEAM-8216
URL: https://issues.apache.org/jira/browse/BEAM-8216
Project: Beam
Issue Type: Bug
Components: io-py-gcp
Reporter: Valentyn Tymofieiev
Assignee: Chamikara Jayalath
Observed while executing a wordcount IT pipeline:
{noformat}
./gradlew :sdks:python:test-suites:dataflow:py36:integrationTest \
-Dtests=apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it \
-Dattr=IT -DpipelineOptions="--project=some_project_different_from_apache_beam_testing \
--staging_location=gs://some_bucket/ \
--temp_location=gs://some_bucket/ \
--input=gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.0000* \
--output=gs://temp-storage-for-end-to-end-tests/py-it-cloud/output \
--expect_checksum=ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710 \
--num_workers=10 \
--autoscaling_algorithm=NONE \
--runner=TestDataflowRunner \
--sdk_location=/full/path/to/beam/sdks/python/dist/apache-beam-2.16.0.dev0.tar.gz" \
--info
{noformat}
The output location, gs://temp-storage-for-end-to-end-tests/py-it-cloud/output, belongs to a different project than the one running the pipeline.
This caused repeated 'Broken pipe' errors rather than an informative permissions error. Console logs:
{noformat}
root: INFO: 2019-09-11T19:06:23.055Z: JOB_MESSAGE_BASIC: Finished operation read/Read+split+pair_with_one+group/Reify+group/Write
root: INFO: 2019-09-11T19:06:23.157Z: JOB_MESSAGE_BASIC: Executing operation group/Close
root: INFO: 2019-09-11T19:06:23.208Z: JOB_MESSAGE_BASIC: Finished operation group/Close
root: INFO: 2019-09-11T19:06:23.263Z: JOB_MESSAGE_BASIC: Executing operation group/Read+group/GroupByWindow+count+format+write/Write/WriteImpl/WriteBundles/WriteBundles+write/Write/WriteImpl/Pair+write/Write/WriteImpl/WindowInto(WindowIntoFn)+write/Write/WriteImpl/GroupByKey/Reify+write/Write/WriteImpl/GroupByKey/Write
root: INFO: 2019-09-11T19:06:25.571Z: JOB_MESSAGE_ERROR: Traceback (most recent call last):
File "apache_beam/runners/common.py", line 782, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 594, in apache_beam.runners.common.PerWindowInvoker.invoke_process
File "apache_beam/runners/common.py", line 666, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
File "/usr/local/lib/python3.6/site-packages/apache_beam/io/iobase.py", line 1042, in process
self.writer.write(element)
File "/usr/local/lib/python3.6/site-packages/apache_beam/io/filebasedsink.py", line 393, in write
self.sink.write_record(self.temp_handle, value)
File "/usr/local/lib/python3.6/site-packages/apache_beam/io/filebasedsink.py", line 137, in write_record
self.write_encoded_record(file_handle, self.coder.encode(value))
File "/usr/local/lib/python3.6/site-packages/apache_beam/io/textio.py", line 407, in write_encoded_record
file_handle.write(encoded_value)
File "/usr/local/lib/python3.6/site-packages/apache_beam/io/filesystemio.py", line 202, in write
self._uploader.put(b)
File "/usr/local/lib/python3.6/site-packages/apache_beam/io/gcp/gcsio.py", line 594, in put
self._conn.send_bytes(data.tobytes())
File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
self._send_bytes(m[offset:offset + size])
File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 397, in _send_bytes
self._send(header)
File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 368, in _send
n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
...
root: INFO: 2019-09-11T19:06:33.027Z: JOB_MESSAGE_DEBUG: Executing failure step failure25
root: INFO: 2019-09-11T19:06:33.066Z: JOB_MESSAGE_ERROR: Workflow failed. Causes: S08:group/Read+group/GroupByWindow+count+format+write/Write/WriteImpl/WriteBundles/WriteBundles+write/Write/WriteImpl/Pair+write/Write/WriteImpl/WindowInto(WindowIntoFn)+write/Write/WriteImpl/GroupByKey/Reify+write/Write/WriteImpl/GroupByKey/Write failed., The job failed because a work item has failed 4 times. Look in previous log entries for the cause of each one of the 4 failures. For more information, see https://cloud.google.com/dataflow/docs/guides/common-errors. The work item was attempted on these workers:
beamapp-valentyn-09111855-09111155-pj3z-harness-5g6h
Root cause: Work item failed.,
beamapp-valentyn-09111855-09111155-pj3z-harness-6ccc
Root cause: Work item failed.,
beamapp-valentyn-09111855-09111155-pj3z-harness-45pp
Root cause: Work item failed.,
beamapp-valentyn-09111855-09111155-pj3z-ha
{noformat}
The errors went away after I switched to a bucket in the project where I ran the pipeline.
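For illustration, here is a minimal sketch of how this kind of masking can happen. The traceback above ends in multiprocessing/connection.py, which suggests that put() writes into a pipe drained by some other consumer. If that consumer fails (for example on a permissions error) and closes its end, the writer only sees EPIPE. The code below is a toy model under that assumption, not Beam's actual gcsio code: SketchUploader, PermissionDeniedError, and both put_* methods are hypothetical names, and the 403 is simulated.

```python
import multiprocessing
import threading


class PermissionDeniedError(Exception):
    """Hypothetical stand-in for a 403 returned by the storage service."""


class SketchUploader:
    """Toy uploader: put_*() feeds a pipe that a background thread drains."""

    def __init__(self):
        # One-way pipe: the upload thread owns the read end, callers write.
        self._recv_conn, self._send_conn = multiprocessing.Pipe(duplex=False)
        self._error = None
        self._thread = threading.Thread(target=self._upload, daemon=True)
        self._thread.start()

    def _upload(self):
        try:
            # Simulated failure: the service rejects the upload up front.
            raise PermissionDeniedError("403: caller lacks write access")
        except Exception as exc:
            # Remember the real cause so the writer side can report it.
            self._error = exc
        finally:
            # Closing the read end makes later writes fail with EPIPE.
            self._recv_conn.close()

    def wait_for_upload_thread(self):
        self._thread.join()

    def put_uninformative(self, data):
        # The caller sees only BrokenPipeError; the real cause stays hidden.
        self._send_conn.send_bytes(data)

    def put_informative(self, data):
        try:
            self._send_conn.send_bytes(data)
        except BrokenPipeError as pipe_error:
            # Prefer the stored root cause over the pipe symptom.
            if self._error is not None:
                raise self._error from pipe_error
            raise
```

With this sketch, put_uninformative() reproduces the symptom in the logs above, while put_informative() raises the stored permissions error with the BrokenPipeError chained as its cause. Whether the actual fix should take this shape is an open question for the io-py-gcp component.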