Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 13:52:17 UTC
[GitHub] [beam] damccorm opened a new issue, #19899: GCS IO fails with uninformative 'Broken pipe' errors while attempting to write to a GCS bucket without proper permissions.
damccorm opened a new issue, #19899:
URL: https://github.com/apache/beam/issues/19899
Observed while executing a wordcount IT pipeline:
```
./gradlew :sdks:python:test-suites:dataflow:py36:integrationTest \
    -Dtests=apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it \
    -Dattr=IT -DpipelineOptions="--project=some_project_different_from_apache_beam_testing \
    --staging_location=gs://some_bucket/ \
    --temp_location=gs://some_bucket/ \
    --input=gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.0000* \
    --output=gs://temp-storage-for-end-to-end-tests/py-it-cloud/output \
    --expect_checksum=ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710 \
    --num_workers=10 \
    --autoscaling_algorithm=NONE \
    --runner=TestDataflowRunner \
    --sdk_location=/full/path/to/beam/sdks/python/dist/apache-beam-2.16.0.dev0.tar.gz" \
    --info
```
The output location, gs://temp-storage-for-end-to-end-tests/py-it-cloud/output, lives in a different project from the one running the pipeline.
This caused a number of `Broken pipe` errors. Console logs:
```
root: INFO: 2019-09-11T19:06:23.055Z: JOB_MESSAGE_BASIC: Finished operation read/Read+split+pair_with_one+group/Reify+group/Write
root: INFO: 2019-09-11T19:06:23.157Z: JOB_MESSAGE_BASIC: Executing operation group/Close
root: INFO: 2019-09-11T19:06:23.208Z: JOB_MESSAGE_BASIC: Finished operation group/Close
root: INFO: 2019-09-11T19:06:23.263Z: JOB_MESSAGE_BASIC: Executing operation group/Read+group/GroupByWindow+count+format+write/Write/WriteImpl/WriteBundles/WriteBundles+write/Write/WriteImpl/Pair+write/Write/WriteImpl/WindowInto(WindowIntoFn)+write/Write/WriteImpl/GroupByKey/Reify+write/Write/WriteImpl/GroupByKey/Write
root: INFO: 2019-09-11T19:06:25.571Z: JOB_MESSAGE_ERROR: Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 782, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 594, in apache_beam.runners.common.PerWindowInvoker.invoke_process
  File "apache_beam/runners/common.py", line 666, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
  File "/usr/local/lib/python3.6/site-packages/apache_beam/io/iobase.py", line 1042, in process
    self.writer.write(element)
  File "/usr/local/lib/python3.6/site-packages/apache_beam/io/filebasedsink.py", line 393, in write
    self.sink.write_record(self.temp_handle, value)
  File "/usr/local/lib/python3.6/site-packages/apache_beam/io/filebasedsink.py", line 137, in write_record
    self.write_encoded_record(file_handle, self.coder.encode(value))
  File "/usr/local/lib/python3.6/site-packages/apache_beam/io/textio.py", line 407, in write_encoded_record
    file_handle.write(encoded_value)
  File "/usr/local/lib/python3.6/site-packages/apache_beam/io/filesystemio.py", line 202, in write
    self._uploader.put(b)
  File "/usr/local/lib/python3.6/site-packages/apache_beam/io/gcp/gcsio.py", line 594, in put
    self._conn.send_bytes(data.tobytes())
  File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 397, in _send_bytes
    self._send(header)
  File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
...
root: INFO: 2019-09-11T19:06:33.027Z: JOB_MESSAGE_DEBUG: Executing failure step failure25
root: INFO: 2019-09-11T19:06:33.066Z: JOB_MESSAGE_ERROR: Workflow failed. Causes: S08:group/Read+group/GroupByWindow+count+format+write/Write/WriteImpl/WriteBundles/WriteBundles+write/Write/WriteImpl/Pair+write/Write/WriteImpl/WindowInto(WindowIntoFn)+write/Write/WriteImpl/GroupByKey/Reify+write/Write/WriteImpl/GroupByKey/Write failed., The job failed because a work item has failed 4 times. Look in previous log entries for the cause of each one of the 4 failures. For more information, see https://cloud.google.com/dataflow/docs/guides/common-errors. The work item was attempted on these workers:
  beamapp-valentyn-09111855-09111155-pj3z-harness-5g6h
      Root cause: Work item failed.,
  beamapp-valentyn-09111855-09111155-pj3z-harness-6ccc
      Root cause: Work item failed.,
  beamapp-valentyn-09111855-09111155-pj3z-harness-45pp
      Root cause: Work item failed.,
  beamapp-valentyn-09111855-09111155-pj3z-ha
```
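The traceback ends in `gcsio.py`'s `put`, which writes through `self._conn.send_bytes`, i.e. a `multiprocessing` connection. The stdlib-only sketch below is not Beam code; it assumes a POSIX system and a background thread that drains the other end of the pipe. It illustrates why a failure on the consuming side reaches the writer only as `BrokenPipeError`: the hypothetical `failing_upload` thread raises a `PermissionError` (standing in for a GCS 403) and closes its end, and the writer sees nothing but the broken pipe.

```python
import multiprocessing
import threading


def failing_upload(read_end):
    """Hypothetical upload thread that is rejected before reading any data.

    The PermissionError stands in for an HTTP 403 from GCS. It is raised in
    this thread, so Python's default threading.excepthook prints it to stderr;
    the writing side never receives it.
    """
    read_end.close()  # no reader left: later writes to the pipe fail
    raise PermissionError("403 Forbidden: no write access to bucket")


def write_record(data: bytes) -> BrokenPipeError:
    # duplex=False returns (read_end, write_end): a one-way OS pipe on POSIX.
    read_end, write_end = multiprocessing.Pipe(duplex=False)
    uploader = threading.Thread(target=failing_upload, args=(read_end,))
    uploader.start()
    uploader.join()  # ensure the read end is closed before we write
    try:
        write_end.send_bytes(data)
    except BrokenPipeError as exc:
        return exc  # all the writer learns is EPIPE, "Broken pipe"
    finally:
        write_end.close()
    raise AssertionError("expected the write to fail")


if __name__ == "__main__":
    print(repr(write_record(b"some record")))
```

Python ignores `SIGPIPE` by default, which is why the failed write surfaces as an `OSError` subclass rather than killing the process.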
The errors went away after I changed the output bucket to one in the project where I ran the pipeline.
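The underlying problem is that the real cause (no write permission on a bucket in another project) never reaches the user. One possible way to surface it is sketched below, assuming the uploader feeds a background upload thread through a one-way `multiprocessing` pipe, as the `gcsio.py` frame in the traceback suggests. `ReportingUploader`, `upload_fn`, and `denied` are hypothetical names; this is not Beam's `gcsio` implementation.

```python
import multiprocessing
import threading
from multiprocessing.connection import Connection
from typing import Callable, Optional


class ReportingUploader:
    """Hypothetical pipe-fed uploader that reports why the upload failed."""

    def __init__(self, upload_fn: Callable[[Connection], None]):
        # Assumption: upload_fn reads from the pipe until EOF and raises on failure.
        self._read_end, self._write_end = multiprocessing.Pipe(duplex=False)
        self._error: Optional[Exception] = None
        self._thread = threading.Thread(
            target=self._run, args=(upload_fn,), daemon=True)
        self._thread.start()

    def _run(self, upload_fn):
        try:
            upload_fn(self._read_end)
        except Exception as exc:
            self._error = exc  # remember the real cause for the writer
        finally:
            self._read_end.close()  # closed only after _error is recorded

    def _raise_recorded_error(self):
        raise RuntimeError(f"upload failed: {self._error!r}") from self._error

    def put(self, data: bytes) -> None:
        try:
            self._write_end.send_bytes(data)
        except BrokenPipeError:
            self._thread.join()  # the pipe broke, so _run has finished
            if self._error is not None:
                self._raise_recorded_error()
            raise

    def finish(self) -> None:
        self._write_end.close()  # signals EOF to upload_fn
        self._thread.join()
        if self._error is not None:
            self._raise_recorded_error()


if __name__ == "__main__":
    def denied(read_end):
        # Hypothetical stand-in for a 403 from a cross-project bucket.
        raise PermissionError("403 Forbidden: no write access to bucket")

    uploader = ReportingUploader(denied)
    try:
        uploader.put(b"some record")
        uploader.finish()
    except RuntimeError as exc:
        print(exc, "| caused by:", repr(exc.__cause__))
```

The design choice is to have the writer wait for the upload thread and re-raise its stored exception with `raise ... from`, so a permission error would appear in the worker log instead of a bare `[Errno 32] Broken pipe`. Depending on timing, the error surfaces from either `put` or `finish`.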
Imported from Jira [BEAM-8216](https://issues.apache.org/jira/browse/BEAM-8216). Original Jira may contain additional context.
Reported by: tvalentyn.