Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 13:52:17 UTC

[GitHub] [beam] damccorm opened a new issue, #19899: GCS IO fails with uninformative 'Broken pipe' errors while attempting to write to a GCS bucket without proper permissions.

damccorm opened a new issue, #19899:
URL: https://github.com/apache/beam/issues/19899

   Observed while executing a wordcount IT pipeline:
   ```
   
   ./gradlew :sdks:python:test-suites:dataflow:py36:integrationTest \
   -Dtests=apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it \
   -Dattr=IT -DpipelineOptions="--project=some_project_different_from_apache_beam_testing \
   --staging_location=gs://some_bucket/ \
   --temp_location=gs://some_bucket/ \
   --input=gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.0000* \
   --output=gs://temp-storage-for-end-to-end-tests/py-it-cloud/output \
   --expect_checksum=ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710 \
   --num_workers=10 \
   --autoscaling_algorithm=NONE \
   --runner=TestDataflowRunner \
   --sdk_location=/full/path/to/beam/sdks/python/dist/apache-beam-2.16.0.dev0.tar.gz" \
   --info
   
   ```
   
   gs://temp-storage-for-end-to-end-tests/py-it-cloud/output lives in a different project than the one running the pipeline.
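
   When the output bucket belongs to another project, the job's worker service account usually needs to be granted object access on that bucket explicitly. Below is a minimal sketch of doing that with the google-cloud-storage client; the worker service account name and the roles/storage.objectAdmin role are assumptions for illustration, not taken from this report:
   ```
   # Hedged sketch: grant the pipeline's worker service account write access to a
   # cross-project output bucket. The service account below is hypothetical.
   from google.cloud import storage

   BUCKET = "temp-storage-for-end-to-end-tests"  # output bucket from the repro command
   WORKER_SA = "serviceAccount:1234567890-compute@developer.gserviceaccount.com"  # hypothetical

   client = storage.Client()
   bucket = client.bucket(BUCKET)
   policy = bucket.get_iam_policy(requested_policy_version=3)
   policy.bindings.append({"role": "roles/storage.objectAdmin", "members": {WORKER_SA}})
   bucket.set_iam_policy(policy)
   ```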
   
   Writing to that bucket caused a number of Broken pipe errors. Console logs:
   ```
   
   root: INFO: 2019-09-11T19:06:23.055Z: JOB_MESSAGE_BASIC: Finished operation read/Read+split+pair_with_one+group/Reify+group/Write
   root: INFO: 2019-09-11T19:06:23.157Z: JOB_MESSAGE_BASIC: Executing operation group/Close
   root: INFO: 2019-09-11T19:06:23.208Z: JOB_MESSAGE_BASIC: Finished operation group/Close
   root: INFO: 2019-09-11T19:06:23.263Z: JOB_MESSAGE_BASIC: Executing operation group/Read+group/GroupByWindow+count+format+write/Write/WriteImpl/WriteBundles/WriteBundles+write/Write/WriteImpl/Pair+write/Write/WriteImpl/WindowInto(WindowIntoFn)+write/Write/WriteImpl/GroupByKey/Reify+write/Write/WriteImpl/GroupByKey/Write
   root: INFO: 2019-09-11T19:06:25.571Z: JOB_MESSAGE_ERROR: Traceback (most recent call last):
     File "apache_beam/runners/common.py", line 782, in apache_beam.runners.common.DoFnRunner.process
     File "apache_beam/runners/common.py", line 594, in apache_beam.runners.common.PerWindowInvoker.invoke_process
     File "apache_beam/runners/common.py", line 666, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
     File "/usr/local/lib/python3.6/site-packages/apache_beam/io/iobase.py", line 1042, in process
       self.writer.write(element)
     File "/usr/local/lib/python3.6/site-packages/apache_beam/io/filebasedsink.py", line 393, in write
       self.sink.write_record(self.temp_handle, value)
     File "/usr/local/lib/python3.6/site-packages/apache_beam/io/filebasedsink.py", line 137, in write_record
       self.write_encoded_record(file_handle, self.coder.encode(value))
     File "/usr/local/lib/python3.6/site-packages/apache_beam/io/textio.py", line 407, in write_encoded_record
       file_handle.write(encoded_value)
     File "/usr/local/lib/python3.6/site-packages/apache_beam/io/filesystemio.py", line 202, in write
       self._uploader.put(b)
     File "/usr/local/lib/python3.6/site-packages/apache_beam/io/gcp/gcsio.py", line 594, in put
       self._conn.send_bytes(data.tobytes())
     File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
       self._send_bytes(m[offset:offset + size])
     File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 397, in _send_bytes
       self._send(header)
     File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 368, in _send
       n = write(self._handle, buf)
   BrokenPipeError: [Errno 32] Broken pipe

   During handling of the above exception, another exception occurred:

   Traceback (most recent call last):
     File "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
   ...

   root: INFO: 2019-09-11T19:06:33.027Z: JOB_MESSAGE_DEBUG: Executing failure step failure25
   root: INFO: 2019-09-11T19:06:33.066Z: JOB_MESSAGE_ERROR: Workflow failed. Causes: S08:group/Read+group/GroupByWindow+count+format+write/Write/WriteImpl/WriteBundles/WriteBundles+write/Write/WriteImpl/Pair+write/Write/WriteImpl/WindowInto(WindowIntoFn)+write/Write/WriteImpl/GroupByKey/Reify+write/Write/WriteImpl/GroupByKey/Write failed., The job failed because a work item has failed 4 times. Look in previous log entries for the cause of each one of the 4 failures. For more information, see https://cloud.google.com/dataflow/docs/guides/common-errors.
   The work item was attempted on these workers:
     beamapp-valentyn-09111855-09111155-pj3z-harness-5g6h
         Root cause: Work item failed.,
     beamapp-valentyn-09111855-09111155-pj3z-harness-6ccc
         Root cause: Work item failed.,
     beamapp-valentyn-09111855-09111155-pj3z-harness-45pp
         Root cause: Work item failed.,
     beamapp-valentyn-09111855-09111155-pj3z-ha
   
   ```
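
   The traceback above suggests the GCS sink hands bytes to a separate upload process over a multiprocessing pipe (filesystemio.py write → gcsio.py put → Connection.send_bytes), so if the upload side dies first, for example on a permission error from GCS, the writer only ever sees the pipe breaking. A minimal, Beam-independent sketch of that failure mode:
   ```
   # Minimal sketch of the failure mode suggested by the traceback: if the process
   # reading from a pipe exits early (standing in here for an uploader that hit a
   # permission error), the writing side only sees a BrokenPipeError, not the cause.
   import multiprocessing


   def uploader(reader):
       # Stand-in for the upload side failing and exiting without telling the writer why.
       reader.close()


   if __name__ == "__main__":
       reader, writer = multiprocessing.Pipe(duplex=False)
       proc = multiprocessing.Process(target=uploader, args=(reader,))
       proc.start()
       reader.close()   # the parent no longer needs the read end
       proc.join()      # the "uploader" is gone; no readers remain
       try:
           writer.send_bytes(b"bytes that were meant for GCS")
       except BrokenPipeError as e:
           print("writer only sees:", e)  # [Errno 32] Broken pipe
   ```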
   
   The errors went away after I changed the output bucket to a bucket in the same project that ran the pipeline.
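
   Until the error surfaces the real cause, one cheap mitigation is to verify write access to the output bucket before launching the job. A hedged sketch using the google-cloud-storage client; the permission list is an assumption about what the text sink needs, and the check runs with the locally active credentials, which may differ from the worker service account:
   ```
   # Hedged sketch: fail fast with an explicit message if the current credentials
   # cannot write to the output bucket, instead of a Broken pipe deep inside the job.
   from google.cloud import storage

   BUCKET = "temp-storage-for-end-to-end-tests"                   # output bucket from the repro
   NEEDED = ["storage.objects.create", "storage.objects.delete"]  # assumed sink requirements

   client = storage.Client()  # uses application default credentials
   granted = client.bucket(BUCKET).test_iam_permissions(NEEDED)
   missing = sorted(set(NEEDED) - set(granted))
   if missing:
       raise SystemExit("cannot write to gs://%s, missing permissions: %s" % (BUCKET, missing))
   ```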
   
   Imported from Jira [BEAM-8216](https://issues.apache.org/jira/browse/BEAM-8216). Original Jira may contain additional context.
   Reported by: tvalentyn.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org