You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Valentyn Tymofieiev (Jira)" <ji...@apache.org> on 2019/09/11 22:51:00 UTC

[jira] [Created] (BEAM-8216) GCS IO fails with uninformative 'Broken pipe' errors while attempting to write to a GCS bucket without proper permissions.

Valentyn Tymofieiev created BEAM-8216:
-----------------------------------------

             Summary: GCS IO fails with uninformative 'Broken pipe' errors while attempting to write to a GCS bucket without proper permissions.
                 Key: BEAM-8216
                 URL: https://issues.apache.org/jira/browse/BEAM-8216
             Project: Beam
          Issue Type: Bug
          Components: io-py-gcp
            Reporter: Valentyn Tymofieiev
            Assignee: Chamikara Jayalath


Obvserved while executing a wordcount IT pipeline:
{noformat}
 ./gradlew :sdks:python:test-suites:dataflow:py36:integrationTest \
-Dtests=apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it \
-Dattr=IT -DpipelineOptions="--project=some_project_different_from_apache_beam_testing \
--staging_location=gs://some_bucket/ \
--temp_location=gs://some_bucket/ \
--input=gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.0000* \
--output=gs://temp-storage-for-end-to-end-tests/py-it-cloud/output  \
--expect_checksum=ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710 \
--num_workers=10 \
--autoscaling_algorithm=NONE \
--runner=TestDataflowRunner \
--sdk_location=/full/path/to/beam/sdks/python/dist/apache-beam-2.16.0.dev0.tar.gz" \
--info  
{noformat}
gs://temp-storage-for-end-to-end-tests/py-it-cloud/output lives in a different project than was running the pipeline.

This caused a bunch of Broken pipe errors. Console logs:
{noformat}
root: INFO: 2019-09-11T19:06:23.055Z: JOB_MESSAGE_BASIC: Finished operation read/Read+split+pair_with_one+group/Reify+group/Write
root: INFO: 2019-09-11T19:06:23.157Z: JOB_MESSAGE_BASIC: Executing operation group/Close
root: INFO: 2019-09-11T19:06:23.208Z: JOB_MESSAGE_BASIC: Finished operation group/Close
root: INFO: 2019-09-11T19:06:23.263Z: JOB_MESSAGE_BASIC: Executing operation group/Read+group/GroupByWindow+count+format+write/Write/WriteImpl/WriteBundles/WriteBundles+write/Write/WriteImpl/Pair+write/Write/WriteImpl/WindowInto(WindowIntoFn)+write/Write/WriteImpl/GroupByKey/Reify+write/Write/WriteImpl/GroupByKey/Write
root: INFO: 2019-09-11T19:06:25.571Z: JOB_MESSAGE_ERROR: Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 782, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 594, in apache_beam.runners.common.PerWindowInvoker.invoke_process
  File "apache_beam/runners/common.py", line 666, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
  File "/usr/local/lib/python3.6/site-packages/apache_beam/io/iobase.py", line 1042, in process
    self.writer.write(element)
  File "/usr/local/lib/python3.6/site-packages/apache_beam/io/filebasedsink.py", line 393, in write
    self.sink.write_record(self.temp_handle, value)
  File "/usr/local/lib/python3.6/site-packages/apache_beam/io/filebasedsink.py", line 137, in write_record
    self.write_encoded_record(file_handle, self.coder.encode(value))
  File "/usr/local/lib/python3.6/site-packages/apache_beam/io/textio.py", line 407, in write_encoded_record
    file_handle.write(encoded_value)
  File "/usr/local/lib/python3.6/site-packages/apache_beam/io/filesystemio.py", line 202, in write
    self._uploader.put(b)
  File "/usr/local/lib/python3.6/site-packages/apache_beam/io/gcp/gcsio.py", line 594, in put
    self._conn.send_bytes(data.tobytes())
  File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 397, in _send_bytes
    self._send(header)
  File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
...

root: INFO: 2019-09-11T19:06:33.027Z: JOB_MESSAGE_DEBUG: Executing failure step failure25
root: INFO: 2019-09-11T19:06:33.066Z: JOB_MESSAGE_ERROR: Workflow failed. Causes: S08:group/Read+group/GroupByWindow+count+format+write/Write/WriteImpl/WriteBundles/WriteBundles+write/Write/WriteImpl/Pair+write/Write/WriteImpl/WindowInto(WindowIntoFn)+write/Write/WriteImpl/GroupByKey/Reify+write/Write/WriteImpl/GroupByKey/Write failed., The job failed because a work item has failed 4 times. Look in previous log entries for the cause of each one of the 4 failures. For more information, see https://cloud.google.com/dataflow/docs/guides/common-errors. The work item was attempted on these workers:
  beamapp-valentyn-09111855-09111155-pj3z-harness-5g6h
      Root cause: Work item failed.,
  beamapp-valentyn-09111855-09111155-pj3z-harness-6ccc
      Root cause: Work item failed.,
  beamapp-valentyn-09111855-09111155-pj3z-harness-45pp
      Root cause: Work item failed.,
  beamapp-valentyn-09111855-09111155-pj3z-ha
{noformat}
Errors were gone after I changed the bucket to a bucket in the project where I ran the pipeline.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)