Posted to commits@beam.apache.org by "Zdenko Hrcek (JIRA)" <ji...@apache.org> on 2017/11/01 19:53:00 UTC

[jira] [Created] (BEAM-3134) cannot write data to BigQuery with Dataflow

Zdenko Hrcek created BEAM-3134:
----------------------------------

             Summary: cannot write data to BigQuery with Dataflow
                 Key: BEAM-3134
                 URL: https://issues.apache.org/jira/browse/BEAM-3134
             Project: Beam
          Issue Type: Bug
          Components: sdk-py-core
            Reporter: Zdenko Hrcek
            Assignee: Ahmet Altay
            Priority: Normal


(Sample code with a description is here: [https://github.com/zdenulo/dataflow_bigquery_error])

I was running a Dataflow job for the first time (with version 2.1.1). The job reads data from BigQuery, makes some modifications, and then writes the data to a different BigQuery table. When I ran it locally (on a small subset of the data) it worked, but when I ran it on Dataflow I got the following exception:

apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
(ade3180ffa878a6b): Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 706, in run
    self._load_main_session(self.local_staging_directory)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 446, in _load_main_session
    pickler.load_session(session_file)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 247, in load_session
    return dill.load_session(file_path)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 363, in load_session
    module = unpickler.load()
  File "/usr/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1182, in load_append
    list.append(value)
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/protorpclite/messages.py", line 1142, in append
    self.__field.validate_element(value)
AttributeError: 'FieldList' object has no attribute '_FieldList__field'

It looks to me like this has something to do with pickling the schema definition for the output table.
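The traceback ends in the unpickler's load_append calling FieldList.append, which reads the name-mangled attribute _FieldList__field before that attribute exists. Below is a minimal pure-Python sketch of that mechanism, assuming the cause is the ordering inside a protocol-2 pickle of a list subclass: the list items (APPEND/APPENDS opcodes) are written before the instance __dict__ (BUILD opcode), so a subclass whose append() depends on instance state fails while it is being loaded. FieldList, validate_name, SafeFieldList and try_roundtrip are hypothetical stand-ins written for illustration, not the apitools classes. The sketch targets Python 3 rather than the Python 2.7 shown in the traceback, and it overrides extend() as well because newer CPython unpicklers may call extend() instead of append() for list subclasses.

```python
import pickle


def validate_name(value):
    """Hypothetical element check, standing in for validate_element()."""
    if not isinstance(value, str):
        raise TypeError("expected str, got %r" % (value,))


class FieldList(list):
    """Hypothetical, simplified stand-in for a validating list.

    The validator lives in a name-mangled attribute, so it is stored in the
    instance __dict__ under the key '_FieldList__field'.
    """

    def __init__(self, field, items=()):
        super().__init__()
        self.__field = field
        self.extend(items)

    def append(self, value):
        # Raises AttributeError if __dict__ has not been restored yet.
        self.__field(value)
        super().append(value)

    def extend(self, values):
        # Newer unpicklers may call extend() rather than append() for list
        # subclasses, so route it through the validating append().
        for value in values:
            self.append(value)


class SafeFieldList(FieldList):
    """Hypothetical variant whose pickle rebuilds the object via __init__."""

    def __reduce__(self):
        # Pickle as "call SafeFieldList(field, items)", so the validator is
        # set before any element is appended during unpickling.
        return (type(self), (self._FieldList__field, list(self)))


def try_roundtrip(obj):
    """Pickle and unpickle obj; return (result, None) or (None, error)."""
    try:
        return pickle.loads(pickle.dumps(obj, protocol=2)), None
    except AttributeError as exc:
        return None, str(exc)


if __name__ == "__main__":
    _, error = try_roundtrip(FieldList(validate_name, ["name", "age"]))
    print(error)  # 'FieldList' object has no attribute '_FieldList__field'

    restored, error = try_roundtrip(SafeFieldList(validate_name, ["name", "age"]))
    print(restored, error)  # ['name', 'age'] None
```

The __reduce__ override in SafeFieldList is one generic way to make such a class survive a round trip (rebuild it through __init__ so the validator is set first). It is an illustrative assumption, not the fix adopted in Beam or apitools.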



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)