You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Mark Grey (Jira)" <ji...@apache.org> on 2022/03/10 15:04:00 UTC

[jira] [Created] (BEAM-14083) ReadFromBigquery examples throw pickling exception when using InteractiveRunner

Mark Grey created BEAM-14083:
--------------------------------

             Summary: ReadFromBigquery examples throw pickling exception when using InteractiveRunner
                 Key: BEAM-14083
                 URL: https://issues.apache.org/jira/browse/BEAM-14083
             Project: Beam
          Issue Type: Bug
          Components: io-py-gcp, runner-py-interactive
    Affects Versions: 2.37.0, 2.36.0, 2.35.0
         Environment: Cloud Dataflow Workbench Notebook on GCP
Apache Beam 2.37.0 Kernel for Python 3
            Reporter: Mark Grey


When using a combination of the python InteractiveRunner and beam.io.ReadFromBigquery, the canonical examples from the beam python tutorials for BigQuery trigger and exception that appears to result from failing to serialize generators:

{code:java|title=notebook.py|borderStyle=solid}
pipeline = beam.Pipeline(InteractiveRunner(), options=options)
max_temperatures = (
    pipeline
    | 'QueryTableStdSQL' >> beam.io.ReadFromBigQuery(
        query='SELECT max_temperature FROM '\
              '`clouddataflow-readonly.samples.weather_stations`',
        use_standard_sql=True, gcs_location=gcs_location)
    # Each row is a dictionary where the keys are the BigQuery columns
    | beam.Map(lambda elem: elem['max_temperature']))
pipeline.run()
{code}

{noformat}
~/apache-beam-2.37.0/lib/python3.7/site-packages/apache_beam/coders/coders.py in <lambda>(x)
    800     protocol = pickle.HIGHEST_PROTOCOL
    801     return coder_impl.CallbackCoderImpl(
--> 802         lambda x: dumps(x, protocol), pickle.loads)
    803 
    804   def as_deterministic_coder(self, step_label, error_message=None):

TypeError: can't pickle generator objects [while running '[6]: QueryTableStdSQL/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/SplitAndSizeRestriction']
{noformat}

The interactive pipeline works as expected in version 2.34



--
This message was sent by Atlassian Jira
(v8.20.1#820001)