You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Kenneth Knowles (Jira)" <ji...@apache.org> on 2022/03/17 17:50:00 UTC

[jira] [Commented] (BEAM-14083) ReadFromBigquery examples throw pickling exception when using InteractiveRunner

    [ https://issues.apache.org/jira/browse/BEAM-14083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17508345#comment-17508345 ] 

Kenneth Knowles commented on BEAM-14083:
----------------------------------------

[~ningk] could you take a look?

> ReadFromBigquery examples throw pickling exception when using InteractiveRunner
> -------------------------------------------------------------------------------
>
>                 Key: BEAM-14083
>                 URL: https://issues.apache.org/jira/browse/BEAM-14083
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-gcp, runner-py-interactive
>    Affects Versions: 2.35.0, 2.36.0, 2.37.0
>         Environment: Cloud Dataflow Workbench Notebook on GCP
> Apache Beam 2.37.0 Kernel for Python 3
>            Reporter: Mark Grey
>            Priority: P2
>
> When using a combination of the python InteractiveRunner and beam.io.ReadFromBigquery, the canonical examples from the beam python tutorials for BigQuery trigger and exception that appears to result from failing to serialize generators:
> {code:java|title=notebook.py|borderStyle=solid}
> pipeline = beam.Pipeline(InteractiveRunner(), options=options)
> max_temperatures = (
>     pipeline
>     | 'QueryTableStdSQL' >> beam.io.ReadFromBigQuery(
>         query='SELECT max_temperature FROM '\
>               '`clouddataflow-readonly.samples.weather_stations`',
>         use_standard_sql=True, gcs_location=gcs_location)
>     # Each row is a dictionary where the keys are the BigQuery columns
>     | beam.Map(lambda elem: elem['max_temperature']))
> pipeline.run()
> {code}
> {noformat}
> ~/apache-beam-2.37.0/lib/python3.7/site-packages/apache_beam/coders/coders.py in <lambda>(x)
>     800     protocol = pickle.HIGHEST_PROTOCOL
>     801     return coder_impl.CallbackCoderImpl(
> --> 802         lambda x: dumps(x, protocol), pickle.loads)
>     803 
>     804   def as_deterministic_coder(self, step_label, error_message=None):
> TypeError: can't pickle generator objects [while running '[6]: QueryTableStdSQL/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/SplitAndSizeRestriction']
> {noformat}
> The interactive pipeline works as expected in version 2.34



--
This message was sent by Atlassian Jira
(v8.20.1#820001)