Posted to issues@beam.apache.org by "Kenneth Knowles (Jira)" <ji...@apache.org> on 2022/03/17 17:50:00 UTC
[jira] [Commented] (BEAM-14083) ReadFromBigquery examples throw pickling exception when using InteractiveRunner
[ https://issues.apache.org/jira/browse/BEAM-14083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17508345#comment-17508345 ]
Kenneth Knowles commented on BEAM-14083:
----------------------------------------
[~ningk] could you take a look?
> ReadFromBigquery examples throw pickling exception when using InteractiveRunner
> -------------------------------------------------------------------------------
>
> Key: BEAM-14083
> URL: https://issues.apache.org/jira/browse/BEAM-14083
> Project: Beam
> Issue Type: Bug
> Components: io-py-gcp, runner-py-interactive
> Affects Versions: 2.35.0, 2.36.0, 2.37.0
> Environment: Cloud Dataflow Workbench Notebook on GCP
> Apache Beam 2.37.0 Kernel for Python 3
> Reporter: Mark Grey
> Priority: P2
>
> When using the Python InteractiveRunner together with beam.io.ReadFromBigQuery, the canonical BigQuery examples from the Beam Python tutorials trigger an exception that appears to result from a failure to serialize generators:
> {code:java|title=notebook.py|borderStyle=solid}
> import apache_beam as beam
> from apache_beam.runners.interactive.interactive_runner import InteractiveRunner
>
> # `options` and `gcs_location` are assumed to be defined earlier in the notebook.
> pipeline = beam.Pipeline(InteractiveRunner(), options=options)
> max_temperatures = (
>     pipeline
>     | 'QueryTableStdSQL' >> beam.io.ReadFromBigQuery(
>         query='SELECT max_temperature FROM '\
>               '`clouddataflow-readonly.samples.weather_stations`',
>         use_standard_sql=True, gcs_location=gcs_location)
>     # Each row is a dictionary where the keys are the BigQuery columns
>     | beam.Map(lambda elem: elem['max_temperature']))
> pipeline.run()
> {code}
> {noformat}
> ~/apache-beam-2.37.0/lib/python3.7/site-packages/apache_beam/coders/coders.py in <lambda>(x)
> 800 protocol = pickle.HIGHEST_PROTOCOL
> 801 return coder_impl.CallbackCoderImpl(
> --> 802 lambda x: dumps(x, protocol), pickle.loads)
> 803
> 804 def as_deterministic_coder(self, step_label, error_message=None):
> TypeError: can't pickle generator objects [while running '[6]: QueryTableStdSQL/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/SplitAndSizeRestriction']
> {noformat}
> The same interactive pipeline works as expected in version 2.34.
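
For anyone triaging, the Python-level error in the traceback can be reproduced without Beam. The snippet below is only a minimal sketch of why pickle.dumps raises TypeError on a generator; it does not identify which object inside ReadFromBigQuery is the generator, and the rows() function and try_pickle() helper are hypothetical names introduced for illustration.

```python
import pickle


def rows():
    # Hypothetical stand-in for a source that yields rows lazily.
    yield {"max_temperature": 31.0}
    yield {"max_temperature": 28.5}


def try_pickle(obj):
    """Return None if obj pickles cleanly, else the TypeError message."""
    try:
        pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
        return None
    except TypeError as exc:
        return str(exc)


# A live generator object holds an execution frame and cannot be pickled.
generator_error = try_pickle(rows())

# The same rows materialized into a list pickle without complaint.
list_error = try_pickle(list(rows()))

print("generator:", generator_error)
print("list:", list_error)
```

The exact wording of the message differs between Python versions, but in both cases it is a TypeError that names the generator. Materializing to a list is shown here only as a contrast, not as a proposed fix for the Beam code path.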