Posted to issues@beam.apache.org by "Pablo Estrada (Jira)" <ji...@apache.org> on 2021/09/13 18:40:00 UTC
[jira] [Commented] (BEAM-12773) 404 Session not found, when querying Google Cloud Spanner with Python Dataflow.
[ https://issues.apache.org/jira/browse/BEAM-12773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17414500#comment-17414500 ]
Pablo Estrada commented on BEAM-12773:
--------------------------------------
Does this job run as a Dataflow template?
> 404 Session not found, when querying Google Cloud Spanner with Python Dataflow.
> -------------------------------------------------------------------------------
>
> Key: BEAM-12773
> URL: https://issues.apache.org/jira/browse/BEAM-12773
> Project: Beam
> Issue Type: Bug
> Components: io-py-gcp
> Affects Versions: 2.29.0, 2.33.0
> Reporter: Reto Egeter
> Priority: P2
> Attachments: dataflow_inprogress_2.29.0.png, dataflow_spanner_error_2.29.0.png, dataflow_spanner_error_2.33.0.png
>
>
> My Dataflow job copies a SQL table with 230M rows into Cloud Spanner. The initial run is successful, but any subsequent run fails with the error "google.api_core.exceptions.NotFound: 404 Session not found"
> and also with "504 Deadline Exceeded".
> Here is part of the code:
> {code:python}
> import apache_beam as beam
> from apache_beam.io.gcp.experimental.spannerio import ReadFromSpanner
>
> SPANNER_QUERY = 'SELECT row_id, update_key FROM DomainsCluster2'
>
> def _KeyDomainSpanner(entity):
>     # Map the positional result columns to names and key the row by row_id.
>     row = {}
>     for i, column in enumerate(['row_id', 'update_key']):
>         row[column] = entity[i]
>     return row['row_id'], row
>
> spanner_domains = (
>     p
>     | 'ReadFromSpanner' >> ReadFromSpanner(
>         project_id, database, database, sql=SPANNER_QUERY)
>     | 'KeyDomainsSpanner' >> beam.Map(_KeyDomainSpanner))
> {code}
> The Dataflow job is able to read around 10M rows with 2.29.0 but only a few thousand with 2.33.0.
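
The keying step in the quoted snippet can be checked on its own, without a pipeline or a Spanner connection. The following is a minimal sketch, not the reporter's code: the function name key_domain_spanner, its columns parameter, and the sample row are hypothetical, and it assumes each result row arrives as a positional sequence in the same column order as the SELECT list.

```python
def key_domain_spanner(entity, columns=('row_id', 'update_key')):
    """Turn one positional result row into a (row_id, row_dict) pair."""
    # Pair each column name with the value at the same position in the row.
    row = dict(zip(columns, entity))
    return row['row_id'], row


# Hypothetical sample row, in the same order as the SELECT list.
sample = [42, 'abc']
key, row = key_domain_spanner(sample)
print(key, row)
```

Running the mapping locally like this helps rule out the Map step when narrowing the failure down to the Spanner read itself.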