Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 20:54:42 UTC

[GitHub] [beam] damccorm opened a new issue, #21009: 404 Session not found, when querying Google Cloud Spanner with Python Dataflow.

damccorm opened a new issue, #21009:
URL: https://github.com/apache/beam/issues/21009

   My Dataflow job copies a SQL table with 230M rows into Cloud Spanner. The initial run succeeds, but any subsequent run fails with the error "google.api_core.exceptions.NotFound: 404 Session not found"
   and also "504 Deadline Exceeded".
   
   Here is part of the code:
   
   ```
   
   
   SPANNER_QUERY = 'SELECT row_id, update_key FROM DomainsCluster2'
   
   spanner_domains = (
       p
       | 'ReadFromSpanner' >> ReadFromSpanner(
           project_id, database, database, sql=SPANNER_QUERY)
       | 'KeyDomainsSpanner' >> beam.Map(_KeyDomainSpanner))
   
   def _KeyDomainSpanner(entity):
     row = {}
     for i, column in enumerate(['row_id', 'update_key']):
       row[column] = entity[i]
     return row['row_id'], row
   
   
   ```
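
For reference, the keying step can be exercised on its own, without Beam or Spanner. This is only a minimal sketch: the name `key_domain_spanner`, the `COLUMNS` constant, and the sample row values are made up for illustration, and it assumes each row arrives as a positional sequence in the column order of the query.

```python
# Standalone sketch of the keying step; no Beam or Spanner required.
# COLUMNS and key_domain_spanner are illustrative names, not part of the job.
COLUMNS = ['row_id', 'update_key']


def key_domain_spanner(entity):
    """Turn a positional row into a (row_id, {column: value}) pair."""
    row = dict(zip(COLUMNS, entity))
    return row['row_id'], row


# Made-up sample row, in the column order of the SELECT statement.
sample = [42, 'abc']
print(key_domain_spanner(sample))
```

The mapping itself does no I/O, so it is unlikely to be related to the session errors; those are raised during the read from Spanner.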
   
   
   The Dataflow job is able to read around 10M rows with Beam 2.29.0, but only a few thousand with 2.33.0.
   
   Imported from Jira [BEAM-12773](https://issues.apache.org/jira/browse/BEAM-12773). Original Jira may contain additional context.
   Reported by: regeter.

