You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 23:35:09 UTC

[GitHub] [beam] damccorm opened a new issue, #21476: WriteToBigQuery Dynamic table destinations returns wrong tableId

damccorm opened a new issue, #21476:
URL: https://github.com/apache/beam/issues/21476

   I am trying to write to bigquery to different table destinations and I would like to create the tables dynamically if they don't exist already.
   ```
   
   bigquery_rows | "Writing to Bigquery" >> WriteToBigQuery(lambda e: compute_table_name(e),
   schema=compute_table_schema,
                                               
   additional_bq_parameters=additional_bq_parameters,   
                                            
   write_disposition=BigQueryDisposition.WRITE_APPEND,
   create_disposition=BigQueryDisposition.CREATE_IF_NEEDED,
   )
   ```
   
   The function compute_table_name is quite simple actually, I am just trying to get it to work.
   ```
   
   def compute_table_name(element):
       if element['table'] == 'table_id':
           del element['table']
   
          return "project_id:dataset.table_id" 
   ```
   
   The schema is detected correctly and the table IS created and populated with records. The problem is, the table ID I get is something along the lines of:
   ```
   
   datasetId: 'dataset'
   projectId: 'project_id'
   tableId: 'beam_bq_job_LOAD_AUTOMATIC_JOB_NAME_LOAD_STEP...
   
   ```
   
   I have also tried returning a bigquery.TableReference object in my compute_table_name function to no avail.
   ```
   
   def compute_table_name(element):
       if element['table'] == 'Radio':
           del element['table']
    
         return TableReference(
               datasetId = "dataset_id",
               projectId = "project_id",
   
              tableId = "table_id"
           ) 
   ```
   
   
   Imported from Jira [BEAM-13487](https://issues.apache.org/jira/browse/BEAM-13487). Original Jira may contain additional context.
   Reported by: aremedi.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] stefano-sequence commented on issue #21476: WriteToBigQuery Dynamic table destinations returns wrong tableId

Posted by GitBox <gi...@apache.org>.
stefano-sequence commented on issue #21476:
URL: https://github.com/apache/beam/issues/21476#issuecomment-1158266207

   Could you look at this across `method`? As per my previous comment, I encountered this issue only with `method='FILE_LOADS'`, while it doesn't happen with `method='STREAMING_INSERTS'`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] waltage commented on issue #21476: WriteToBigQuery Dynamic table destinations returns wrong tableId

Posted by GitBox <gi...@apache.org>.
waltage commented on issue #21476:
URL: https://github.com/apache/beam/issues/21476#issuecomment-1158243050

   I tried the following combinations of writes to BigQuery using a random PCollection, and all of these **worked as expected**.  The dimensions were all crossed, with the following levels:
   
   1. WriteToBigQuery `table` parameter, each of:
       * a `string`
       * a function that returns a `TableRef`
       * a function that returns a `string`
   2. WriteToBigQuery `create` and `write` `disposition` parameters, each of:
       * (default, default)
       * (`CREATE_NEVER`, `WRITE_APPEND`)
   3. WriteToBigQuery `schema` parameter, each of:
       * a `TableSchema`
       * a string `"SCHEMA_AUTODETECT"`
   
   <img width="1359" alt="image" src="https://user-images.githubusercontent.com/12177307/174190280-2bccdfd3-13ce-4a81-94d7-5eed653673e2.png">
   
   The code I used is [located here](https://github.com/waltage/beam-triage/tree/main/13487).  What other dimensions or levels should I look at?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kennknowles commented on issue #21476: WriteToBigQuery Dynamic table destinations returns wrong tableId

Posted by "kennknowles (via GitHub)" <gi...@apache.org>.
kennknowles commented on issue #21476:
URL: https://github.com/apache/beam/issues/21476#issuecomment-1526084038

   I'm going to unassign this since it seems not active right now. Feel free to comment or grab it again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] waltage commented on issue #21476: WriteToBigQuery Dynamic table destinations returns wrong tableId

Posted by GitBox <gi...@apache.org>.
waltage commented on issue #21476:
URL: https://github.com/apache/beam/issues/21476#issuecomment-1159192343

   1. In all cases, there is no difference in whether the callable passed to the `table` param returns a `str` or a `TableReference`.
   2. Holistically, the issue looks to be more correlated with whether or not a `schema` is defined than it is with any of the other  parameters (including `method`).
   3. The temporary table names referenced in the original issue are in fact temporary, and they were all renamed correctly and/or deleted by the end of the pipelines regardless of overall success/failure.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] waltage commented on issue #21476: WriteToBigQuery Dynamic table destinations returns wrong tableId

Posted by GitBox <gi...@apache.org>.
waltage commented on issue #21476:
URL: https://github.com/apache/beam/issues/21476#issuecomment-1157130030

   I'll take a look at this over the next few days and see what I can come up with
   
   .take-issue


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] waltage commented on issue #21476: WriteToBigQuery Dynamic table destinations returns wrong tableId

Posted by GitBox <gi...@apache.org>.
waltage commented on issue #21476:
URL: https://github.com/apache/beam/issues/21476#issuecomment-1159157292

   I added in levels for `method` and for `schema` (table below, and you might need to scroll to the right more).  
   
   The last two columns are errors that are eventually returned by BigQuery, whereas SCHEMA AUTO ERR and WRITE DISPO ERR are exceptions raised by Beam itself.
   
   I'll make another comment with my thoughts.
   
   | ID| WTBQ(table)| WTBQ(method)| WTBQ(schema)| WTBQ(create)| WTBQ(write)| STATUS| Beam SCHEMA AUTO ERR| Beam WRITE DISPO ERR| BQ Schema ERR| BQ Table Missing ERR
   |--|--|--|--|--|--|--|--|--|--|--|
   041|() → TableRef|DEFAULT|None|IF_NEEDED|APPEND| 🛑 | ○ | ○ | 🛑 | ○ 
   045|() → TableRef|DEFAULT|None|IF_NEEDED|TRUNCATE| 🛑 | ○ | ○ | 🛑 | ○ 
   037|() → TableRef|DEFAULT|None|NEVER|APPEND| 🛑 | ○ | ○ | 🛑 | ○ 
   005|() → str|DEFAULT|None|IF_NEEDED|APPEND| 🛑 | ○ | ○ | 🛑 | ○ 
   009|() → str|DEFAULT|None|IF_NEEDED|TRUNCATE| 🛑 | ○ | ○ | 🛑 | ○ 
   001|() → str|DEFAULT|None|NEVER|APPEND| 🛑 | ○ | ○ | 🛑 | ○ 
   053|() → TableRef|FILE_LOADS|None|IF_NEEDED|APPEND| 🛑 | ○ | ○ | 🛑 | ○ 
   057|() → TableRef|FILE_LOADS|None|IF_NEEDED|TRUNCATE| 🛑 | ○ | ○ | 🛑 | ○ 
   049|() → TableRef|FILE_LOADS|None|NEVER|APPEND| 🛑 | ○ | ○ | 🛑 | ○ 
   017|() → str|FILE_LOADS|None|IF_NEEDED|APPEND| 🛑 | ○ | ○ | 🛑 | ○ 
   021|() → str|FILE_LOADS|None|IF_NEEDED|TRUNCATE| 🛑 | ○ | ○ | 🛑 | ○ 
   013|() → str|FILE_LOADS|None|NEVER|APPEND| 🛑 | ○ | ○ | 🛑 | ○ 
   065|() → TableRef|STREAMING_INSERTS|None|IF_NEEDED|APPEND| 🟡 | ○ | ○ | ○ | 🛑 
   069|() → TableRef|STREAMING_INSERTS|None|IF_NEEDED|TRUNCATE| 🛑 | ○ | 🛑 | ○ | ○ 
   061|() → TableRef|STREAMING_INSERTS|None|NEVER|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   029|() → str|STREAMING_INSERTS|None|IF_NEEDED|APPEND| 🟡 | ○ | ○ | ○ | 🛑 
   033|() → str|STREAMING_INSERTS|None|IF_NEEDED|TRUNCATE| 🛑 | ○ | 🛑 | ○ | ○ 
   025|() → str|STREAMING_INSERTS|None|NEVER|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   040|() → TableRef|DEFAULT|SCHEMA_AUTODETECT|IF_NEEDED|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   044|() → TableRef|DEFAULT|SCHEMA_AUTODETECT|IF_NEEDED|TRUNCATE| ✅ | ✅ | ✅ | ✅ | ✅ 
   036|() → TableRef|DEFAULT|SCHEMA_AUTODETECT|NEVER|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   004|() → str|DEFAULT|SCHEMA_AUTODETECT|IF_NEEDED|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   008|() → str|DEFAULT|SCHEMA_AUTODETECT|IF_NEEDED|TRUNCATE| ✅ | ✅ | ✅ | ✅ | ✅ 
   000|() → str|DEFAULT|SCHEMA_AUTODETECT|NEVER|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   052|() → TableRef|FILE_LOADS|SCHEMA_AUTODETECT|IF_NEEDED|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   056|() → TableRef|FILE_LOADS|SCHEMA_AUTODETECT|IF_NEEDED|TRUNCATE| ✅ | ✅ | ✅ | ✅ | ✅ 
   048|() → TableRef|FILE_LOADS|SCHEMA_AUTODETECT|NEVER|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   016|() → str|FILE_LOADS|SCHEMA_AUTODETECT|IF_NEEDED|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   020|() → str|FILE_LOADS|SCHEMA_AUTODETECT|IF_NEEDED|TRUNCATE| ✅ | ✅ | ✅ | ✅ | ✅ 
   012|() → str|FILE_LOADS|SCHEMA_AUTODETECT|NEVER|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   064|() → TableRef|STREAMING_INSERTS|SCHEMA_AUTODETECT|IF_NEEDED|APPEND| 🛑 | 🛑 | ○ | ○ | ○ 
   068|() → TableRef|STREAMING_INSERTS|SCHEMA_AUTODETECT|IF_NEEDED|TRUNCATE| 🛑 | 🛑 | ○ | ○ | ○ 
   060|() → TableRef|STREAMING_INSERTS|SCHEMA_AUTODETECT|NEVER|APPEND| 🛑 | 🛑 | ○ | ○ | ○ 
   028|() → str|STREAMING_INSERTS|SCHEMA_AUTODETECT|IF_NEEDED|APPEND| 🛑 | 🛑 | ○ | ○ | ○ 
   032|() → str|STREAMING_INSERTS|SCHEMA_AUTODETECT|IF_NEEDED|TRUNCATE| 🛑 | 🛑 | ○ | ○ | ○ 
   024|() → str|STREAMING_INSERTS|SCHEMA_AUTODETECT|NEVER|APPEND| 🛑 | 🛑 | ○ | ○ | ○ 
   043|() → TableRef|DEFAULT|TableSchema|IF_NEEDED|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   047|() → TableRef|DEFAULT|TableSchema|IF_NEEDED|TRUNCATE| ✅ | ✅ | ✅ | ✅ | ✅ 
   039|() → TableRef|DEFAULT|TableSchema|NEVER|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   007|() → str|DEFAULT|TableSchema|IF_NEEDED|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   011|() → str|DEFAULT|TableSchema|IF_NEEDED|TRUNCATE| ✅ | ✅ | ✅ | ✅ | ✅ 
   003|() → str|DEFAULT|TableSchema|NEVER|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   055|() → TableRef|FILE_LOADS|TableSchema|IF_NEEDED|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   059|() → TableRef|FILE_LOADS|TableSchema|IF_NEEDED|TRUNCATE| ✅ | ✅ | ✅ | ✅ | ✅ 
   051|() → TableRef|FILE_LOADS|TableSchema|NEVER|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   019|() → str|FILE_LOADS|TableSchema|IF_NEEDED|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   023|() → str|FILE_LOADS|TableSchema|IF_NEEDED|TRUNCATE| ✅ | ✅ | ✅ | ✅ | ✅ 
   015|() → str|FILE_LOADS|TableSchema|NEVER|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   067|() → TableRef|STREAMING_INSERTS|TableSchema|IF_NEEDED|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   071|() → TableRef|STREAMING_INSERTS|TableSchema|IF_NEEDED|TRUNCATE| 🛑 | ○ | 🛑 | ○ | ○ 
   063|() → TableRef|STREAMING_INSERTS|TableSchema|NEVER|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   031|() → str|STREAMING_INSERTS|TableSchema|IF_NEEDED|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   035|() → str|STREAMING_INSERTS|TableSchema|IF_NEEDED|TRUNCATE| 🛑 | ○ | 🛑 | ○ | ○ 
   027|() → str|STREAMING_INSERTS|TableSchema|NEVER|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   042|() → TableRef|DEFAULT|str|IF_NEEDED|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   046|() → TableRef|DEFAULT|str|IF_NEEDED|TRUNCATE| ✅ | ✅ | ✅ | ✅ | ✅ 
   038|() → TableRef|DEFAULT|str|NEVER|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   006|() → str|DEFAULT|str|IF_NEEDED|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   010|() → str|DEFAULT|str|IF_NEEDED|TRUNCATE| ✅ | ✅ | ✅ | ✅ | ✅ 
   002|() → str|DEFAULT|str|NEVER|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   054|() → TableRef|FILE_LOADS|str|IF_NEEDED|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   058|() → TableRef|FILE_LOADS|str|IF_NEEDED|TRUNCATE| ✅ | ✅ | ✅ | ✅ | ✅ 
   050|() → TableRef|FILE_LOADS|str|NEVER|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   018|() → str|FILE_LOADS|str|IF_NEEDED|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   022|() → str|FILE_LOADS|str|IF_NEEDED|TRUNCATE| ✅ | ✅ | ✅ | ✅ | ✅ 
   014|() → str|FILE_LOADS|str|NEVER|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   066|() → TableRef|STREAMING_INSERTS|str|IF_NEEDED|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   070|() → TableRef|STREAMING_INSERTS|str|IF_NEEDED|TRUNCATE| 🛑 | ○ | 🛑 | ○ | ○ 
   062|() → TableRef|STREAMING_INSERTS|str|NEVER|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   030|() → str|STREAMING_INSERTS|str|IF_NEEDED|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 
   034|() → str|STREAMING_INSERTS|str|IF_NEEDED|TRUNCATE| 🛑 | ○ | 🛑 | ○ | ○ 
   026|() → str|STREAMING_INSERTS|str|NEVER|APPEND| ✅ | ✅ | ✅ | ✅ | ✅ 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] stefano-sequence commented on issue #21476: WriteToBigQuery Dynamic table destinations returns wrong tableId

Posted by GitBox <gi...@apache.org>.
stefano-sequence commented on issue #21476:
URL: https://github.com/apache/beam/issues/21476#issuecomment-1148247477

   As additional context, this only happens with `method='FILE_LOADS'` (the default for batch pipelines) and if `schema` is not specified. The reason is that the temporary table that data is pre-loaded into has unknown schema


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kennknowles commented on issue #21476: WriteToBigQuery Dynamic table destinations returns wrong tableId

Posted by GitBox <gi...@apache.org>.
kennknowles commented on issue #21476:
URL: https://github.com/apache/beam/issues/21476#issuecomment-1341284254

   @waltage @stefano-sequence so is there a bug that we can reproduce, or not so much? (since this is P1 it shows up on alerts whenever it goes a week without an update)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] dabhicusp commented on issue #21476: WriteToBigQuery Dynamic table destinations returns wrong tableId

Posted by "dabhicusp (via GitHub)" <gi...@apache.org>.
dabhicusp commented on issue #21476:
URL: https://github.com/apache/beam/issues/21476#issuecomment-1625165954

   Hello @kennknowles @waltage while inserting data into BQ, data isn't inserted and also `ProjectId` is changed to.
   Table Property that I use while inserting data in BQ is below:: 
            `WTBQ(table) | WTBQ(method) | WTBQ(schema) | WTBQ(create) | WTBQ(write)`
           `string | STREAMING_INSERTS | None | CREATE_NEVER | WRITE_APPEND`
   
   and also same for the `TableSchema` instead of `string`.
   
   The exact error i got is this ::
   `google.api_core.exceptions.NotFound: 404 POST https://bigquery.googleapis.com/bigquery/v2/projects/ee-aniket/datasets/mydataset/tables/sahildabhi1/insertAll?prettyPrint=false: Table 933583868273:mydataset.sahildabhi1 not found. [while running 'Write to BigQuery/_StreamToBigQuery/StreamInsertRows/ParDo(BigQueryWriteFn)']`
   
   The code i use for this is ::
   ```
   import apache_beam as beam
   from google.cloud import bigquery
   
   def create_schema():
       fields = [
       bigquery.SchemaField('column1', 'STRING', mode='NULLABLE'),
       bigquery.SchemaField('column2', 'STRING', mode='NULLABLE')
       ]
       return fields
   
   def get_table_name(element):
           print(element)
           table_name = element['column1']
           table_name = f'ee-aniket.mydataset.{table_name}'
           try:
               table_schema = create_schema()
               table = bigquery.Table(table_name, schema=table_schema)
               table = bigquery.Client().create_table(table)
           except Exception as e:
               raise f"Can't create the table.{e}"
           finally:
               table_name_element =  table_name.split('.')
               return f"{table_name_element[0]}:{table_name_element[1]}.{table_name_element[2]}"
   
   def run_pipeline(pcollection):
       pcollection | 'Write to BigQuery' >> beam.io.WriteToBigQuery(
           table=get_table_name,
           method = 'STREAMING_INSERTS',
           create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND
       )
   
   with beam.Pipeline() as p:
       pcollection = p | beam.Create([{'column1': 'sahildabhi1', 'column2': 'value2'}])
       run_pipeline(pcollection)
   ```
   
   I kindly request your assistance in resolving this issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kennknowles commented on issue #21476: WriteToBigQuery Dynamic table destinations returns wrong tableId

Posted by "kennknowles (via GitHub)" <gi...@apache.org>.
kennknowles commented on issue #21476:
URL: https://github.com/apache/beam/issues/21476#issuecomment-1428716326

   Any updates on this? Did you solve it? The bug is pretty old so I wonder if it is obsolete?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org