Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/10/28 13:22:27 UTC

[GitHub] [airflow] shaneikennedy opened a new issue #11911: BigQuery support for create or replace table

shaneikennedy opened a new issue #11911:
URL: https://github.com/apache/airflow/issues/11911


   **Description**
   
   BigQuery supports several [create table statements](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#create_table_statement), one of which is `CREATE OR REPLACE TABLE`.
   
   **Use case / motivation**
   
   This would be really nice for batch processing because I could write a DAG of the form `create-table >> insert-data` and the whole operation would be idempotent. Right now, the `BigQueryCreateEmptyTableOperator` fails if the table already exists, which means my DAG needs extra logic to decide whether this operator should run at all.
   
   
   **Related Issues**
   
   Not that I could find
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on issue #11911: BigQuery support for create or replace table

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #11911:
URL: https://github.com/apache/airflow/issues/11911#issuecomment-718781510


   CC: @turbaszek 



[GitHub] [airflow] kumarvt edited a comment on issue #11911: BigQuery support for create or replace table

Posted by GitBox <gi...@apache.org>.
kumarvt edited a comment on issue #11911:
URL: https://github.com/apache/airflow/issues/11911#issuecomment-732336724


   I believe `CREATE TABLE IF NOT EXISTS` is not implemented in this operator, and I think it would be very useful to have whenever we need to create the target table only the first time from a JSON schema (one that marks several columns as REQUIRED). I looked at `CREATE TABLE ... AS SELECT`, but that changes the mode of all columns in the target table to NULLABLE.
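
As a rough illustration of the point above: a `CREATE TABLE IF NOT EXISTS` statement with an explicit column list can express the REQUIRED mode as `NOT NULL`. The sketch below builds such a statement from a JSON-style schema. The helper name `build_create_table_ddl`, the small type-alias mapping, and the table name are assumptions made for illustration; none of them are part of Airflow or the BigQuery client library, and nested RECORD or REPEATED fields are deliberately left out.

```python
from typing import Dict, List

# Legacy schema type names mapped to their standard SQL equivalents.
# Assumption: only these few aliases are covered; other names pass through.
LEGACY_TYPE_ALIASES = {"INTEGER": "INT64", "FLOAT": "FLOAT64", "BOOLEAN": "BOOL"}


def build_create_table_ddl(table: str, schema_fields: List[Dict[str, str]]) -> str:
    """Return a CREATE TABLE IF NOT EXISTS statement for a flat schema.

    Hypothetical helper: it only handles NULLABLE and REQUIRED scalar columns.
    """
    columns = []
    for field in schema_fields:
        field_type = field["type"].upper()
        mode = field.get("mode", "NULLABLE").upper()
        if field_type in ("RECORD", "STRUCT") or mode not in ("NULLABLE", "REQUIRED"):
            raise ValueError(f"unsupported field in this sketch: {field['name']}")
        sql_type = LEGACY_TYPE_ALIASES.get(field_type, field_type)
        # REQUIRED columns become NOT NULL; NULLABLE columns need no suffix.
        suffix = " NOT NULL" if mode == "REQUIRED" else ""
        columns.append(f"  `{field['name']}` {sql_type}{suffix}")
    body = ",\n".join(columns)
    return f"CREATE TABLE IF NOT EXISTS `{table}` (\n{body}\n)"


# Example schema; the dataset and table names are placeholders.
example_schema = [
    {"name": "emp_name", "type": "STRING", "mode": "REQUIRED"},
    {"name": "salary", "type": "INTEGER", "mode": "NULLABLE"},
]
print(build_create_table_ddl("my_dataset.test_table", example_schema))
```

The generated statement could then be submitted as an ordinary query job, so re-running the task would leave an existing table untouched.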



[GitHub] [airflow] manesioz commented on issue #11911: BigQuery support for create or replace table

Posted by GitBox <gi...@apache.org>.
manesioz commented on issue #11911:
URL: https://github.com/apache/airflow/issues/11911#issuecomment-718057825


   I agree this functionality would be super useful! I would love to hear what you and the maintainers think about how to design this. 
   
   We have to decide what happens when we try to create a table and a table with the same name already exists in the dataset. We could: 
   
   1) Fail. This is consistent with the plain `CREATE TABLE` DDL (`CREATE TABLE IF NOT EXISTS` would instead skip creation without raising an error), and it seems like the current `BigQueryCreateEmptyTableOperator` works like this as well. 
   
   2) Overwrite the existing table with the new table. This is consistent with the `CREATE OR REPLACE TABLE` DDL. This would be a new feature, and could possibly be added as a boolean option in the existing operator, or a completely new operator altogether. 
   
   Example: 
   
   ```python
   create_table = BigQueryCreateTableOperator(
       task_id="create_table",
       dataset_id=DATASET_NAME,
       table_id="test_table",
       replace=True, # this will create AND replace any existing tables with the same name in the same dataset 
       schema_fields=[
           {"name": "emp_name", "type": "STRING", "mode": "REQUIRED"},
           {"name": "salary", "type": "INTEGER", "mode": "NULLABLE"},
       ],
   )
   ```
   If `replace=False` then it should fail if a table exists by the same name in the same dataset. 
   
   Am I missing something? What are everyone's thoughts? I'd love to work on a PR for this if possible. 
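
The behaviours discussed in this thread (fail, skip if the table exists, replace) can be sketched against an in-memory stand-in for a dataset. Everything here is hypothetical: the `create_table` function, the `if_exists` parameter, and the `TableAlreadyExistsError` class are illustrative names, not the Airflow operator or the BigQuery client API. A three-way option is shown instead of the boolean `replace` flag only so that the `IF NOT EXISTS` case raised elsewhere in the thread fits in the same sketch.

```python
from typing import Dict, List

Schema = List[Dict[str, str]]


class TableAlreadyExistsError(Exception):
    """Raised when the table exists and if_exists is 'fail'."""


def create_table(
    catalog: Dict[str, Schema],
    table_id: str,
    schema_fields: Schema,
    if_exists: str = "fail",
) -> bool:
    """Create a table in the in-memory catalog.

    Returns True if the table was created or replaced, and False if an
    existing table was left untouched.
    """
    if if_exists not in ("fail", "skip", "replace"):
        raise ValueError(f"unknown if_exists value: {if_exists!r}")
    if table_id in catalog:
        if if_exists == "fail":
            # Mirrors plain CREATE TABLE.
            raise TableAlreadyExistsError(table_id)
        if if_exists == "skip":
            # Mirrors CREATE TABLE IF NOT EXISTS.
            return False
    # New table, or if_exists == "replace" (mirrors CREATE OR REPLACE TABLE).
    catalog[table_id] = list(schema_fields)
    return True


# A dict stands in for the dataset; keys are table names.
dataset: Dict[str, Schema] = {}
schema = [
    {"name": "emp_name", "type": "STRING", "mode": "REQUIRED"},
    {"name": "salary", "type": "INTEGER", "mode": "NULLABLE"},
]
create_table(dataset, "test_table", schema)
# Running the same step again is safe with "skip" or "replace".
create_table(dataset, "test_table", schema, if_exists="replace")
```

With either "skip" or "replace", a `create-table >> insert-data` pipeline can be re-run without extra existence checks; "replace" additionally discards whatever the old table held.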



[GitHub] [airflow] kumarvt commented on issue #11911: BigQuery support for create or replace table

Posted by GitBox <gi...@apache.org>.
kumarvt commented on issue #11911:
URL: https://github.com/apache/airflow/issues/11911#issuecomment-732336724


   I believe `CREATE TABLE IF NOT EXISTS` is not implemented in this operator, and I believe it would be very useful whenever we need to create the target table only the first time from a JSON schema (one that marks several columns as REQUIRED). I looked at `CREATE TABLE ... AS SELECT`, but that changes the mode of all columns in the target table to NULLABLE.

