Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/02/24 18:07:35 UTC

[GitHub] [airflow] shahkshitij15 opened a new issue #21801: Error in creating external table using GCSToBigQueryOperator when autodetect=True

shahkshitij15 opened a new issue #21801:
URL: https://github.com/apache/airflow/issues/21801


   ### Apache Airflow version
   
   2.2.4 (latest released)
   
   ### What happened
   
   I was trying to create an external table for a CSV file in GCS using the GCSToBigQueryOperator with autodetect=True, but the task failed. The error stated that either a schema field or a schema object must be provided to create an external table configuration. On closer inspection of the code, I found that the operator does not pass its autodetect setting on, so it cannot autodetect the schema of the file.
   
   In the [file](https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/transfers/gcs_to_bigquery.py), the autodetect argument appears to be missing from the call to the create_external_table function at line 262.
   
   This looks like an oversight, but it **prevents the creation of an external table with an automatically deduced schema.**
   
   The **solution** is to pass autodetect=self.autodetect when calling the create_external_table function, as shown below:
   if self.external_table:
       [...]
       autodetect=self.autodetect,
       [...]
   
   ### What you expected to happen
   
   The operator should have autodetected the schema of the CSV file and created an external table. Instead, it threw an error stating that either a schema field or a schema object must be provided to create an external table configuration.
   
   This error occurs because the value of autodetect is not passed when calling the create_external_table function in this [file](https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/transfers/gcs_to_bigquery.py) at line 262. Since autodetect defaults to False in create_external_table, the function receives neither autodetect=True nor a schema, and so it raises the error.
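The failure mode described above can be sketched with a simplified, hypothetical model. `create_external_table`, `execute_buggy` and `execute_fixed` below are illustrative stand-ins, not Airflow's actual code; they only mimic the two facts reported in this issue: the callee's `autodetect` defaults to `False`, and the caller does not forward the operator's setting.

```python
from typing import List, Optional


def create_external_table(
    table: str,
    schema_fields: Optional[List[dict]] = None,
    autodetect: bool = False,  # defaults to False, as the issue describes
) -> dict:
    """Hypothetical stand-in for the callee: needs a schema or autodetect=True."""
    if not schema_fields and not autodetect:
        raise ValueError("Either a schema or autodetect=True must be provided")
    # Return a dict describing the table that would be created.
    return {"table": table, "schema": schema_fields, "autodetect": autodetect}


def execute_buggy(table: str, autodetect: bool) -> dict:
    # Reported bug: `autodetect` is accepted but never forwarded,
    # so the callee silently falls back to autodetect=False.
    return create_external_table(table, schema_fields=None)


def execute_fixed(table: str, autodetect: bool) -> dict:
    # Proposed fix: forward the caller's setting explicitly.
    return create_external_table(table, schema_fields=None, autodetect=autodetect)
```

With `autodetect=True`, `execute_buggy` still raises `ValueError`, while `execute_fixed` returns a configuration whose `"autodetect"` entry is `True` and whose schema is left for the service to infer.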
   
   ### How to reproduce
   
   The issue can be reproduced by instantiating the GCSToBigQueryOperator with the following parameters:
   
   from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

   # `dag` is an existing DAG object; values in <...> are placeholders.
   create_external_table = GCSToBigQueryOperator(
       task_id="<task_id>",
       bucket="<bucket_name>",
       source_objects=["<gcs path excluding bucket name to csv file>"],
       destination_project_dataset_table="<project_id>.<dataset_name>.<table_name>",
       schema_fields=None,
       schema_object=None,
       source_format="CSV",
       autodetect=True,
       external_table=True,
       dag=dag,
   )

   create_external_table
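Running the reproduction requires a GCP project, but whether `autodetect` reaches the hook can be checked offline by replacing the hook with a `unittest.mock.MagicMock` and asserting on the keyword arguments it recorded. The sketch below applies that idea to a hypothetical, simplified operator (`ExternalTableOperatorSketch` and its injected `hook` are illustrative, not Airflow's API). The real operator creates its hook inside `execute`, so an actual Airflow test would typically use `unittest.mock.patch` on the hook class at the path where the operator module imports it.

```python
from typing import Any, List, Optional
from unittest import mock


class ExternalTableOperatorSketch:
    """Hypothetical operator that delegates table creation to a hook object."""

    def __init__(
        self,
        hook: Any,
        table: str,
        schema_fields: Optional[List[dict]] = None,
        autodetect: bool = False,
    ) -> None:
        self.hook = hook
        self.table = table
        self.schema_fields = schema_fields
        self.autodetect = autodetect

    def execute(self) -> None:
        self.hook.create_external_table(
            table=self.table,
            schema_fields=self.schema_fields,
            autodetect=self.autodetect,  # the argument reported as missing
        )


def test_autodetect_is_forwarded() -> None:
    hook = mock.MagicMock()  # records calls instead of contacting BigQuery
    ExternalTableOperatorSketch(hook, "project.dataset.table", autodetect=True).execute()
    hook.create_external_table.assert_called_once()
    # call_args unpacks into (positional args, keyword args).
    _, kwargs = hook.create_external_table.call_args
    assert kwargs["autodetect"] is True
    assert kwargs["schema_fields"] is None
```

pytest collects `test_autodetect_is_forwarded` automatically; calling it directly also works. Removing the `autodetect=` line from `execute` makes the assertion fail with a `KeyError`, which is exactly the regression this check guards against.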
   
   ### Operating System
   
   macOS Monterey 12.2.1
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Composer
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] shahkshitij15 edited a comment on issue #21801: Error in creating external table using GCSToBigQueryOperator when autodetect=True

shahkshitij15 edited a comment on issue #21801:
URL: https://github.com/apache/airflow/issues/21801#issuecomment-1054284346


   Hey! I have added the necessary code to resolve the issue. How shall I test my code before raising a PR? Or shall I raise the PR nonetheless?





[GitHub] [airflow] boring-cyborg[bot] commented on issue #21801: Error in creating external table using GCSToBigQueryOperator when autodetect=True

boring-cyborg[bot] commented on issue #21801:
URL: https://github.com/apache/airflow/issues/21801#issuecomment-1050121641


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] shahkshitij15 closed issue #21801: Error in creating external table using GCSToBigQueryOperator when autodetect=True

shahkshitij15 closed issue #21801:
URL: https://github.com/apache/airflow/issues/21801


   





[GitHub] [airflow] shahkshitij15 commented on issue #21801: Error in creating external table using GCSToBigQueryOperator when autodetect=True

shahkshitij15 commented on issue #21801:
URL: https://github.com/apache/airflow/issues/21801#issuecomment-1054284346


   Hey! I have added the necessary code to resolve the issue. How shall I test my code before raising a PR?





[GitHub] [airflow] potiuk commented on issue #21801: Error in creating external table using GCSToBigQueryOperator when autodetect=True

potiuk commented on issue #21801:
URL: https://github.com/apache/airflow/issues/21801#issuecomment-1053641263


   Feel free to fix it - assigned you :)





[GitHub] [airflow] shahkshitij15 commented on issue #21801: Error in creating external table using GCSToBigQueryOperator when autodetect=True

shahkshitij15 commented on issue #21801:
URL: https://github.com/apache/airflow/issues/21801#issuecomment-1059961337


   The issue has been resolved. Thank you for your support :) 

