Posted to commits@airflow.apache.org by "Kaxil Naik (Jira)" <ji...@apache.org> on 2019/09/19 10:40:00 UTC

[jira] [Commented] (AIRFLOW-5224) gcs_to_bq.GoogleCloudStorageToBigQueryOperator - Specify Encoding for BQ ingestion

    [ https://issues.apache.org/jira/browse/AIRFLOW-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933260#comment-16933260 ] 

Kaxil Naik commented on AIRFLOW-5224:
-------------------------------------

Sure, we can add this in our next release.

 
Can you send us the relevant link about this? The BigQuery documentation at
[https://cloud.google.com/bigquery/docs/loading-data#characterencodings] says:

>If you don't specify an encoding, or explicitly specify that your data is UTF-8 but then provide a CSV file that is not UTF-8 encoded, BigQuery attempts to convert your CSV file to UTF-8.

Shouldn't this work?
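
For illustration, here is a minimal sketch of what passing an explicit encoding through to a BigQuery load job could look like. The helper name `build_load_config` and the bucket, project, dataset, and table names are hypothetical. `encoding` is the field in the BigQuery load job configuration, which accepts `UTF-8` and `ISO-8859-1` for CSV data. This is not the operator's actual code, only an outline of the idea.

```python
# Encodings the BigQuery load job configuration accepts for CSV files.
SUPPORTED_ENCODINGS = {"UTF-8", "ISO-8859-1"}


def build_load_config(source_uris, project, dataset, table, encoding="UTF-8"):
    """Hypothetical helper: build the "load" section of a BigQuery job
    configuration with an explicit character encoding."""
    if encoding not in SUPPORTED_ENCODINGS:
        raise ValueError(
            "Unsupported encoding %r; expected one of %s"
            % (encoding, sorted(SUPPORTED_ENCODINGS))
        )
    return {
        "load": {
            "sourceUris": list(source_uris),
            "destinationTable": {
                "projectId": project,
                "datasetId": dataset,
                "tableId": table,
            },
            "sourceFormat": "CSV",
            # Explicit encoding instead of relying on automatic conversion.
            "encoding": encoding,
        }
    }


# Example with placeholder names (assumptions, not real resources).
config = build_load_config(
    ["gs://example-bucket/data.csv"],
    project="example-project",
    dataset="example_dataset",
    table="example_table",
    encoding="ISO-8859-1",
)
```

An operator parameter along these lines would only need to forward the value into the load configuration, keeping `UTF-8` as the default so existing DAGs behave as before.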

 

 

> gcs_to_bq.GoogleCloudStorageToBigQueryOperator - Specify Encoding for BQ ingestion
> ----------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-5224
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5224
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DAG, gcp
>    Affects Versions: 1.10.0
>         Environment: airflow software platform
>            Reporter: Anand Kumar
>            Priority: Blocker
>
> Hi,
> The business project we are currently enabling has been built completely on GCP components, with Composer and Airflow being one of the key processes. We have built various data pipelines using Airflow for multiple work-streams, where data is ingested from a GCS bucket into BigQuery.
> Following recent updates on the Google BQ infrastructure side, there seem to be some tightened validations on UTF-8 characters, which have resulted in multiple failures of our existing business processes.
> On further analysis, we found that when ingesting data into BQ from a Google bucket, the encoding needs to be explicitly specified going forward, but the operator below currently doesn't accept any parameter to specify an explicit encoding:
> _*gcs_to_bq.GoogleCloudStorageToBigQueryOperator*_
>  Could someone please treat this as a priority and help us with a fix to bring us back to BAU mode?
>  


