Posted to commits@airflow.apache.org by "Chris Riccomini (JIRA)" <ji...@apache.org> on 2016/11/04 18:20:58 UTC

[jira] [Commented] (AIRFLOW-611) BigQuery Hooks and Operators "source_format" error

    [ https://issues.apache.org/jira/browse/AIRFLOW-611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15637240#comment-15637240 ] 

Chris Riccomini commented on AIRFLOW-611:
-----------------------------------------

I think a simple patch that validates what's passed in, and raises an exception if it's not acceptable, would be the way to go.
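
Something along these lines might be enough. This is only a rough sketch, not the actual patch: the helper name validate_source_format and the constant ALLOWED_SOURCE_FORMATS are made up for illustration, and the list of values is simply the one quoted in the issue description below, so it would need to be checked against Google's docs before use.

```python
# Hypothetical helper, not part of Airflow. The allowed values are taken
# from the issue description and may be incomplete.
ALLOWED_SOURCE_FORMATS = ("CSV", "NEWLINE_DELIMITED_JSON", "AVRO", "GOOGLE_SHEETS")


def validate_source_format(source_format):
    """Return the upper-cased source_format, or raise ValueError if it is not allowed."""
    normalized = str(source_format).upper()
    if normalized not in ALLOWED_SOURCE_FORMATS:
        raise ValueError(
            "{0} is not a valid source_format. Please use one of: {1}".format(
                source_format, ", ".join(ALLOWED_SOURCE_FORMATS)
            )
        )
    return normalized
```

Calling this before the load job is submitted would turn source_format="JSON" into an immediate ValueError that names the acceptable values, rather than a CSV parsing error coming back from BigQuery.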

> BigQuery Hooks and Operators "source_format" error
> --------------------------------------------------
>
>                 Key: AIRFLOW-611
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-611
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Giovanni Briggs
>            Priority: Minor
>
> Found an issue with the *source_format* parameter for the GoogleCloudStorageToBigQueryOperator.
> I was trying to upload a JSON file from GCS to BQ and was using the value *"JSON"* for *source_format*, assuming that this would work.  The upload process started, but then came back with an error saying:
> {code:javascript}
> {'message': 'Error detected while parsing row starting at position: 0. Error: Data between close double quote (") and field separator.', 'reason': 'invalid'}
> {code}
> There is nothing wrong with the JSON format of the doc, so I went and looked at the job description on BigQuery and saw that there was no "Source Format" entry.  When I've successfully uploaded CSV files, the "Source Format" entry is present and says "CSV."
> According to Google's docs for [source format |https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query.tableDefinitions.(key).sourceFormat], acceptable values are: "CSV", "NEWLINE_DELIMITED_JSON", "AVRO" and "GOOGLE_SHEETS".  However, BigQuery doesn't raise an error if you pass a format not represented in that list (such as "JSON").  Instead, it looks like BigQuery assumes you mean CSV and tries to parse the file as a CSV file, which results in a completely different error.
> Not sure what the appropriate fix is (or if there even is one).  At least having some additional documentation for the BigQuery hook and operators that points to the list of available values would be helpful.  Otherwise, BigQuery's error leads you to believe that there is something wrong with the format of your data, when the actual problem is in the setup of the API call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)