Posted to commits@airflow.apache.org by "Kaxil Naik (JIRA)" <ji...@apache.org> on 2018/01/31 20:54:00 UTC

[jira] [Created] (AIRFLOW-2053) Change quote_character condition in BigQuery

Kaxil Naik created AIRFLOW-2053:
-----------------------------------

             Summary: Change quote_character condition in BigQuery
                 Key: AIRFLOW-2053
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2053
             Project: Apache Airflow
          Issue Type: Bug
          Components: gcp
    Affects Versions: 1.9.0, 1.8.2
            Reporter: Kaxil Naik
            Assignee: Kaxil Naik
             Fix For: Airflow 2.0, 2.0.0


The BigQuery API documentation states [here|https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs] that:
{quote}The value that is used to quote data sections in a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. The default value is a double-quote ('"'). If your data does not contain quoted sections, set the property value to an empty string. {quote}

But the [current implementation|https://github.com/apache/incubator-airflow/blob/6ee4bbd4b1bc4b3f275f7946e2bcdd123970e2dd/airflow/contrib/hooks/bigquery_hook.py#L802] of `run_load` in the BigQuery hook uses an incorrect check to decide whether to include `quote_character`.

The code currently is:

{code:python}
        if 'fieldDelimiter' not in src_fmt_configs:
            src_fmt_configs['fieldDelimiter'] = field_delimiter
        if quote_character:
            src_fmt_configs['quote'] = quote_character
        if allow_quoted_newlines:
            src_fmt_configs['allowQuotedNewlines'] = allow_quoted_newlines
{code}

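If my data doesn't contain quoted sections, the BQ API docs say I need to set `quote=''`, i.e. an empty string. The above condition `if quote_character:` evaluates to false for an empty string, so `quote` is never added to the configuration and BigQuery falls back to its default, the double quote. Hence, I get the following error: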

```
{'message': 'Error detected while parsing row starting at position: 0. Error: Data between close double quote (") and field separator.', 'reason': 'invalid'}
```
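The behaviour can be reproduced without BigQuery. The snippet below is only a minimal sketch: `build_csv_configs_current` is a hypothetical stand-in that mirrors the truthiness check from the hook, not the hook's real `run_load` method.

```python
def build_csv_configs_current(quote_character, src_fmt_configs=None):
    """Hypothetical helper mirroring the truthiness check quoted above."""
    configs = dict(src_fmt_configs or {})
    # An empty string is falsy in Python, so this branch is skipped for ''.
    if quote_character:
        configs['quote'] = quote_character
    return configs


# The caller explicitly asks for "no quoting", but the key is never set.
print(build_csv_configs_current(''))
# A non-empty quote character is passed through as expected.
print(build_csv_configs_current('"'))
```

Because the `quote` key is missing in the first case, the load job would presumably be submitted without it, and the documented default (a double quote) would apply.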
So, the condition should be:

{code:python}
        if quote_character is not None:
            src_fmt_configs['quote'] = quote_character
{code}
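To illustrate the intent of the proposed check, here is a small sketch. `build_csv_configs_fixed` is a hypothetical helper, not the actual hook code, and it assumes callers pass `None` when they want to leave `quote` unset.

```python
def build_csv_configs_fixed(quote_character, src_fmt_configs=None):
    """Hypothetical helper using the proposed `is not None` check."""
    configs = dict(src_fmt_configs or {})
    # Only None means "not provided"; an empty string is a valid value
    # and must be forwarded so that quoting is disabled.
    if quote_character is not None:
        configs['quote'] = quote_character
    return configs


print(build_csv_configs_fixed(None))  # 'quote' left unset
print(build_csv_configs_fixed(''))    # 'quote' set to the empty string
print(build_csv_configs_fixed('"'))   # 'quote' set to a double quote
```

With this check, an empty string is distinguished from "no value supplied", which is the distinction the BigQuery documentation relies on.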
 





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)