Posted to commits@airflow.apache.org by "gecko655 (Jira)" <ji...@apache.org> on 2019/11/23 08:25:00 UTC

[jira] [Commented] (AIRFLOW-6050) Missing an argument `null_marker` in GoogleCloudStorageToBigQueryOperator

    [ https://issues.apache.org/jira/browse/AIRFLOW-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980715#comment-16980715 ] 

gecko655 commented on AIRFLOW-6050:
-----------------------------------

Uh...

I found that I can already use the `null_marker` feature in GoogleCloudStorageToBigQueryOperator by defining the operator like this:
 
{code:python}
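# Import path for Airflow 1.10.x
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator

GoogleCloudStorageToBigQueryOperator(
  task_id='gcs_to_bq',  # placeholder task id; every operator needs one
  bucket='bucket',
  source_objects=['source_object'],
  destination_project_dataset_table='destination_project_dataset.table',
  src_fmt_configs={
    # Passed through to the BigQuery load job configuration
    'nullMarker': 'null'
  },
)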
{code}
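
For anyone wondering what the marker actually changes, here is a rough stdlib-only sketch of the behaviour, using the CSV from the issue description. `parse_timestamp_rows` is a hypothetical helper written for this comment; it is not Airflow or BigQuery code, and it only approximates how a loader might treat a null marker.

{code:python}
import csv
import io
from datetime import datetime


def parse_timestamp_rows(csv_text, null_marker=""):
    """Hypothetical helper: parse timestamp cells, mapping null_marker to None."""
    reader = csv.reader(io.StringIO(csv_text), quotechar="'")
    header = next(reader)
    rows = []
    for record in reader:
        parsed = []
        for cell in record:
            if cell == null_marker:
                # The cell matches the marker, so it is loaded as NULL
                parsed.append(None)
            else:
                # Raises ValueError for non-timestamp text such as 'null'
                parsed.append(datetime.strptime(cell, "%Y-%m-%d %H:%M:%S"))
        rows.append(dict(zip(header, parsed)))
    return rows


SAMPLE = "start_time,end_time\n'2019-11-23 16:49:00',null\n"

# With the marker set, the 'null' cell becomes None instead of failing to parse
rows = parse_timestamp_rows(SAMPLE, null_marker="null")
{code}

Calling `parse_timestamp_rows(SAMPLE)` without the marker raises `ValueError` on the `end_time` cell, which roughly mirrors the "Could not parse 'null' as a timestamp" error quoted below.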

> Missing an argument `null_marker` in GoogleCloudStorageToBigQueryOperator
> -------------------------------------------------------------------------
>
>                 Key: AIRFLOW-6050
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6050
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: hooks, operators
>    Affects Versions: 1.10.3
>            Reporter: gecko655
>            Priority: Major
>
> h1. Summary
> We need the `null_marker` argument in GoogleCloudStorageToBigQueryOperator.
> The spec of this argument is documented here: https://cloud.google.com/bigquery/docs/reference/rest/v2/Job?hl=ja#jobconfigurationload
> The related implementation is here:
> https://github.com/apache/airflow/blob/09ccf296fc0595be8a0bb5802eb2df5d2948889b/airflow/operators/gcs_to_bq.py#L33
> h1. Situation and reproduction
> I could not load a CSV file into a BigQuery table because the file contains the literal value `null` in a column whose schema type is `TIMESTAMP`.
> We can avoid this by specifying the `null_marker` option.
> Suppose we have a CSV file like:
> {code:c}
> start_time,end_time
> '2019-11-23 16:49:00',null
> {code}
> and a schema definition like:
> {code:json}
> [
>   {
>     "mode": "NULLABLE",
>     "name": "start_time",
>     "type": "TIMESTAMP"
>   },
>   {
>     "mode": "NULLABLE",
>     "name": "end_time",
>     "type": "TIMESTAMP"
>   }
> ]
> {code}
> By running GoogleCloudStorageToBigQueryOperator in this situation, we get an error like:
> bq. Could not parse 'null' as a timestamp. Required format is YYYY-MM-DD HH:MM[:SS[.SSSSSS]]; Could not parse 'null' as datetime for field end_time
> Without Airflow's GoogleCloudStorageToBigQueryOperator, we can run this load manually and it succeeds with the option `--null_marker='null'`.
> h1. Related issues
> https://issues.apache.org/jira/browse/AIRFLOW-5224



--
This message was sent by Atlassian Jira
(v8.3.4#803005)