Posted to commits@airflow.apache.org by "Kaxil Naik (Jira)" <ji...@apache.org> on 2020/02/23 13:04:00 UTC
[jira] [Created] (AIRFLOW-6891) GCS to BQ operator fails when JSON is the source format
Kaxil Naik created AIRFLOW-6891:
-----------------------------------
Summary: GCS to BQ operator fails when JSON is the source format
Key: AIRFLOW-6891
URL: https://issues.apache.org/jira/browse/AIRFLOW-6891
Project: Apache Airflow
Issue Type: Bug
Components: gcp
Affects Versions: 1.10.9
Reporter: Kaxil Naik
Assignee: Kaxil Naik
From https://stackoverflow.com/questions/60358764/airflow-gcs-to-bq-operator-fails-when-json-is-the-source-format
I have a GoogleCloudStorageToBigQueryOperator running on Airflow in a DAG. It works perfectly with CSV files. I am now trying to ingest a JSON file, and I'm receiving errors such as:
*skipLeadingRows* is not a valid src_fmt_configs for type *NEWLINE_DELIMITED_JSON*
The odd thing is that I'm not passing *skipLeadingRows* anywhere in my call, shown below:
{noformat}
load_Users_to_GBQ = GoogleCloudStorageToBigQueryOperator(
    task_id='Table1_GCS_to_GBQ',
    bucket='bucket1',
    source_objects=['table*.json'],
    source_format='NEWLINE_DELIMITED_JSON',
    destination_project_dataset_table='DB.table1',
    autodetect=False,
    schema_fields=[
        {'name': 'fieldid', 'type': 'integer', 'mode': 'NULLABLE'},
        {'name': 'filed2', 'type': 'integer', 'mode': 'NULLABLE'},
        {'name': 'field3', 'type': 'string', 'mode': 'NULLABLE'},
        {'name': 'field4', 'type': 'string', 'mode': 'NULLABLE'},
        {'name': 'field5', 'type': 'string', 'mode': 'NULLABLE'}
    ],
    write_disposition='WRITE_TRUNCATE',
    google_cloud_storage_conn_id='Conn1',
    bigquery_conn_id='Conn1',
    dag=dag)
{noformat}
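One possible explanation, which this report does not confirm, is that a default value for skipLeadingRows is merged into src_fmt_configs before the per-format validation runs, so the check fails even though the caller never set the option. The sketch below is purely illustrative and is not Airflow's code: the function names, the VALID_CONFIGS table and its contents are all hypothetical, chosen only to reproduce the shape of the error message.

```python
# Hypothetical sketch, NOT Airflow's actual implementation. It illustrates how
# a default argument merged into the format config before validation could
# produce the reported error.

# Assumed allow-list of per-format options (illustrative subset only).
VALID_CONFIGS = {
    "CSV": {"skipLeadingRows", "fieldDelimiter", "quote"},
    "NEWLINE_DELIMITED_JSON": {"autodetect", "ignoreUnknownValues"},
}


def validate_src_fmt_configs(source_format, src_fmt_configs):
    """Reject any option that is not allowed for the given source format."""
    for key in src_fmt_configs:
        if key not in VALID_CONFIGS[source_format]:
            raise ValueError(
                "{} is not a valid src_fmt_configs for type {}".format(
                    key, source_format
                )
            )
    return src_fmt_configs


def build_load_config(source_format, skip_leading_rows=0, src_fmt_configs=None):
    """Suspected ordering: inject defaults first, then validate."""
    configs = dict(src_fmt_configs or {})
    # The default is injected even though the caller never passed it.
    configs.setdefault("skipLeadingRows", skip_leading_rows)
    return validate_src_fmt_configs(source_format, configs)


def build_load_config_fixed(source_format, skip_leading_rows=0, src_fmt_configs=None):
    """One possible remedy: inject a default only if the format accepts it."""
    configs = dict(src_fmt_configs or {})
    if "skipLeadingRows" in VALID_CONFIGS[source_format]:
        configs.setdefault("skipLeadingRows", skip_leading_rows)
    return validate_src_fmt_configs(source_format, configs)
```

Under these assumptions, build_load_config("NEWLINE_DELIMITED_JSON") raises a ValueError naming skipLeadingRows, while build_load_config_fixed accepts the same call and the CSV path is unchanged.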