Posted to issues@beam.apache.org by "Chun Yang (Jira)" <ji...@apache.org> on 2019/11/27 19:23:00 UTC

[jira] [Updated] (BEAM-8841) Add ability to perform BigQuery file loads using avro

     [ https://issues.apache.org/jira/browse/BEAM-8841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chun Yang updated BEAM-8841:
----------------------------
    Description: 
The Python SDK currently uses JSON for file loads into BigQuery. JSON has some disadvantages, including the larger size of the serialized data and the inability to represent NaN and infinite float values.

BigQuery supports loading files in Avro format, which can overcome these disadvantages. The Java SDK already supports loading files in Avro format (BEAM-2879), so it makes sense to support it in the Python SDK as well.
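
To illustrate the NaN/infinity limitation, here is a minimal standalone sketch using only the Python standard library (it is not Beam code). The helper names {{json_strict}} and {{double_round_trip}} are hypothetical. The sketch assumes strict JSON as produced by {{json.dumps(..., allow_nan=False)}}, and uses {{struct}} to mimic the 8-byte little-endian IEEE 754 layout that the Avro specification gives for {{double}}; it does not use an actual Avro library.

```python
import json
import math
import struct


def json_strict(value):
    """Serialize value as strict JSON, or return None if it cannot be represented."""
    try:
        # allow_nan=False makes json.dumps reject NaN and +/-Infinity,
        # which are not valid tokens in standard JSON.
        return json.dumps(value, allow_nan=False)
    except ValueError:
        return None


def double_round_trip(value):
    """Round-trip a float through an 8-byte little-endian IEEE 754 double.

    This mimics the binary layout Avro uses for `double`; it is an
    illustration only, not an Avro encoder.
    """
    return struct.unpack("<d", struct.pack("<d", value))[0]


if __name__ == "__main__":
    for v in (1.5, float("nan"), float("inf"), float("-inf")):
        decoded = double_round_trip(v)
        # NaN never compares equal to itself, so check it with math.isnan.
        survived = math.isnan(decoded) if math.isnan(v) else decoded == v
        print(repr(v), "strict JSON:", json_strict(v), "binary round trip ok:", survived)
```

In this sketch the strict JSON path has no representation for the three special values, while the fixed-width binary encoding preserves them.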

The change will be somewhere around [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].

  was:
Currently, JSON format is used for file loads into BigQuery in the Python SDK. JSON has some disadvantages including size of serialized data and inability to represent NaN and infinity float values.

BigQuery supports loading files in avro format, which can overcome these disadvantages. The Java SDK already supports loading files using avro format (BEAM-2879) so it makes sense to support it in the Python SDK as well.


> Add ability to perform BigQuery file loads using avro
> -----------------------------------------------------
>
>                 Key: BEAM-8841
>                 URL: https://issues.apache.org/jira/browse/BEAM-8841
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-py-gcp
>            Reporter: Chun Yang
>            Priority: Minor
>
> The Python SDK currently uses JSON for file loads into BigQuery. JSON has some disadvantages, including the larger size of the serialized data and the inability to represent NaN and infinite float values.
> BigQuery supports loading files in Avro format, which can overcome these disadvantages. The Java SDK already supports loading files in Avro format (BEAM-2879), so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].


