Posted to issues@beam.apache.org by "Valentyn Tymofieiev (JIRA)" <ji...@apache.org> on 2019/04/30 02:15:00 UTC

[jira] [Updated] (BEAM-7173) Bigquery connector should not enable schema autodetection without a user explicitly instructing to do so.

     [ https://issues.apache.org/jira/browse/BEAM-7173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Valentyn Tymofieiev updated BEAM-7173:
--------------------------------------
    Summary: Bigquery connector should not enable schema autodetection without a user explicitly instructing to do so.   (was: Bigquery connector should not enable schema without a user explicitly instructing to do so. )

> Bigquery connector should not enable schema autodetection without a user explicitly instructing to do so. 
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-7173
>                 URL: https://issues.apache.org/jira/browse/BEAM-7173
>             Project: Beam
>          Issue Type: Bug
>          Components: io-python-gcp
>            Reporter: Valentyn Tymofieiev
>            Assignee: Pablo Estrada
>            Priority: Major
>
> Currently the BQ_FILE_LOADS insertion method enables schema autodetection: [https://github.com/apache/beam/blob/6567f1687d53e491b337ba94f521fa2e4af35e46/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L340]
>  It may be more user-friendly to let users opt in to schema autodetection in their pipelines, across all use cases for the BQ connector. Schema autodetection is an approximation and does not always work.
> For example, schema autodetection cannot infer whether string data is binary bytes or a textual string, and will always prefer the latter. If schema autodetection is enabled by default, users who need to write 'bytes' data will always have to specify a schema, even when writing to a table that already exists and has a schema. Otherwise the autodetected schema will try to write a 'string' entry into a 'bytes' field and the write will fail.
> Related discussion: [https://lists.apache.org/thread.html/1f9d9cb1bbbfca87d74e62ba8e58a15059ed6c20ab419002fcd3f8df@%3Cdev.beam.apache.org%3E]
>  
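
The bytes-versus-string ambiguity described in the issue can be illustrated with a small standalone sketch. It does not use Beam or BigQuery code: `infer_bq_type` and `encode_row` are hypothetical helpers written for this illustration, and the sketch assumes that bytes values are base64-encoded before being written to a JSON load file. Once encoded, the value is an ordinary JSON string, so a value-based detector has nothing to tell it apart from text.

```python
import base64
import json


def infer_bq_type(value):
    """Hypothetical stand-in for schema autodetection on a single JSON value."""
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    if isinstance(value, str):
        # Base64-encoded bytes arrive here as plain text, so they look like STRING.
        return "STRING"
    raise TypeError(f"unsupported value: {value!r}")


def encode_row(row):
    """Serialize a row to one JSON line, base64-encoding any bytes values."""
    encoded = {
        key: base64.b64encode(val).decode("ascii") if isinstance(val, bytes) else val
        for key, val in row.items()
    }
    return json.dumps(encoded, sort_keys=True)


# A row destined for an existing table whose 'payload' column is BYTES.
row = {"name": "abc", "payload": b"\x00\x01\x02"}
existing_schema = {"name": "STRING", "payload": "BYTES"}

json_line = encode_row(row)
inferred = {key: infer_bq_type(val) for key, val in json.loads(json_line).items()}

# Columns where the guessed type disagrees with the table's actual type.
mismatches = {
    key: (inferred[key], existing_schema[key])
    for key in inferred
    if inferred[key] != existing_schema[key]
}

print(json_line)   # {"name": "abc", "payload": "AAEC"}
print(mismatches)  # {'payload': ('STRING', 'BYTES')}
```

In this sketch the only way to get the 'payload' column right is to supply the type from outside the data, which is why an explicit schema (or reuse of the existing table's schema) is needed whenever autodetection is switched on.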



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)