Posted to issues@beam.apache.org by "Valentyn Tymofieiev (JIRA)" <ji...@apache.org> on 2019/04/29 18:04:00 UTC

[jira] [Created] (BEAM-7173) Bigquery connector should not enable schema without a user explicitly instructing to do so.

Valentyn Tymofieiev created BEAM-7173:
-----------------------------------------

             Summary: Bigquery connector should not enable schema without a user explicitly instructing to do so. 
                 Key: BEAM-7173
                 URL: https://issues.apache.org/jira/browse/BEAM-7173
             Project: Beam
          Issue Type: Bug
          Components: io-python-gcp
            Reporter: Valentyn Tymofieiev
            Assignee: Pablo Estrada


Currently, the BQ_FILE_LOADS insertion method enables schema autodetection: [https://github.com/apache/beam/blob/6567f1687d53e491b337ba94f521fa2e4af35e46/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L340]

It may be more user-friendly to let users opt in to schema autodetection in their pipelines, across all use cases of the BQ connector. Schema autodetection is an approximation and does not always work.
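
As a rough illustration of the opt-in behavior proposed here, the sketch below builds a load-job configuration dict in which autodetection stays off unless the caller asks for it. The helper name `build_load_config` and its parameters are hypothetical and are not part of Beam's API; the keys `sourceFormat`, `schema` and `autodetect` are meant to mirror the BigQuery load job configuration. Letting an explicit schema take precedence over autodetection is an assumption of this sketch.

```python
def build_load_config(schema=None, autodetect=False):
    """Hypothetical helper: return a load-job configuration dict.

    Autodetection is strictly opt-in. With no schema and autodetect=False,
    neither key is set, so an existing destination table's schema would be
    the only schema in play.
    """
    config = {"sourceFormat": "NEWLINE_DELIMITED_JSON"}
    if schema is not None:
        # Assumption of this sketch: an explicit schema wins over autodetection.
        config["schema"] = {"fields": list(schema)}
    elif autodetect:
        # Only set when the user explicitly asked for it.
        config["autodetect"] = True
    return config


default_config = build_load_config()
opt_in_config = build_load_config(autodetect=True)
```

With these defaults, a pipeline that writes to an already-created table would not need to pass a schema or think about autodetection at all.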

For example, schema autodetection cannot infer whether string data is binary bytes or a textual string, and will always prefer the latter. If schema autodetection is enabled by default, users who need to write 'bytes' data will always have to specify a schema, even when writing to a table that already exists and has a schema. Otherwise, the autodetected schema will cause a 'string' entry to be written into a 'bytes' field, and the write will fail.
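
The failure mode described above can be sketched without BigQuery at all. Bytes have to be base64-encoded to fit into a JSON load file, and at that point they are indistinguishable from ordinary text. The `naive_autodetect` function below is a hypothetical stand-in for schema autodetection, not BigQuery's actual algorithm, and the `payload` column is an invented example.

```python
import base64
import json


def naive_autodetect(row):
    """Hypothetical stand-in for autodetection: map JSON value types to BigQuery types."""
    type_names = {str: "STRING", bool: "BOOLEAN", int: "INTEGER", float: "FLOAT"}
    return {name: type_names[type(value)] for name, value in row.items()}


# The destination table already exists and declares a BYTES column.
table_schema = {"payload": "BYTES"}

# Bytes must be base64-encoded before they can be written to a JSON file.
row = {"payload": base64.b64encode(b"\x00\x01\xff").decode("ascii")}
json_row = json.loads(json.dumps(row))  # round-trip, as a load file would

detected = naive_autodetect(json_row)
mismatches = {
    name: (detected[name], table_schema[name])
    for name in detected
    if detected[name] != table_schema[name]
}
print(mismatches)  # {'payload': ('STRING', 'BYTES')}
```

Nothing in the JSON value itself says "this is bytes", so any inference based on the data alone can only guess STRING here.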

Related discussion: [https://lists.apache.org/thread.html/1f9d9cb1bbbfca87d74e62ba8e58a15059ed6c20ab419002fcd3f8df@%3Cdev.beam.apache.org%3E]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)