You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by GitBox <gi...@apache.org> on 2020/03/09 22:03:47 UTC

[GitHub] [beam] pabloem opened a new pull request #11086: Make custom BQ source read from Avro

pabloem opened a new pull request #11086: Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086
 
 
   **Please** add a meaningful description for your change here
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/)
   XLang | --- | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/)
   
   Pre-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   --- |Java | Python | Go | Website
   --- | --- | --- | --- | ---
   Non-portable | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/) 
   Portable | --- | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/) | --- | ---
   
   See [.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md) for trigger phrase, status and link of all Jenkins jobs.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] tvalentyn commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#discussion_r398232108
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery_read_it_test.py
 ##########
 @@ -236,11 +251,12 @@ def create_table(cls, table_name):
     cls.bigquery_client.insert_rows(
         cls.project, cls.dataset_id, table_name, table_data)
 
-  def get_expected_data(self):
+  def get_expected_data(self, native=True):
+    byts = b'\xab\xac'
     expected_row = {
         'float': 0.33,
         'numeric': Decimal('10'),
-        'bytes': base64.b64encode(b'\xab\xac'),
+        'bytes': base64.b64encode(byts) if native else byts,
 
 Review comment:
   Bytes treatment should be called out in IO doc, we do mention it: https://github.com/apache/beam/blob/8bc2880cca40c00a96623b3ce96ea0b856af76c9/sdks/python/apache_beam/io/gcp/bigquery.py#L227. 
   
   b64encoding may be unnecessary, and less efficient. I think the reason native IO encodes bytes with b64 is because that was the behavior in Java SDK. We can argue that's not necessary. However I am concerned about the consistency of the UX here. Different UX for two transforms means the transforms will not be interchangeable, and users might overlook this.  This might also cause friction in cross-language pipelines.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-598566660
 
 
   Run Python 2 PostCommit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] tvalentyn commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#discussion_r406968173
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery.py
 ##########
 @@ -609,7 +611,8 @@ def __init__(
       coder=None,
       use_standard_sql=False,
       flatten_results=True,
-      kms_key=None):
+      kms_key=None,
+      backwards_compatible_data_format=None):
 
 Review comment:
   1. Can we use a bool as a default value instead of `None`?
   2. Can we use rename this flag to something that describes the course of action, for example: `use_json_data_representation=False` - you can come up with a better name.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] tvalentyn commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-597911424
 
 
   > * For bytes, AvroSource is currently returning base64 encoded bytes. I am trying to figure out how to handle this. For py2, strings and bytes may not be differentiable, so we may not be able to handle this properly...
   Do you have a summary of Py2 vs Py3  x Current Behavior vs New behavior? What is returned on Py3 - is it also base64 encoded bytes?
   This is a prior discussion on Py3 migration issues in Bigquery: https://lists.apache.org/thread.html/f35c836887014e059527ed1a806e730321e2f9726164a3030575f455%40%3Cdev.beam.apache.org%3E. It took us some time to understand what was going on (https://lists.apache.org/thread.html/0c2178cf8e5d9e77c4f233f05a0b87b6011a1daa1a5ae47b41463af5%40%3Cdev.beam.apache.org%3E)  
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] kamilwu commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
kamilwu commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#discussion_r404708336
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery.py
 ##########
 @@ -663,14 +662,10 @@ def split(self, desired_bundle_size, start_position=None, stop_position=None):
         self._setup_temporary_dataset(bq)
         self.table_reference = self._execute_query(bq)
 
-      schema, metadata_list = self._export_files(bq)
+      unused_schema, metadata_list = self._export_files(bq)
 
 Review comment:
   I see. Then let's wait for the decision.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#discussion_r404483110
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery.py
 ##########
 @@ -663,14 +662,10 @@ def split(self, desired_bundle_size, start_position=None, stop_position=None):
         self._setup_temporary_dataset(bq)
         self.table_reference = self._execute_query(bq)
 
-      schema, metadata_list = self._export_files(bq)
+      unused_schema, metadata_list = self._export_files(bq)
 
 Review comment:
   We may do that, but if we end up keeping a backwards compatibility flag, we'll need to keep the coder as-is.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-615545309
 
 
   Run Python 3.5 PostCommit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-612179027
 
 
   Precommit failures unrelated.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] chamikaramj commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#discussion_r409025802
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery.py
 ##########
 @@ -610,7 +611,8 @@ def __init__(
       coder=None,
       use_standard_sql=False,
       flatten_results=True,
-      kms_key=None):
+      kms_key=None,
+      use_json_exports=False):
 
 Review comment:
   Is there a reason to continue supporting JSON for the new transform (once we get our current performance issues) ? Probably we should document why users may choose JSON over Avro if we want to support both.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#discussion_r406931472
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery_read_it_test.py
 ##########
 @@ -236,11 +251,12 @@ def create_table(cls, table_name):
     cls.bigquery_client.insert_rows(
         cls.project, cls.dataset_id, table_name, table_data)
 
-  def get_expected_data(self):
+  def get_expected_data(self, native=True):
+    byts = b'\xab\xac'
     expected_row = {
         'float': 0.33,
         'numeric': Decimal('10'),
-        'bytes': base64.b64encode(b'\xab\xac'),
+        'bytes': base64.b64encode(byts) if native else byts,
 
 Review comment:
   If the changes are reasonable, I'll add to release notes

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-596801745
 
 
   Run Python 3.7 PostCommit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-611816596
 
 
   Run Python 3.5 PostCommit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] kamilwu commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
kamilwu commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#discussion_r396637263
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery.py
 ##########
 @@ -663,14 +662,10 @@ def split(self, desired_bundle_size, start_position=None, stop_position=None):
         self._setup_temporary_dataset(bq)
         self.table_reference = self._execute_query(bq)
 
-      schema, metadata_list = self._export_files(bq)
+      unused_schema, metadata_list = self._export_files(bq)
 
 Review comment:
   Does it mean that coder is no longer needed for `_CustomBigQuerySource`?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#discussion_r395910855
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery_read_it_test.py
 ##########
 @@ -236,11 +251,12 @@ def create_table(cls, table_name):
     cls.bigquery_client.insert_rows(
         cls.project, cls.dataset_id, table_name, table_data)
 
-  def get_expected_data(self):
+  def get_expected_data(self, native=True):
+    byts = b'\xab\xac'
     expected_row = {
         'float': 0.33,
         'numeric': Decimal('10'),
-        'bytes': base64.b64encode(b'\xab\xac'),
+        'bytes': base64.b64encode(byts) if native else byts,
 
 Review comment:
   The behavior will be different for different transforms. Users will need to explicitly change the transform in their code. We can make them aware of the new transform, and its typing differences in release notes, and possibly in Pydoc for the new transform as well. Thoughts? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#discussion_r405014860
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery_read_it_test.py
 ##########
 @@ -236,11 +251,12 @@ def create_table(cls, table_name):
     cls.bigquery_client.insert_rows(
         cls.project, cls.dataset_id, table_name, table_data)
 
-  def get_expected_data(self):
+  def get_expected_data(self, native=True):
+    byts = b'\xab\xac'
     expected_row = {
         'float': 0.33,
         'numeric': Decimal('10'),
-        'bytes': base64.b64encode(b'\xab\xac'),
+        'bytes': base64.b64encode(byts) if native else byts,
 
 Review comment:
   So the proposal right now is: To have a parameter for backwards compatibility in the transform, but by default return Avro-returned types. We will need to have clear Pydoc on _ReadFromBigQuery, and on CHANGES.md.
   @chamikaramj thoughts?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] tvalentyn edited a comment on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
tvalentyn edited a comment on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-597911424
 
 
   > * For bytes, AvroSource is currently returning base64 encoded bytes. I am trying to figure out how to handle this. For py2, strings and bytes may not be differentiable, so we may not be able to handle this properly...
   
   Do you have a summary of Py2 vs Py3  x Current Behavior vs New behavior? What is returned on Py3 - is it also base64 encoded bytes?
   This is a prior discussion on Py3 migration issues in Bigquery: https://lists.apache.org/thread.html/f35c836887014e059527ed1a806e730321e2f9726164a3030575f455%40%3Cdev.beam.apache.org%3E. It took us some time to understand what was going on (https://lists.apache.org/thread.html/0c2178cf8e5d9e77c4f233f05a0b87b6011a1daa1a5ae47b41463af5%40%3Cdev.beam.apache.org%3E)  
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-612174579
 
 
   Lint failures unrelated

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-597360911
 
 
   Run Python2_PVR_Flink PreCommit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-615545058
 
 
   Run Python2_PVR_Flink PreCommit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#discussion_r409021250
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery.py
 ##########
 @@ -1570,6 +1584,10 @@ class _ReadFromBigQuery(PTransform):
       bucket where the extracted table should be written as a string or
       a :class:`~apache_beam.options.value_provider.ValueProvider`. If
       :data:`None`, then the temp_location parameter is used.
+    backwards_compatible_data_format (bool): By default, this transform reads
+      data in native Python types that come from Avro-exports of BigQuery. By
+      setting this flag to True, the transform will return JSON-types, like
+      the older BigQuerySource.
 
 Review comment:
   I've added a link to https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro#avro_conversions for details on the conversions. I've also rephrased the doc. LMK what you think.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-597358632
 
 
   Run Python 3.7 PostCommit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-598566638
 
 
   Run Python PreCommit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-610614521
 
 
   Run Python 3.5 PostCommit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-597374023
 
 
   Run Python 2 PostCommit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-615517971
 
 
   @chamikaramj @tvalentyn @kamilwu 
   I've documented the data types that are different, and provided links to BQ documentation to see differences in representation. I've also addressed Cham's comments. I think it's worth keeping JSON export capability.
   
   Finally, I've added the change to CHANGES.md. This should help us move the source forward towards full support. Thoughts?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] tvalentyn commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#discussion_r395422570
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery_read_it_test.py
 ##########
 @@ -236,11 +251,12 @@ def create_table(cls, table_name):
     cls.bigquery_client.insert_rows(
         cls.project, cls.dataset_id, table_name, table_data)
 
-  def get_expected_data(self):
+  def get_expected_data(self, native=True):
+    byts = b'\xab\xac'
     expected_row = {
         'float': 0.33,
         'numeric': Decimal('10'),
-        'bytes': base64.b64encode(b'\xab\xac'),
+        'bytes': base64.b64encode(byts) if native else byts,
 
 Review comment:
   How intuitive will this behavior be to users? should't the output be consistent? 
   If we decide to keep it, how will this be documented?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-597809538
 
 
   Run Python 3.7 PostCommit
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-612153552
 
 
   Passing py35 pc: https://builds.apache.org/job/beam_PostCommit_Python35_PR/118/

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] kamilwu commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
kamilwu commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#discussion_r404000153
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery.py
 ##########
 @@ -663,14 +662,10 @@ def split(self, desired_bundle_size, start_position=None, stop_position=None):
         self._setup_temporary_dataset(bq)
         self.table_reference = self._execute_query(bq)
 
-      schema, metadata_list = self._export_files(bq)
+      unused_schema, metadata_list = self._export_files(bq)
 
 Review comment:
   This is great! That would mean that we can safely remove `_JsonToDictCoder`. Do you think you can do this as a part of this PR?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] kamilwu commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
kamilwu commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#discussion_r396638710
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery_read_it_test.py
 ##########
 @@ -69,6 +70,20 @@ def wrapped(self):
   return inner
 
 
+def datetime_to_utc(element):
+  for k, v in element.items():
+    if isinstance(v, datetime.time):
 
 Review comment:
   We can combine this condition with the one below:
   `if isinstance(v, datetime.time) or isinstance(v, datetime.date):`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-597365946
 
 
   Run Python PreCommit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-610707540
 
 
   Run Python 3.5 PostCommit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-597919814
 
 
   Run Python 3.7 PostCommit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] chamikaramj commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#discussion_r409024858
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery.py
 ##########
 @@ -669,6 +681,13 @@ def estimate_size(self):
       # no access to the query that we're running.
       return None
 
+  def _create_source(self, path, schema):
+    if not self.use_json_exports:
+      return create_avro_source(path, 0, True, use_fastavro=True)
 
 Review comment:
   Probably cleaner to use the "param_name=value" format here for keyword arguments and not specify anything when we just want the default.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-597809470
 
 
   Run Python 2 PostCommit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#discussion_r403337969
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery.py
 ##########
 @@ -663,14 +662,10 @@ def split(self, desired_bundle_size, start_position=None, stop_position=None):
         self._setup_temporary_dataset(bq)
         self.table_reference = self._execute_query(bq)
 
-      schema, metadata_list = self._export_files(bq)
+      unused_schema, metadata_list = self._export_files(bq)
 
 Review comment:
   You're correct!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-610630006
 
 
   Py35 postcommit run: https://builds.apache.org/job/beam_PostCommit_Python35_PR/114/

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#discussion_r409090101
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery.py
 ##########
 @@ -610,7 +611,8 @@ def __init__(
       coder=None,
       use_standard_sql=False,
       flatten_results=True,
-      kms_key=None):
+      kms_key=None,
+      use_json_exports=False):
 
 Review comment:
   not supporting JSON would increase the amount of changes that users need to make to migrate to the new source, as the data output types would change. I don't think this would be ideal. Wouldn't it be better to allow users to migrate to the new source without having to change the rest of their pipeline?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] chamikaramj commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#discussion_r409031301
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery_read_it_test.py
 ##########
 @@ -236,11 +251,12 @@ def create_table(cls, table_name):
     cls.bigquery_client.insert_rows(
         cls.project, cls.dataset_id, table_name, table_data)
 
-  def get_expected_data(self):
+  def get_expected_data(self, native=True):
+    byts = b'\xab\xac'
     expected_row = {
         'float': 0.33,
         'numeric': Decimal('10'),
-        'bytes': base64.b64encode(b'\xab\xac'),
+        'bytes': base64.b64encode(byts) if native else byts,
 
 Review comment:
   I think having a single "backwards compatibility" parameter can be confusing and error prone. We might still be backwards incompatible in some unknown ways. I think it's better to document exact differences between the transforms and give clear documentation/guidelines for anyone who is upgrading.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] tvalentyn commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#discussion_r404218120
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery_read_it_test.py
 ##########
 @@ -236,11 +251,12 @@ def create_table(cls, table_name):
     cls.bigquery_client.insert_rows(
         cls.project, cls.dataset_id, table_name, table_data)
 
-  def get_expected_data(self):
+  def get_expected_data(self, native=True):
+    byts = b'\xab\xac'
     expected_row = {
         'float': 0.33,
         'numeric': Decimal('10'),
-        'bytes': base64.b64encode(b'\xab\xac'),
+        'bytes': base64.b64encode(byts) if native else byts,
 
 Review comment:
   Having a parameter for backwards-compatibility would make sense to me, but will defer to @chamikaramj  on this. Since new transform comes with a new name, whoever will be using it, will have to read the API, so will be aware of the issue. But that would mean we add a knob, potentially indefinitely.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-601496064
 
 
   @kamilwu @chamikaramj PTAL : ) -- see first comment for analysis on typing, and changes

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-598827888
 
 
   Run Python PreCommit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-597900858
 
 
   @chamikaramj @tvalentyn 
   
   Hey there, I'm working on this improvement. There's some interesting outcomes here:
   
   - For datetime types (date,time,datetime,timestamp), AvroSource returns python types (datetime.date, datetime.time,datetime.datetime, datetime.datetime) - along with timezone info. I feel that in this case, Python types are the most desirable. Thoughts? _CustomBQSource is experimental, so I wouldn't think it's subject to backwards compatibility concerns.
   
   - For bytes, AvroSource is currently returning base64 encoded bytes. I am trying to figure out how to handle this. For py2, strings and bytes may not be differentiable, so we may not be able to handle this properly...

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#discussion_r409021058
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery.py
 ##########
 @@ -609,7 +611,8 @@ def __init__(
       coder=None,
       use_standard_sql=False,
       flatten_results=True,
-      kms_key=None):
+      kms_key=None,
+      backwards_compatible_data_format=None):
 
 Review comment:
   1. Done.
   2. I've called it `use_json_exports` - does that help?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-610683435
 
 
   Run Python 3.5 PostCommit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#discussion_r403338432
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery_read_it_test.py
 ##########
 @@ -236,11 +251,12 @@ def create_table(cls, table_name):
     cls.bigquery_client.insert_rows(
         cls.project, cls.dataset_id, table_name, table_data)
 
-  def get_expected_data(self):
+  def get_expected_data(self, native=True):
+    byts = b'\xab\xac'
     expected_row = {
         'float': 0.33,
         'numeric': Decimal('10'),
-        'bytes': base64.b64encode(b'\xab\xac'),
+        'bytes': base64.b64encode(byts) if native else byts,
 
 Review comment:
   I think that's reasonable. But I'd think that after slight confusion, users would figure out the differences.
   
   Another option is to provide a parameter to the transform to have backwards-compatible types at a performance cost.
   
   @chamikaramj @tvalentyn  wdyt?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-598430931
 
 
   Run Python 3.7 PostCommit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#discussion_r406931239
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery_read_it_test.py
 ##########
 @@ -236,11 +251,12 @@ def create_table(cls, table_name):
     cls.bigquery_client.insert_rows(
         cls.project, cls.dataset_id, table_name, table_data)
 
-  def get_expected_data(self):
+  def get_expected_data(self, native=True):
+    byts = b'\xab\xac'
     expected_row = {
         'float': 0.33,
         'numeric': Decimal('10'),
-        'bytes': base64.b64encode(b'\xab\xac'),
+        'bytes': base64.b64encode(byts) if native else byts,
 
 Review comment:
   @chamikaramj @tvalentyn @kamilwu - added a backward compatibility flag. Can you review?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-597919884
 
 
   Run Python 2 PostCommit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-598296019
 
 
   Run Python 3.7 PostCommit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-611708075
 
 
   Run Python 3.5 PostCommit
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-611724831
 
 
   Run Python 3.5 PostCommit
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-615518002
 
 
   Run Python 3.5 PostCommit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] chamikaramj commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#discussion_r409022421
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery.py
 ##########
 @@ -732,11 +748,20 @@ def _export_files(self, bq):
       bigquery.TableSchema instance, a list of FileMetadata instances
     """
     job_id = uuid.uuid4().hex
-    job_ref = bq.perform_extract_job([self.gcs_location],
-                                     job_id,
-                                     self.table_reference,
-                                     bigquery_tools.FileFormat.JSON,
-                                     include_header=False)
+    if self.use_json_exports:
+      job_ref = bq.perform_extract_job([self.gcs_location],
+                                       job_id,
+                                       self.table_reference,
+                                       bigquery_tools.FileFormat.JSON,
+                                       include_header=False)
+    else:
+      job_ref = bq.perform_extract_job([self.gcs_location],
 
 Review comment:
   Let's document subtle differences between the two types (and two transforms ?) in doc comments of _ReadFromBigQuery so that users who upgrade/change the BigQuery read transforms have clear guidelines for updating their pipelines.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] chamikaramj commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#discussion_r409024966
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery.py
 ##########
 @@ -669,6 +681,13 @@ def estimate_size(self):
       # no access to the query that we're running.
       return None
 
+  def _create_source(self, path, schema):
+    if not self.use_json_exports:
+      return create_avro_source(path, 0, True, use_fastavro=True)
+    else:
+      return TextSource(
+          path, 0, CompressionTypes.UNCOMPRESSED, True, self.coder(schema))
 
 Review comment:
   Ditto.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] tvalentyn commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#discussion_r406969613
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery.py
 ##########
 @@ -1570,6 +1584,10 @@ class _ReadFromBigQuery(PTransform):
       bucket where the extracted table should be written as a string or
       a :class:`~apache_beam.options.value_provider.ValueProvider`. If
       :data:`None`, then the temp_location parameter is used.
+    backwards_compatible_data_format (bool): By default, this transform reads
+      data in native Python types that come from Avro-exports of BigQuery. By
+      setting this flag to True, the transform will return JSON-types, like
+      the older BigQuerySource.
 
 Review comment:
   1. Do all BQ types have a native python type?
   2. What are JSON-types? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-615553086
 
 
   Run Python PreCommit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services