Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/04/15 22:55:15 UTC

[GitHub] [beam] pabloem opened a new pull request #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink

pabloem opened a new pull request #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433
 
 
   TODOs:
   
   - [ ] Document the changes in CHANGES.md
   - [ ] Ensure Pydoc makes sense
   - [ ] Run PostCommits

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] chamikaramj commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#issuecomment-614353272
 
 
   LGTM. Thanks.


[GitHub] [beam] pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#issuecomment-614340844
 
 
   3.6 PC: https://builds.apache.org/job/beam_PostCommit_Python36_PR/60/


[GitHub] [beam] pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#issuecomment-614343489
 
 
   Run Python 2 PostCommit
   
   


[GitHub] [beam] pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#issuecomment-614340906
 
 
   Run Python 2.7 PostCommit


[GitHub] [beam] pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#issuecomment-614373975
 
 
   It may still be reasonable to merge this, so that we can be 100% sure that results will be consistent... but you're right, Chun, that the tests so far show equal behaviour between Avro and JSON.


[GitHub] [beam] chunyang commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink

Posted by GitBox <gi...@apache.org>.
chunyang commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#issuecomment-614360925
 
 
   Were there some tests you ran to notice the data type incompatibilities? Just curious how you spotted them.


[GitHub] [beam] pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#issuecomment-614358273
 
 
   Run Python 3.6 PostCommit


[GitHub] [beam] chunyang commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink

Posted by GitBox <gi...@apache.org>.
chunyang commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#issuecomment-614899392
 
 
   We backported the transform into a library that we ship with our Dataflow jobs. I didn't know that there were Python snapshots :)


[GitHub] [beam] chunyang edited a comment on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink

Posted by GitBox <gi...@apache.org>.
chunyang edited a comment on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#issuecomment-614377074
 
 
   Hmm interesting, I agree keeping JSON as default is probably the safer bet.
   
   We have seen a case internally where the data provided to WriteToBigQuery is a string-like date, e.g., `"2020-01-01"`. When writing with JSON intermediate format, the data shows up as a DATE column in BigQuery, but we can't get the same behavior with Avro format without doing one of:
   1. Specifying schema for that column as DATE and modifying the incoming PCollection to use `datetime.date` or
   2. Specifying schema for that column as STRING, in which case it no longer is a DATE column in BigQuery.
   
   The 2nd option is problematic when we're appending to an existing table, in which case we have to modify the pipeline to keep appending to it.
   
   fastavro 0.22.2 allows writing a string type to a column defined as date logical type (PRs fastavro/fastavro#338 and fastavro/fastavro#349), but it seems like Beam pins the fastavro constraint to <0.22, so for now we can't take advantage of that.
   
   I believe your comments in CHANGES are accurate: there are some date-like and datetime-like strings that will behave differently in Avro vs JSON format.
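
For readers following the first workaround above (DATE schema plus `datetime.date` values), here is a minimal sketch of the per-row conversion. The helper name `parse_date_fields` and the field name `day` are hypothetical and not part of Beam. The sketch assumes rows are plain dicts and that dates arrive as ISO `YYYY-MM-DD` strings.

```python
import datetime


def parse_date_fields(row, date_fields):
    """Return a copy of `row` with ISO 'YYYY-MM-DD' strings parsed to datetime.date.

    Hypothetical helper for illustration only; it is not a Beam API.
    """
    converted = dict(row)  # leave the caller's dict untouched
    for field in date_fields:
        value = converted.get(field)
        if isinstance(value, str):
            # Raises ValueError on strings that are not valid ISO dates.
            converted[field] = datetime.date.fromisoformat(value)
    return converted


row = {"name": "example", "day": "2020-01-01"}
print(parse_date_fields(row, ["day"]))
```

In a pipeline, a helper like this could presumably be applied with `beam.Map(parse_date_fields, date_fields=["day"])` ahead of `WriteToBigQuery`. Whether that is preferable to staying on the JSON format depends on the pipeline.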


[GitHub] [beam] chunyang commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink

Posted by GitBox <gi...@apache.org>.
chunyang commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#issuecomment-614357801
 
 
   Gotcha, thanks for the heads up


[GitHub] [beam] pabloem merged pull request #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink

Posted by GitBox <gi...@apache.org>.
pabloem merged pull request #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433
 
 
   


[GitHub] [beam] pabloem removed a comment on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink

Posted by GitBox <gi...@apache.org>.
pabloem removed a comment on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#issuecomment-614340906
 
 
   Run Python 2.7 PostCommit


[GitHub] [beam] pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#issuecomment-614374425
 
 
   Run Python PreCommit


[GitHub] [beam] pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#issuecomment-614320399
 
 
   Run Python 3.6 PostCommit


[GitHub] [beam] pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#issuecomment-614372823
 
 
   hmm ufffff - this is truly embarrassing. I inadvertently ran my tests in a branch with changes to the BQ Source. This gave me weird results when running the BigQueryQueryToTable.test_big_query_new_types. I've just tested this with both JSON/AVRO alternatives, and the results are the same - as you had clearly verified, Chun. It seems like this change is not necessary.


[GitHub] [beam] chunyang commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink

Posted by GitBox <gi...@apache.org>.
chunyang commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#issuecomment-614377074
 
 
   Hmm interesting, I agree keeping JSON as default is probably the safer bet.
   
   We have seen a case internally where the data provided to WriteToBigQuery is a string-like date, e.g., `"2020-01-01"`. When writing with JSON intermediate format, the data shows up as a DATE column in BigQuery, but we can't get the same behavior with Avro format without doing one of:
   1. Specifying schema for that column as DATE and modifying the incoming PCollection to use `datetime.date` or
   2. Specifying schema for that column as STRING, in which case it no longer is a DATE column in BigQuery.
   
   The 2nd option is problematic when we're appending to an existing table, in which case we have to modify the pipeline to keep appending to it.
   
   fastavro 0.22.2 allows writing a string type to a column defined as date logical type (PRs fastavro/fastavro#338 and fastavro/fastavro#349), but it seems like Beam pins the fastavro constraint to <0.22, so for now we can't take advantage of that.


[GitHub] [beam] ibzib commented on a change in pull request #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink

Posted by GitBox <gi...@apache.org>.
ibzib commented on a change in pull request #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#discussion_r409736653
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads_test.py
 ##########
 @@ -176,7 +176,7 @@ def test_many_files(self):
     file length is very small, so only a couple records fit in each file.
     """
 
-    fn = bqfl.WriteRecordsToFile(schema=_ELEMENTS_SCHEMA, max_file_size=300)
+    fn = bqfl.WriteRecordsToFile(schema=_ELEMENTS_SCHEMA, max_file_size=50)
 
 Review comment:
   Why change these? (Not opposed, just curious for the record.)


[GitHub] [beam] pabloem removed a comment on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink

Posted by GitBox <gi...@apache.org>.
pabloem removed a comment on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#issuecomment-614343489
 
 
   Run Python 2 PostCommit
   
   


[GitHub] [beam] pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#issuecomment-614839093
 
 
   @chunyang I reviewed the releases, and I saw that this is not in 2.20 - so I'm curious. Do you guys use Beam snapshots?


[GitHub] [beam] pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#issuecomment-614349385
 
 
   All tests have passed. I'm now adding change docs to CHANGES.md


[GitHub] [beam] pabloem commented on a change in pull request #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink

Posted by GitBox <gi...@apache.org>.
pabloem commented on a change in pull request #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#discussion_r409783068
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads_test.py
 ##########
 @@ -176,7 +176,7 @@ def test_many_files(self):
     file length is very small, so only a couple records fit in each file.
     """
 
-    fn = bqfl.WriteRecordsToFile(schema=_ELEMENTS_SCHEMA, max_file_size=300)
+    fn = bqfl.WriteRecordsToFile(schema=_ELEMENTS_SCHEMA, max_file_size=50)
 
 Review comment:
   50 and 300 are the file sizes that force the transform to spill out after a couple elements in JSON and AVRO respectively.
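
To illustrate why the threshold depends on the serialization format, here is a rough sketch of size-based rollover for newline-delimited JSON. It is not the `WriteRecordsToFile` implementation: the function `split_into_files` and the sample records are hypothetical, and the real transform may measure sizes differently.

```python
import json


def split_into_files(records, max_file_size):
    """Group records into 'files' whose JSON-lines size stays within max_file_size bytes.

    Illustrative sketch only; a single oversized record still gets its own file.
    """
    files, current, current_size = [], [], 0
    for record in records:
        size = len((json.dumps(record) + "\n").encode("utf-8"))
        # Start a new file when adding this record would exceed the limit.
        if current and current_size + size > max_file_size:
            files.append(current)
            current, current_size = [], 0
        current.append(record)
        current_size += size
    if current:
        files.append(current)
    return files


records = [{"name": "beam", "language": "py"}] * 6
print(len(split_into_files(records, max_file_size=50)))
print(len(split_into_files(records, max_file_size=300)))
```

Each sample record serializes to a few dozen bytes, so a 50-byte limit forces a new file almost immediately while a 300-byte limit does not. A format with a different per-file or per-record overhead would need a different limit to trigger the same spill behaviour in a test.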
