Posted to github@beam.apache.org by "robertwb (via GitHub)" <gi...@apache.org> on 2023/08/25 23:02:49 UTC

[GitHub] [beam] robertwb opened a new pull request, #28169: [YAML] Implement a consistent BigQuery read and write transform.

robertwb opened a new pull request, #28169:
URL: https://github.com/apache/beam/pull/28169

   This adapts the Java and Python BigQuery IOs to share the same interface and to be callable from YAML.
   
   Several bug fixes were required along the way to make this work smoothly; they are split out into separate commits in the commit history. 
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI or the [workflows README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] Polber commented on a diff in pull request #28169: [YAML] Implement a consistent BigQuery read and write transform.

Posted by "Polber (via GitHub)" <gi...@apache.org>.
Polber commented on code in PR #28169:
URL: https://github.com/apache/beam/pull/28169#discussion_r1319028874


##########
sdks/python/apache_beam/yaml/standard_io.yaml:
##########
@@ -0,0 +1,53 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# This file enumerates the various IOs that are available by default as
+# top-level transforms in Beam's YAML.
+#
+# Note that there may be redundant implementations. In these cases the specs
+# should be kept in sync.
+# TODO(yaml): See if this can be enforced programmatically.
+
+- type: renaming
+  transforms:
+    'ReadFromBigQuery': 'ReadFromBigQuery'
+    'WriteToBigQuery': 'WriteToBigQuery'
+  config:
+    mappings:
+      'ReadFromBigQuery':
+        query: 'query'
+        table: 'tableSpec'
+        fields: 'selectedFields'
+        row_restriction: 'rowRestriction'
+      'WriteToBigQuery':
+        table: 'table'
+        create_disposition: 'createDisposition'
+        write_disposition: 'writeDisposition'
+        error_handling: 'errorHandling'
+    underlying_provider:
+      type: beamJar
+      transforms:
+        'ReadFromBigQuery': 'beam:schematransform:org.apache.beam:bigquery_storage_read:v1'
+        'WriteToBigQuery': 'beam:schematransform:org.apache.beam:bigquery_storage_write:v1'

Review Comment:
   Should this be `v2`?
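For readers unfamiliar with the `renaming` provider type, the following is a minimal Python sketch of what the `mappings` and `underlying_provider` sections above express: YAML-facing snake_case parameter names are translated into the camelCase configuration fields of the underlying Java schema transform, which is looked up by URN. This is not Beam's actual implementation in `yaml_provider.py`; the helper `resolve_renamed_transform` and the example table name are hypothetical, and rejecting unmapped keys is an assumed behavior. The URNs are copied exactly as they appear in the diff, including the `v1` write URN that the review comment above questions.

```python
from typing import Any, Dict, Tuple

# Copied from the `mappings` section of the standard_io.yaml diff.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "ReadFromBigQuery": {
        "query": "query",
        "table": "tableSpec",
        "fields": "selectedFields",
        "row_restriction": "rowRestriction",
    },
    "WriteToBigQuery": {
        "table": "table",
        "create_disposition": "createDisposition",
        "write_disposition": "writeDisposition",
        "error_handling": "errorHandling",
    },
}

# Copied from `underlying_provider.transforms` as it appears in the diff under review.
URNS: Dict[str, str] = {
    "ReadFromBigQuery": "beam:schematransform:org.apache.beam:bigquery_storage_read:v1",
    "WriteToBigQuery": "beam:schematransform:org.apache.beam:bigquery_storage_write:v1",
}


def resolve_renamed_transform(
        transform_type: str, config: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: map a YAML-level config onto the underlying
    schema transform's URN and camelCase config field names."""
    mapping = MAPPINGS[transform_type]
    # Assumed behavior: parameters without a mapping are an error.
    unknown = set(config) - set(mapping)
    if unknown:
        raise ValueError(
            f"Unknown parameters for {transform_type}: {sorted(unknown)}")
    renamed = {mapping[key]: value for key, value in config.items()}
    return URNS[transform_type], renamed


# Example usage with a hypothetical table name.
urn, cfg = resolve_renamed_transform(
    "ReadFromBigQuery",
    {"table": "my-project:my_dataset.my_table", "fields": ["name", "age"]})
print(urn)
print(cfg)
```

Keeping the mapping as plain data is what would let the same YAML-level names front either a Java or a Python implementation, which is the kind of consistency this PR is aiming for.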





[GitHub] [beam] github-actions[bot] commented on pull request #28169: [YAML] Implement a consistent BigQuery read and write transform.

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #28169:
URL: https://github.com/apache/beam/pull/28169#issuecomment-1694019730

   Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control




[GitHub] [beam] robertwb commented on pull request #28169: [YAML] Implement a consistent BigQuery read and write transform.

Posted by "robertwb (via GitHub)" <gi...@apache.org>.
robertwb commented on PR #28169:
URL: https://github.com/apache/beam/pull/28169#issuecomment-1711948332

   Run Python_Integration PreCommit




[GitHub] [beam] robertwb commented on pull request #28169: [YAML] Implement a consistent BigQuery read and write transform.

Posted by "robertwb (via GitHub)" <gi...@apache.org>.
robertwb commented on PR #28169:
URL: https://github.com/apache/beam/pull/28169#issuecomment-1694019072

   R: @Polber




[GitHub] [beam] robertwb merged pull request #28169: [YAML] Implement a consistent BigQuery read and write transform.

Posted by "robertwb (via GitHub)" <gi...@apache.org>.
robertwb merged PR #28169:
URL: https://github.com/apache/beam/pull/28169




[GitHub] [beam] codecov[bot] commented on pull request #28169: [YAML] Implement a consistent BigQuery read and write transform.

Posted by "codecov[bot] (via GitHub)" <gi...@apache.org>.
codecov[bot] commented on PR #28169:
URL: https://github.com/apache/beam/pull/28169#issuecomment-1694070694

   ## [Codecov](https://app.codecov.io/gh/apache/beam/pull/28169?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#28169](https://app.codecov.io/gh/apache/beam/pull/28169?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (9e8ef0c) into [master](https://app.codecov.io/gh/apache/beam/commit/fdf3dfc57b9e17277fdc6a5ba2808fa80787e893?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (fdf3dfc) will **decrease** coverage by `0.03%`.
   > Report is 2 commits behind head on master.
   > The diff coverage is `53.50%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #28169      +/-   ##
   ==========================================
   - Coverage   72.30%   72.28%   -0.03%     
   ==========================================
     Files         678      679       +1     
     Lines       99799    99925     +126     
   ==========================================
   + Hits        72163    72229      +66     
   - Misses      26074    26134      +60     
     Partials     1562     1562              
   ```
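The figures in the coverage diff are internally consistent, as the short check below shows. It assumes coverage is computed as hits / (hits + misses + partials), with partially covered lines counted as not covered, and that results are floored to two decimal places. Both are assumptions about Codecov's default settings (`round: down`, `precision: 2`), not something this report states. Under those assumptions the script reproduces the reported 72.30%, 72.28%, and -0.03%.

```python
from decimal import Decimal, ROUND_FLOOR

TWO_PLACES = Decimal("0.01")


def raw_coverage(hits: int, misses: int, partials: int) -> Decimal:
    """Unrounded coverage percentage; partial lines count as not covered."""
    total = hits + misses + partials
    return Decimal(hits) * 100 / Decimal(total)


def floored(value: Decimal) -> Decimal:
    """Round toward negative infinity at two decimals (assumed 'round: down')."""
    return value.quantize(TWO_PLACES, rounding=ROUND_FLOOR)


base = (72163, 26074, 1562)  # hits, misses, partials on master
head = (72229, 26134, 1562)  # hits, misses, partials with PR #28169

# Each row's components add up to the reported line totals.
assert sum(base) == 99799 and sum(head) == 99925

print(floored(raw_coverage(*base)))  # 72.30
print(floored(raw_coverage(*head)))  # 72.28
print(floored(raw_coverage(*head) - raw_coverage(*base)))  # -0.03
```

The unrounded change is about -0.025 points; flooring it is what turns it into the reported -0.03%.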
   
   | [Flag](https://app.codecov.io/gh/apache/beam/pull/28169/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [python](https://app.codecov.io/gh/apache/beam/pull/28169/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `82.79% <53.50%> (-0.07%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Files Changed](https://app.codecov.io/gh/apache/beam/pull/28169?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/yaml/main.py](https://app.codecov.io/gh/apache/beam/pull/28169?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0veWFtbC9tYWluLnB5) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/io/gcp/bigquery.py](https://app.codecov.io/gh/apache/beam/pull/28169?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5LnB5) | `69.54% <17.24%> (-0.81%)` | :arrow_down: |
   | [sdks/python/apache\_beam/yaml/yaml\_io.py](https://app.codecov.io/gh/apache/beam/pull/28169?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0veWFtbC95YW1sX2lvLnB5) | `48.57% <48.57%> (ø)` | |
   | [sdks/python/apache\_beam/pipeline.py](https://app.codecov.io/gh/apache/beam/pull/28169?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcGlwZWxpbmUucHk=) | `91.86% <50.00%> (-0.27%)` | :arrow_down: |
   | [sdks/python/apache\_beam/yaml/yaml\_transform.py](https://app.codecov.io/gh/apache/beam/pull/28169?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0veWFtbC95YW1sX3RyYW5zZm9ybS5weQ==) | `88.06% <50.00%> (-0.18%)` | :arrow_down: |
   | [sdks/python/apache\_beam/yaml/yaml\_provider.py](https://app.codecov.io/gh/apache/beam/pull/28169?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0veWFtbC95YW1sX3Byb3ZpZGVyLnB5) | `70.28% <68.57%> (+0.32%)` | :arrow_up: |
   | [sdks/python/apache\_beam/transforms/external.py](https://app.codecov.io/gh/apache/beam/pull/28169?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9leHRlcm5hbC5weQ==) | `74.36% <70.73%> (-0.23%)` | :arrow_down: |
   | [...ks/python/apache\_beam/typehints/schema\_registry.py](https://app.codecov.io/gh/apache/beam/pull/28169?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHlwZWhpbnRzL3NjaGVtYV9yZWdpc3RyeS5weQ==) | `96.15% <100.00%> (+0.91%)` | :arrow_up: |
   
   ... and [11 files with indirect coverage changes](https://app.codecov.io/gh/apache/beam/pull/28169/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   
   




[GitHub] [beam] robertwb commented on pull request #28169: [YAML] Implement a consistent BigQuery read and write transform.

Posted by "robertwb (via GitHub)" <gi...@apache.org>.
robertwb commented on PR #28169:
URL: https://github.com/apache/beam/pull/28169#issuecomment-1712003882

   Python failures are timeout flakes seen elsewhere. The Java failures are also unrelated: the only Java change is the BigQuery write provider, which WordCount does not use. 




[GitHub] [beam] robertwb commented on pull request #28169: [YAML] Implement a consistent BigQuery read and write transform.

Posted by "robertwb (via GitHub)" <gi...@apache.org>.
robertwb commented on PR #28169:
URL: https://github.com/apache/beam/pull/28169#issuecomment-1711948238

    Java_Examples_Dataflow_Java17 PreCommit




[GitHub] [beam] robertwb commented on a diff in pull request #28169: [YAML] Implement a consistent BigQuery read and write transform.

Posted by "robertwb (via GitHub)" <gi...@apache.org>.
robertwb commented on code in PR #28169:
URL: https://github.com/apache/beam/pull/28169#discussion_r1319211417


##########
sdks/python/apache_beam/yaml/standard_io.yaml:
##########
+    underlying_provider:
+      type: beamJar
+      transforms:
+        'ReadFromBigQuery': 'beam:schematransform:org.apache.beam:bigquery_storage_read:v1'
+        'WriteToBigQuery': 'beam:schematransform:org.apache.beam:bigquery_storage_write:v1'

Review Comment:
   Oh, yes. (I changed this at the last minute just to be safe...)


