You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2021/02/18 19:38:38 UTC

[GitHub] [beam] dpcollins-google opened a new pull request #14016: [BEAM-11659] Allow Kafka sql table provider to have a nested mode and raw binary payloads

dpcollins-google opened a new pull request #14016:
URL: https://github.com/apache/beam/pull/14016


   This modifies the Kafka sql table provider to have a nested mode like Pub/Sub currently does, that exposes the headers and event timestamp from the record. It also allows the payload field in this mode to be BYTES type, remaining unparsed in this case.
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) | ---
   Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam
 .apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.a
 pache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam
 .apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/) | ---
   XLang | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Dataflow/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/) | ---
   
   Pre-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   --- |Java | Python | Go | Website | Whitespace | Typescript
   --- | --- | --- | --- | --- | --- | ---
   Non-portable | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/lastCompletedBuild/) <br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocs_Cron/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocs_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/be
 am_PreCommit_Go_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Whitespace_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Whitespace_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Typescript_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Typescript_Cron/lastCompletedBuild/)
   Portable | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/) | --- | --- | --- | ---
   
   See [.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md) for trigger phrase, status and link of all Jenkins jobs.
   
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] robinyqiu edited a comment on pull request #14016: [BEAM-11659] Allow Kafka sql table provider to have a nested mode and raw binary payloads

Posted by GitBox <gi...@apache.org>.
robinyqiu edited a comment on pull request #14016:
URL: https://github.com/apache/beam/pull/14016#issuecomment-784600198


   Sure. I will take a look.
   
   cc: @TheNeuralBit as well


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] amaliujia merged pull request #14016: [BEAM-11659] Allow Kafka sql table provider to have a nested mode and raw binary payloads

Posted by GitBox <gi...@apache.org>.
amaliujia merged pull request #14016:
URL: https://github.com/apache/beam/pull/14016


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] robinyqiu commented on pull request #14016: [BEAM-11659] Allow Kafka sql table provider to have a nested mode and raw binary payloads

Posted by GitBox <gi...@apache.org>.
robinyqiu commented on pull request #14016:
URL: https://github.com/apache/beam/pull/14016#issuecomment-784600198


   cc: @TheNeuralBit as well


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on pull request #14016: [BEAM-11659] Allow Kafka sql table provider to have a nested mode and raw binary payloads

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on pull request #14016:
URL: https://github.com/apache/beam/pull/14016#issuecomment-785397831


   It was my intention to ultimately remove the nested mode from the pubsub table provider, once we had other options for accessing the message publish time and attributes.
   
   See https://lists.apache.org/thread.html/bf4c37f21bda194d7f8c40f6e7b9a776262415755cc1658412af3c76%40%3Cdev.beam.apache.org%3E, and specifically https://lists.apache.org/thread.html/rb83f154846a3a38eae30b667cee52d6b8f0e94769e71b2b7516cb2ce%40%3Cdev.beam.apache.org%3E


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] robinyqiu commented on pull request #14016: [BEAM-11659] Allow Kafka sql table provider to have a nested mode and raw binary payloads

Posted by GitBox <gi...@apache.org>.
robinyqiu commented on pull request #14016:
URL: https://github.com/apache/beam/pull/14016#issuecomment-790987386


   Thanks Brian for sharing this info. It makes sense to me to keep both the nested and flattened modes for now.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] dpcollins-google commented on pull request #14016: [BEAM-11659] Allow Kafka sql table provider to have a nested mode and raw binary payloads

Posted by GitBox <gi...@apache.org>.
dpcollins-google commented on pull request #14016:
URL: https://github.com/apache/beam/pull/14016#issuecomment-797061372


   Yes, sorry I had Identified the issue here https://github.com/apache/beam/pull/13980/files#diff-5d769184e0be9fad0074aaf1a90eb5eb059148261b1d82161fc11dea74ada4be but forgot to upstream it. Will do so now.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on pull request #14016: [BEAM-11659] Allow Kafka sql table provider to have a nested mode and raw binary payloads

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on pull request #14016:
URL: https://github.com/apache/beam/pull/14016#issuecomment-785401490


   That being said I think there's value in this sort of "raw" mode. In an ideal world, the flattened structure would be a VIEW on the nested/raw version.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] suztomo commented on pull request #14016: [BEAM-11659] Allow Kafka sql table provider to have a nested mode and raw binary payloads

Posted by GitBox <gi...@apache.org>.
suztomo commented on pull request #14016:
URL: https://github.com/apache/beam/pull/14016#issuecomment-797061958


   Thank you!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on pull request #14016: [BEAM-11659] Allow Kafka sql table provider to have a nested mode and raw binary payloads

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on pull request #14016:
URL: https://github.com/apache/beam/pull/14016#issuecomment-786120186


   Daniel and I discussed this offline. Some notes:
   - It makes sense to just keep both the nested and flattened modes for now. We can easily detect which one the user is trying to use. In the future the "flattened" option should be re-architected to be a view on the nested mode.
   - However, we should be aware that with the nested mode users will run up against issues with nested rows until we upgrade calcite (BEAM-9378)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] suztomo commented on pull request #14016: [BEAM-11659] Allow Kafka sql table provider to have a nested mode and raw binary payloads

Posted by GitBox <gi...@apache.org>.
suztomo commented on pull request #14016:
URL: https://github.com/apache/beam/pull/14016#issuecomment-797059126


   @dpcollins-google @robinyqiu "SQL Post Commit Tests" has been failing since 2 days ago. I found this PR from the https://ci-beam.apache.org/job/beam_PostCommit_SQL/6022/ . Do you think the test failures are related to this PR?
   
   ```
   13:14:54 > Task :sdks:java:extensions:sql:integrationTest
   13:15:15 
   13:15:15 org.apache.beam.sdk.extensions.sql.meta.provider.kafka.KafkaTableProviderIT > testFakeNested[0] FAILED
   13:15:15     org.apache.beam.sdk.extensions.sql.impl.ParseException at KafkaTableProviderIT.java:227
   13:15:15         Caused by: java.lang.RuntimeException at SqlParseException.java:171
   13:15:23 
   13:15:23 org.apache.beam.sdk.extensions.sql.meta.provider.kafka.KafkaTableProviderIT > testFakeNested[1] FAILED
   13:15:23     org.apache.beam.sdk.extensions.sql.impl.ParseException at KafkaTableProviderIT.java:227
   13:15:23         Caused by: java.lang.RuntimeException at SqlParseException.java:171
   13:15:32 
   13:15:32 org.apache.beam.sdk.extensions.sql.meta.provider.kafka.KafkaTableProviderIT > testFakeNested[2] FAILED
   13:15:32     org.apache.beam.sdk.extensions.sql.impl.ParseException at KafkaTableProviderIT.java:227
   13:15:32         Caused by: java.lang.RuntimeException at SqlParseException.java:171
   13:15:40 
   13:15:40 org.apache.beam.sdk.extensions.sql.meta.provider.kafka.KafkaTableProviderIT > testFakeNested[3] FAILED
   13:15:40     org.apache.beam.sdk.extensions.sql.impl.ParseException at KafkaTableProviderIT.java:227
   13:15:40         Caused by: java.lang.RuntimeException at SqlParseException.java:171
   13:15:51 
   13:15:51 org.apache.beam.sdk.extensions.sql.meta.provider.kafka.KafkaTableProviderIT > testFakeNested[4] FAILED
   13:15:51     org.apache.beam.sdk.extensions.sql.impl.ParseException at KafkaTableProviderIT.java:227
   13:15:51         Caused by: java.lang.RuntimeException at SqlParseException.java:171
   13:21:21 
   13:21:21 55 tests completed, 5 failed
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] dpcollins-google commented on pull request #14016: [BEAM-11659] Allow Kafka sql table provider to have a nested mode and raw binary payloads

Posted by GitBox <gi...@apache.org>.
dpcollins-google commented on pull request #14016:
URL: https://github.com/apache/beam/pull/14016#issuecomment-781587516


   R: @amaliujia 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] amaliujia commented on pull request #14016: [BEAM-11659] Allow Kafka sql table provider to have a nested mode and raw binary payloads

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #14016:
URL: https://github.com/apache/beam/pull/14016#issuecomment-783081014


   R: @robinyqiu this is a very important change so also requesting review from Robin.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org