You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/05/13 22:05:25 UTC

[GitHub] [beam] TheNeuralBit opened a new pull request #11701: [BEAM-9899] Fix some issues around storing schema `id` on user types

TheNeuralBit opened a new pull request #11701:
URL: https://github.com/apache/beam/pull/11701


   - Updates sql_test to require generating user types with an `id` field, which yields a name collision with the `id` added to generated NamedTuple types.
   - Modifies row_coder to extract values from NamedTuple instances as plain tuples, rather than using `getattr`. This fixes the above name collision.
   - Also, store schema id in an attribute called `_beam_schema_id` rather than `id` to avoid future name collisions.
   - Adds logic to set `_beam_schema_id` for provided (non-generated) user types as well as a test of this logic. This allows us to retrieve the same schema for the type in the future, rather than re-generating it.
   
   Post-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/)
   XLang | --- | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/)
   
   Pre-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   --- |Java | Python | Go | Website
   --- | --- | --- | --- | ---
   Non-portable | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/) 
   Portable | --- | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/) | --- | ---
   
   See [.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md) for trigger phrase, status and link of all Jenkins jobs.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on pull request #11701: [BEAM-9899] Fix some issues around storing schema `id` on user types

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on pull request #11701:
URL: https://github.com/apache/beam/pull/11701#issuecomment-628270255


   R: @robertwb 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on pull request #11701: [BEAM-9899] Fix some issues around storing schema `id` on user types

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on pull request #11701:
URL: https://github.com/apache/beam/pull/11701#issuecomment-631751672


   Python PreCommit failure is due to https://issues.apache.org/jira/browse/BEAM-9975


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] robertwb commented on a change in pull request #11701: [BEAM-9899] Fix some issues around storing schema `id` on user types

Posted by GitBox <gi...@apache.org>.
robertwb commented on a change in pull request #11701:
URL: https://github.com/apache/beam/pull/11701#discussion_r428217917



##########
File path: sdks/python/apache_beam/coders/row_coder.py
##########
@@ -134,19 +134,18 @@ def __init__(self, schema, components):
   def encode_to_stream(self, value, out, nested):
     nvals = len(self.schema.fields)
     self.SIZE_CODER.encode_to_stream(nvals, out, True)
-    attrs = [getattr(value, f.name) for f in self.schema.fields]

Review comment:
       Yeah, let's drop this change for now. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on pull request #11701: [BEAM-9899] Fix some issues around storing schema `id` on user types

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on pull request #11701:
URL: https://github.com/apache/beam/pull/11701#issuecomment-631697606


   Run Python2_PVR_Flink PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on pull request #11701: [BEAM-9899] Fix some issues around storing schema `id` on user types

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on pull request #11701:
URL: https://github.com/apache/beam/pull/11701#issuecomment-631700662


   Run Python2_PVR_Flink PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit edited a comment on pull request #11701: [BEAM-9899] Fix some issues around storing schema `id` on user types

Posted by GitBox <gi...@apache.org>.
TheNeuralBit edited a comment on pull request #11701:
URL: https://github.com/apache/beam/pull/11701#issuecomment-631751672


   Python PreCommit failure is a flake due to https://issues.apache.org/jira/browse/BEAM-9975


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on a change in pull request #11701: [BEAM-9899] Fix some issues around storing schema `id` on user types

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on a change in pull request #11701:
URL: https://github.com/apache/beam/pull/11701#discussion_r428223966



##########
File path: sdks/python/apache_beam/coders/row_coder.py
##########
@@ -134,19 +134,18 @@ def __init__(self, schema, components):
   def encode_to_stream(self, value, out, nested):
     nvals = len(self.schema.fields)
     self.SIZE_CODER.encode_to_stream(nvals, out, True)
-    attrs = [getattr(value, f.name) for f in self.schema.fields]

Review comment:
       I just pushed a commit that drops the coder change for now




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] robertwb commented on a change in pull request #11701: [BEAM-9899] Fix some issues around storing schema `id` on user types

Posted by GitBox <gi...@apache.org>.
robertwb commented on a change in pull request #11701:
URL: https://github.com/apache/beam/pull/11701#discussion_r427648180



##########
File path: sdks/python/apache_beam/coders/row_coder.py
##########
@@ -134,19 +134,18 @@ def __init__(self, schema, components):
   def encode_to_stream(self, value, out, nested):
     nvals = len(self.schema.fields)
     self.SIZE_CODER.encode_to_stream(nvals, out, True)
-    attrs = [getattr(value, f.name) for f in self.schema.fields]

Review comment:
       This forces the value to be an iterable, rather than just having the right fields, right? Are we sure we want to do that? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on pull request #11701: [BEAM-9899] Fix some issues around storing schema `id` on user types

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on pull request #11701:
URL: https://github.com/apache/beam/pull/11701#issuecomment-631703157


   Run Python2_PVR_Flink PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit merged pull request #11701: [BEAM-9899] Fix some issues around storing schema `id` on user types

Posted by GitBox <gi...@apache.org>.
TheNeuralBit merged pull request #11701:
URL: https://github.com/apache/beam/pull/11701


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on pull request #11701: [BEAM-9899] Fix some issues around storing schema `id` on user types

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on pull request #11701:
URL: https://github.com/apache/beam/pull/11701#issuecomment-631760558


   Run Python2_PVR_Flink PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on pull request #11701: [BEAM-9899] Fix some issues around storing schema `id` on user types

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on pull request #11701:
URL: https://github.com/apache/beam/pull/11701#issuecomment-631649461


   Run XVR_Spark PostCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on pull request #11701: [BEAM-9899] Fix some issues around storing schema `id` on user types

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on pull request #11701:
URL: https://github.com/apache/beam/pull/11701#issuecomment-631696209


   Run Python2_PVR_Flink PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on pull request #11701: [BEAM-9899] Fix some issues around storing schema `id` on user types

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on pull request #11701:
URL: https://github.com/apache/beam/pull/11701#issuecomment-631633755


   Run Python PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on a change in pull request #11701: [BEAM-9899] Fix some issues around storing schema `id` on user types

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on a change in pull request #11701:
URL: https://github.com/apache/beam/pull/11701#discussion_r427655332



##########
File path: sdks/python/apache_beam/coders/row_coder.py
##########
@@ -134,19 +134,18 @@ def __init__(self, schema, components):
   def encode_to_stream(self, value, out, nested):
     nvals = len(self.schema.fields)
     self.SIZE_CODER.encode_to_stream(nvals, out, True)
-    attrs = [getattr(value, f.name) for f in self.schema.fields]

Review comment:
       Yeah that's right. Right now it should only be possible to get here with a NamedTuple instance, so it should be safe.
   
   Looking forward to the day where more types might go through this code.. I kind of like the idea of using `tuple` as the "base" schema type in Python (i.e. the type we must be able to convert to/from for use in row coder). Relying on attributes isn't great since it limits us to field names that are valid python identifiers.
   
   All that being said I'd be fine dropping this part of the change for now. Renaming the special `id` field also fixes the bug.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org