Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 21:46:00 UTC

[GitHub] [beam] damccorm opened a new issue, #21171: Add support for inferring Beam Schemas from Python protobuf types

damccorm opened a new issue, #21171:
URL: https://github.com/apache/beam/issues/21171

   Just as we can infer a Beam Schema from a NamedTuple type ([code](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/schemas.py)), we should have support for inferring a schema from a [protobuf-generated Python type](https://developers.google.com/protocol-buffers/docs/pythontutorial).
   
   This should integrate well with the rest of the schema infrastructure. For example, it should be possible to use schema-aware transforms like [SqlTransform](https://beam.apache.org/releases/pydoc/2.32.0/apache_beam.transforms.sql.html#apache_beam.transforms.sql.SqlTransform), [Select](https://beam.apache.org/releases/pydoc/2.32.0/apache_beam.transforms.core.html#apache_beam.transforms.core.Select), or [beam.dataframe.convert.to_dataframe](https://beam.apache.org/releases/pydoc/2.32.0/apache_beam.dataframe.convert.html#apache_beam.dataframe.convert.to_dataframe) on a PCollection that is annotated with a protobuf type. The following sketch uses the addressbook_pb2 example from the [tutorial](https://developers.google.com/protocol-buffers/docs/pythontutorial#reading-a-message):
   
   ```
   
   import addressbook_pb2
   
   import apache_beam as beam
   from apache_beam.dataframe.convert import to_dataframe
   from apache_beam.transforms.sql import SqlTransform
   
   # Annotate the PCollection with the protobuf-generated type.
   pc = (input_pc | beam.Map(create_person).with_output_types(addressbook_pb2.Person))
   
   df = to_dataframe(pc)
   # deferred dataframe with fields id, name, email, ...
   
   # OR
   
   pc | SqlTransform("SELECT name FROM PCOLLECTION WHERE email = 'foo@bar.com'")
   
   ```
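
To make the requested inference step more concrete, here is a minimal, standard-library-only sketch of the idea: derive a NamedTuple type (which Beam can already infer a schema from) out of a list of field descriptions. Everything in it is hypothetical and for illustration only: `FieldSpec` is a stand-in for a real protobuf field descriptor, `named_tuple_from_fields` is not a Beam function, and the type mapping (including treating non-repeated fields as `Optional`) is an assumption, not Beam's actual behavior.

```python
from typing import List, NamedTuple, Optional, Sequence


class FieldSpec(NamedTuple):
    """Hypothetical stand-in for one protobuf field descriptor."""
    name: str
    proto_type: str  # protobuf scalar type name, e.g. "int32" or "string"
    repeated: bool = False


# Illustrative mapping from protobuf scalar type names to Python types.
# This table is an assumption for the sketch, not Beam's real mapping.
_PROTO_TO_PYTHON = {
    "int32": int,
    "int64": int,
    "float": float,
    "double": float,
    "bool": bool,
    "string": str,
    "bytes": bytes,
}


def named_tuple_from_fields(type_name: str, fields: Sequence[FieldSpec]):
    """Build a NamedTuple type whose fields mirror the given field specs."""
    annotations = []
    for field in fields:
        py_type = _PROTO_TO_PYTHON[field.proto_type]
        # Assumption: repeated fields become lists, all others are nullable.
        field_type = List[py_type] if field.repeated else Optional[py_type]
        annotations.append((field.name, field_type))
    return NamedTuple(type_name, annotations)


# Fields loosely modeled on the Person message from the tutorial.
PersonRow = named_tuple_from_fields(
    "PersonRow",
    [
        FieldSpec("name", "string"),
        FieldSpec("id", "int32"),
        FieldSpec("email", "string"),
    ],
)

row = PersonRow(name="Ada", id=1, email="foo@bar.com")
```

A real implementation would presumably read the fields from the generated message's descriptor rather than from hand-written specs, and would also need to handle nested messages, enums, and maps, which this sketch ignores.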
   
   
   Imported from Jira [BEAM-12955](https://issues.apache.org/jira/browse/BEAM-12955). Original Jira may contain additional context.
   Reported by: bhulette.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org