Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 21:46:00 UTC
[GitHub] [beam] damccorm opened a new issue, #21171: Add support for inferring Beam Schemas from Python protobuf types
damccorm opened a new issue, #21171:
URL: https://github.com/apache/beam/issues/21171
Just as we can infer a Beam Schema from a NamedTuple type ([code](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/schemas.py)), we should have support for inferring a schema from a [protobuf-generated Python type](https://developers.google.com/protocol-buffers/docs/pythontutorial).
This should integrate well with the rest of the schema infrastructure. For example, it should be possible to use schema-aware transforms like [SqlTransform](https://beam.apache.org/releases/pydoc/2.32.0/apache_beam.transforms.sql.html#apache_beam.transforms.sql.SqlTransform), [Select](https://beam.apache.org/releases/pydoc/2.32.0/apache_beam.transforms.core.html#apache_beam.transforms.core.Select), or [beam.dataframe.convert.to_dataframe](https://beam.apache.org/releases/pydoc/2.32.0/apache_beam.dataframe.convert.html#apache_beam.dataframe.convert.to_dataframe) on a PCollection that is annotated with a protobuf type. The desired usage would look like this (using the addressbook_pb2 example from the [tutorial](https://developers.google.com/protocol-buffers/docs/pythontutorial#reading-a-message)):
```
import addressbook_pb2
import apache_beam as beam
from apache_beam.dataframe.convert import to_dataframe
from apache_beam.transforms.sql import SqlTransform

# input_pc and create_person are assumed to be defined elsewhere.
pc = input_pc | beam.Map(create_person).with_output_types(addressbook_pb2.Person)

df = to_dataframe(pc)
# deferred dataframe with fields id, name, email, ...

# OR
pc | SqlTransform("SELECT name FROM PCOLLECTION WHERE email = 'foo@bar.com'")
```
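One possible starting point, sketched here under stated assumptions rather than as Beam's actual implementation, is to walk a message descriptor and build a NamedTuple type, which the existing schema inference already understands. The helper `named_tuple_from_descriptor` and the `_SCALAR_TYPES` mapping are hypothetical names. The descriptor is a stand-in built with `SimpleNamespace` so the sketch runs without protobuf installed. With a real generated class, the assumption is that `addressbook_pb2.Person.DESCRIPTOR` would be passed instead. Only a few scalar types are covered, and nested messages, enums, and maps are left out.

```python
import typing
from types import SimpleNamespace

# Numeric values of FieldDescriptorProto.Type and .Label from descriptor.proto.
TYPE_DOUBLE, TYPE_FLOAT, TYPE_INT64, TYPE_INT32 = 1, 2, 3, 5
TYPE_BOOL, TYPE_STRING, TYPE_BYTES = 8, 9, 12
LABEL_OPTIONAL, LABEL_REPEATED = 1, 3

# Hypothetical mapping from protobuf scalar types to Python types.
_SCALAR_TYPES = {
    TYPE_DOUBLE: float,
    TYPE_FLOAT: float,
    TYPE_INT64: int,
    TYPE_INT32: int,
    TYPE_BOOL: bool,
    TYPE_STRING: str,
    TYPE_BYTES: bytes,
}


def named_tuple_from_descriptor(descriptor):
    """Build a NamedTuple type from a descriptor-like object (hypothetical helper).

    The object only needs `.name` and `.fields`, where each field has
    `.name`, `.type`, and `.label`.
    """
    fields = []
    for field in descriptor.fields:
        if field.type not in _SCALAR_TYPES:
            # Nested messages, enums, and other types are out of scope here.
            raise NotImplementedError(f"unsupported field type: {field.type}")
        py_type = _SCALAR_TYPES[field.type]
        if field.label == LABEL_REPEATED:
            py_type = typing.Sequence[py_type]
        fields.append((field.name, py_type))
    return typing.NamedTuple(descriptor.name, fields)


# Stand-in for the scalar fields of the tutorial's Person message.
person_descriptor = SimpleNamespace(
    name="Person",
    fields=[
        SimpleNamespace(name="name", type=TYPE_STRING, label=LABEL_OPTIONAL),
        SimpleNamespace(name="id", type=TYPE_INT32, label=LABEL_OPTIONAL),
        SimpleNamespace(name="email", type=TYPE_STRING, label=LABEL_OPTIONAL),
    ],
)

PersonRow = named_tuple_from_descriptor(person_descriptor)
row = PersonRow(name="Ada", id=1, email="foo@bar.com")
```

Going through a NamedTuple keeps the sketch small, but a real implementation would more likely map descriptors to Beam schema types directly, so that protobuf instances themselves can be encoded as rows without a conversion step.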
Imported from Jira [BEAM-12955](https://issues.apache.org/jira/browse/BEAM-12955). Original Jira may contain additional context.
Reported by: bhulette.