Posted to issues@beam.apache.org by "Brian Hulette (Jira)" <ji...@apache.org> on 2021/09/24 18:52:00 UTC

[jira] [Created] (BEAM-12955) Add support for inferring Beam Schemas from Python protobuf types

Brian Hulette created BEAM-12955:
------------------------------------

             Summary: Add support for inferring Beam Schemas from Python protobuf types
                 Key: BEAM-12955
                 URL: https://issues.apache.org/jira/browse/BEAM-12955
             Project: Beam
          Issue Type: Improvement
          Components: sdk-py-core
            Reporter: Brian Hulette


Just as we can infer a Beam Schema from a NamedTuple type ([code|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/schemas.py]), we should have support for inferring a schema from a [protobuf-generated Python type|https://developers.google.com/protocol-buffers/docs/pythontutorial].

This should integrate well with the rest of the schema infrastructure. For example, it should be possible to use schema-aware transforms like SqlTransform, Select, or beam.dataframe.convert.to_dataframe on a PCollection that is annotated with a protobuf type. The following snippet uses the addressbook_pb2 example from the [tutorial|https://developers.google.com/protocol-buffers/docs/pythontutorial#reading-a-message]:

{code:python}
import addressbook_pb2

import apache_beam as beam
from apache_beam.dataframe.convert import to_dataframe
from apache_beam.transforms.sql import SqlTransform

# input_pc and create_person are placeholders for an existing PCollection
# and a function that returns addressbook_pb2.Person messages.
pc = input_pc | beam.Map(create_person).with_output_types(addressbook_pb2.Person)

df = to_dataframe(pc)  # deferred dataframe with fields id, name, email, ...

# OR

pc | SqlTransform("SELECT name FROM PCOLLECTION WHERE email = 'foo@bar.com'")
{code}
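
One possible shape for the inference step, offered only as a rough sketch and not as Beam's implementation: walk the message's fields and build a NamedTuple type with matching annotations, so that the existing NamedTuple-based inference could take over from there. To keep the sketch free of dependencies, FieldSpec is a hypothetical stand-in for the information a protobuf field descriptor provides. PROTO_TO_PYTHON and named_tuple_from_fields are likewise illustrative names, and the field list covers only the id, name, and email fields mentioned above.

```python
import typing
from dataclasses import dataclass


# Hypothetical stand-in for the name/type/label information that a protobuf
# field descriptor exposes. Not part of protobuf or Beam.
@dataclass(frozen=True)
class FieldSpec:
    name: str
    proto_type: str  # proto scalar type name, e.g. "int32" or "string"
    repeated: bool = False
    optional: bool = False


# Assumed mapping from proto scalar type names to Python types.
PROTO_TO_PYTHON = {
    "int32": int,
    "int64": int,
    "float": float,
    "double": float,
    "bool": bool,
    "string": str,
    "bytes": bytes,
}


def named_tuple_from_fields(type_name, fields):
    """Build a NamedTuple type whose annotations mirror the given fields."""
    annotated = []
    for field in fields:
        py_type = PROTO_TO_PYTHON[field.proto_type]
        if field.repeated:
            py_type = typing.Sequence[py_type]
        elif field.optional:
            py_type = typing.Optional[py_type]
        annotated.append((field.name, py_type))
    return typing.NamedTuple(type_name, annotated)


# Illustrative subset of the tutorial's Person message; nested and repeated
# message fields are left out of this sketch.
PERSON_FIELDS = [
    FieldSpec("name", "string"),
    FieldSpec("id", "int32"),
    FieldSpec("email", "string"),
]
Person = named_tuple_from_fields("Person", PERSON_FIELDS)
```

Nested messages, enums, oneofs, and maps would need their own handling, and a real implementation would also have to convert between message instances and rows, which this sketch does not attempt.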


