Posted to issues@beam.apache.org by "Beam JIRA Bot (Jira)" <ji...@apache.org> on 2021/10/25 17:25:01 UTC

[jira] [Commented] (BEAM-12955) Add support for inferring Beam Schemas from Python protobuf types

    [ https://issues.apache.org/jira/browse/BEAM-12955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17433888#comment-17433888 ] 

Beam JIRA Bot commented on BEAM-12955:
--------------------------------------

This issue is assigned but has not received an update in 30 days so it has been labeled "stale-assigned". If you are still working on the issue, please give an update and remove the label. If you are no longer working on the issue, please unassign so someone else may work on it. In 7 days the issue will be automatically unassigned.

> Add support for inferring Beam Schemas from Python protobuf types
> -----------------------------------------------------------------
>
>                 Key: BEAM-12955
>                 URL: https://issues.apache.org/jira/browse/BEAM-12955
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>            Reporter: Brian Hulette
>            Assignee: Svetak Vihaan Sundhar
>            Priority: P2
>              Labels: stale-assigned
>
> Just as we can infer a Beam Schema from a NamedTuple type ([code|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/schemas.py]), we should have support for inferring a schema from a [protobuf-generated Python type|https://developers.google.com/protocol-buffers/docs/pythontutorial].
> This should integrate well with the rest of the schema infrastructure. For example, it should be possible to use schema-aware transforms like [SqlTransform|https://beam.apache.org/releases/pydoc/2.32.0/apache_beam.transforms.sql.html#apache_beam.transforms.sql.SqlTransform], [Select|https://beam.apache.org/releases/pydoc/2.32.0/apache_beam.transforms.core.html#apache_beam.transforms.core.Select], or [beam.dataframe.convert.to_dataframe|https://beam.apache.org/releases/pydoc/2.32.0/apache_beam.dataframe.convert.html#apache_beam.dataframe.convert.to_dataframe] on a PCollection that is annotated with a protobuf type. The following snippet illustrates this using the addressbook_pb2 example from the [tutorial|https://developers.google.com/protocol-buffers/docs/pythontutorial#reading-a-message]:
> {code:python}
> import addressbook_pb2
> import apache_beam as beam
> from apache_beam.dataframe.convert import to_dataframe
> from apache_beam.transforms.sql import SqlTransform
> # input_pc and create_person are placeholders for an existing PCollection
> # and a function that returns addressbook_pb2.Person messages.
> pc = (input_pc | beam.Map(create_person).with_output_type(addressbook_pb2.Person))
> df = to_dataframe(pc) # deferred dataframe with fields id, name, email, ...
> # OR
> pc | SqlTransform("SELECT name FROM PCOLLECTION WHERE email = 'foo@bar.com'")
> {code}
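
As a rough illustration of the kind of inference the issue asks for, the sketch below maps a protobuf-style message descriptor to a NamedTuple type, which is a type Beam can already infer a schema from. This is not Beam's implementation and not a proposed design. The names FakeField, FakeDescriptor, and named_tuple_from_descriptor are hypothetical; the stand-in classes only mimic the name, type, label, and fields attributes that real protobuf descriptors expose, so the example runs without protobuf installed. It assumes scalar and repeated scalar fields only; nested messages and enums are deliberately left unhandled.

```python
import typing
from dataclasses import dataclass

# Numeric values mirror the constants on
# google.protobuf.descriptor.FieldDescriptor (TYPE_* and LABEL_*).
TYPE_DOUBLE, TYPE_FLOAT, TYPE_INT64, TYPE_INT32 = 1, 2, 3, 5
TYPE_BOOL, TYPE_STRING, TYPE_BYTES = 8, 9, 12
LABEL_OPTIONAL, LABEL_REPEATED = 1, 3

# Scalar protobuf field types and the Python types used to represent them.
_SCALAR_TYPES = {
    TYPE_DOUBLE: float,
    TYPE_FLOAT: float,
    TYPE_INT64: int,
    TYPE_INT32: int,
    TYPE_BOOL: bool,
    TYPE_STRING: str,
    TYPE_BYTES: bytes,
}


@dataclass
class FakeField:
    """Hypothetical stand-in for a protobuf FieldDescriptor."""
    name: str
    type: int
    label: int = LABEL_OPTIONAL


@dataclass
class FakeDescriptor:
    """Hypothetical stand-in for a protobuf message Descriptor."""
    name: str
    fields: typing.List[FakeField]


def named_tuple_from_descriptor(descriptor):
    """Build a NamedTuple type whose fields mirror the descriptor's fields."""
    members = []
    for field in descriptor.fields:
        if field.type not in _SCALAR_TYPES:
            # Nested messages, enums, etc. are out of scope for this sketch.
            raise NotImplementedError(
                f"unsupported field type {field.type} for {field.name!r}")
        py_type = _SCALAR_TYPES[field.type]
        if field.label == LABEL_REPEATED:
            py_type = typing.List[py_type]
        members.append((field.name, py_type))
    return typing.NamedTuple(descriptor.name, members)


# A simplified Person message: scalar fields plus one repeated scalar field.
person_descriptor = FakeDescriptor("Person", [
    FakeField("name", TYPE_STRING),
    FakeField("id", TYPE_INT32),
    FakeField("email", TYPE_STRING),
    FakeField("nicknames", TYPE_STRING, LABEL_REPEATED),
])
PersonRow = named_tuple_from_descriptor(person_descriptor)
row = PersonRow(name="Ada", id=1, email="foo@bar.com", nicknames=["A"])
```

With the real protobuf library, the same function could presumably be given addressbook_pb2.Person.DESCRIPTOR once message-typed fields such as phones are handled; how Beam should treat those fields is left open here.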



--
This message was sent by Atlassian Jira
(v8.3.4#803005)