Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 14:30:45 UTC

[GitHub] [beam] damccorm opened a new issue, #19938: Add support for mapping additional structured types to Python Schemas

damccorm opened a new issue, #19938:
URL: https://github.com/apache/beam/issues/19938

   Currently we can convert between a `NamedTuple` type and its `Schema` protos using `named_tuple_from_schema` and `named_tuple_to_schema`. I'd like to introduce a system to support additional types, starting with structured types like `attrs`, `dataclasses`, and `TypedDict`.
   
   I've only just started digesting the code, but this task seems pretty straightforward. For example, I think the type-to-schema code would look roughly like this:
   ```
   from uuid import uuid4

   def typing_to_runner_api(type_):
     # type: (Type) -> schema_pb2.FieldType
     # _get_structured_handler is the proposed (not yet existing) lookup helper.
     structured_handler = _get_structured_handler(type_)
     if structured_handler:
       schema = None
       # Reuse the schema already registered for this type, if there is one.
       if hasattr(type_, 'id'):
         schema = SCHEMA_REGISTRY.get_schema_by_id(type_.id)
       if schema is None:
         # Otherwise build a new schema and register it under a fresh id.
         fields = structured_handler.get_fields()
         type_id = str(uuid4())
         schema = schema_pb2.Schema(fields=fields, id=type_id)
         SCHEMA_REGISTRY.add(type_, schema)

       return schema_pb2.FieldType(
           row_type=schema_pb2.RowType(
               schema=schema))
   ```
   
   The rest of the work would be in implementing a class hierarchy for working with structured types, covering operations such as getting the list of fields from an instance and instantiating a type from a list of field values. Eventually we can extend this behavior to arbitrary, unstructured types.
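
A minimal sketch of what such a handler hierarchy might look like, using only the standard library. Every name here (`StructuredHandler`, `DataclassHandler`, `NamedTupleHandler`, `get_structured_handler`) is hypothetical and not part of Beam, and fields are returned as plain `(name, type)` pairs rather than `schema_pb2.Field` protos so the example stays self-contained.

```python
import dataclasses
import typing


class StructuredHandler:
  """Hypothetical base class: inspects and builds one kind of structured type."""

  def __init__(self, type_):
    self.type_ = type_

  @staticmethod
  def matches(type_):
    raise NotImplementedError

  def get_fields(self):
    """Return a list of (name, type) pairs for the wrapped type."""
    raise NotImplementedError

  def get_values(self, instance):
    """Return the field values of an instance, in field order."""
    return [getattr(instance, name) for name, _ in self.get_fields()]

  def instantiate(self, values):
    """Build an instance from values given in field order."""
    names = [name for name, _ in self.get_fields()]
    return self.type_(**dict(zip(names, values)))


class DataclassHandler(StructuredHandler):
  @staticmethod
  def matches(type_):
    return isinstance(type_, type) and dataclasses.is_dataclass(type_)

  def get_fields(self):
    hints = typing.get_type_hints(self.type_)
    return [(f.name, hints[f.name]) for f in dataclasses.fields(self.type_)]


class NamedTupleHandler(StructuredHandler):
  @staticmethod
  def matches(type_):
    return (isinstance(type_, type) and issubclass(type_, tuple)
            and hasattr(type_, '_fields'))

  def get_fields(self):
    hints = typing.get_type_hints(self.type_)
    # Untyped collections.namedtuple fields fall back to typing.Any.
    return [(name, hints.get(name, typing.Any)) for name in self.type_._fields]


# Handlers are tried in order; attrs and TypedDict handlers could be added here.
_HANDLERS = [DataclassHandler, NamedTupleHandler]


def get_structured_handler(type_):
  """Return a handler for type_, or None if it is not a known structured type."""
  for handler_cls in _HANDLERS:
    if handler_cls.matches(type_):
      return handler_cls(type_)
  return None
```

Keeping the per-type logic behind `matches` and `get_fields` means that supporting `attrs` or `TypedDict` would, under this design, only require adding another subclass to the list.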
   
   Going in the schema-to-type direction, we have the problem of choosing which type to use for a given schema. I believe that as long as `typing_to_runner_api()` has been called on our structured type in the current Python session, the type will have been added to the registry and will therefore round-trip correctly. So I think we just need a public function for registering schemas for structured types.
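
To illustrate the round trip described above, here is a rough sketch of a registry with a public registration function. This is a simplified stand-in, not Beam's implementation: schemas are plain dicts instead of `schema_pb2.Schema` protos, and `register_structured_type`, `type_from_schema`, and the `_schema_id` attribute are hypothetical names introduced only for this example.

```python
import dataclasses
import uuid


class SchemaTypeRegistry:
  """Simplified stand-in for a schema registry, keyed by schema id."""

  def __init__(self):
    self._by_id = {}  # schema id -> (type, schema)

  def add(self, type_, schema):
    self._by_id[schema['id']] = (type_, schema)

  def get_typing_by_id(self, schema_id):
    entry = self._by_id.get(schema_id)
    return entry[0] if entry else None

  def get_schema_by_id(self, schema_id):
    entry = self._by_id.get(schema_id)
    return entry[1] if entry else None


REGISTRY = SchemaTypeRegistry()


def register_structured_type(type_):
  """Hypothetical public entry point: register a dataclass, return its schema."""
  # Look only at the class's own attributes so subclasses get their own id.
  schema_id = vars(type_).get('_schema_id')
  if schema_id is not None:
    existing = REGISTRY.get_schema_by_id(schema_id)
    if existing is not None:
      return existing
  fields = [(f.name, f.type) for f in dataclasses.fields(type_)]
  schema = {'id': str(uuid.uuid4()), 'fields': fields}
  type_._schema_id = schema['id']  # remember the id on the type itself
  REGISTRY.add(type_, schema)
  return schema


def type_from_schema(schema):
  """Schema-to-type lookup; returns None for ids not registered this session."""
  return REGISTRY.get_typing_by_id(schema['id'])
```

In this sketch, a schema whose id was never registered in the current session resolves to `None`, which is exactly the case where a fallback type would have to be chosen.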
   
   [~bhulette] Did you want to tackle this, or are you OK with me going after it?
   
    
   
   Imported from Jira [BEAM-8732](https://issues.apache.org/jira/browse/BEAM-8732). Original Jira may contain additional context.
   Reported by: chadrik.

