Posted to dev@avro.apache.org by "Chong Wang (JIRA)" <ji...@apache.org> on 2018/11/01 16:34:00 UTC

[jira] [Commented] (AVRO-1795) Python2: Cannot parse nested schemas

    [ https://issues.apache.org/jira/browse/AVRO-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671833#comment-16671833 ] 

Chong Wang commented on AVRO-1795:
----------------------------------

Found an example at http://gisgeek.blogspot.com/2012/12/using-apache-avro-with-python.html that seems to do what is required.

> Python2: Cannot parse nested schemas
> ------------------------------------
>
>                 Key: AVRO-1795
>                 URL: https://issues.apache.org/jira/browse/AVRO-1795
>             Project: Avro
>          Issue Type: Bug
>          Components: python
>    Affects Versions: 1.8.0
>            Reporter: Jakob Homan
>            Assignee: Jakob Homan
>            Priority: Major
>
> In the Java client, one can parse nested schemas by loading the nested schema before the nesting schema.  
> For example, a header can be defined in one file:
> {code:javascript}{ "namespace": "python.avro",
>       "type": "record",
>       "name": "header",
>       "fields": [
>          { "name": "header_field", "type": "string" }
>        ]
>     }{code}
> and then included in another schema:
> {code:javascript}{ "namespace": "python.avro",
>       "type": "record",
>       "name": "event",
>       "fields": [
>          {  "name": "header", "type": "python.avro.header" },
>          {  "name": "event_field", "type": "string" }
>       ]
>     }{code}
> As long as one instantiates the Parser and loads the header first, the schemas will be reconciled and merged correctly.
> However, the Python client does not support this.  The {{parse}} function in {{schema.py}} always instantiates a new Names object to hold the schemas:
> {code}def parse(json_string):
>   """Constructs the Schema from the JSON text."""
>   # TODO(hammer): preserve stack trace from JSON parse
>   # parse the JSON
>   try:
>     json_data = json.loads(json_string)
>   except:
>     raise SchemaParseException('Error parsing JSON: %s' % json_string)
>   # Initialize the names object
>   names = Names()
>   # construct the Avro Schema object
>   return make_avsc_object(json_data, names){code}
> Some possible fixes for this are:
> 1) Create a separate Parser class to mimic the Schema.Parser Java approach, while deprecating the current parse method. 
> 2) Make Names a global variable used by the parse method, allowing multiple parse calls to populate the same namespace.  This breaks current behavior (and at least one unit test depends on it), so it would not be backwards compatible.
> 3) Create a new parse method that accepts a Names instance and returns not only the schema but also that instance.  This keeps the code nice and functional while exposing the Names class, which previously had not been particularly public.
> I like the first approach.
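
For illustration, the first approach quoted above could look roughly like the sketch below. This is only a sketch under stated assumptions: SchemaParser, its names dictionary and the local SchemaParseException are hypothetical stand-ins written for this example, not the Avro Python API, and only records, unions and primitive type names are handled. The point is that one parser instance keeps its named types between parse calls, so the header can be loaded first and then referenced by the event schema.

```python
import json

PRIMITIVES = frozenset(
    ['null', 'boolean', 'int', 'long', 'float', 'double', 'bytes', 'string'])


class SchemaParseException(Exception):
    """Local stand-in for the exception raised on a bad schema."""


class SchemaParser(object):
    """Hypothetical parser that remembers named types across parse() calls."""

    def __init__(self):
        # Maps full names such as "python.avro.header" to parsed records.
        self.names = {}

    def parse(self, json_string):
        try:
            json_data = json.loads(json_string)
        except ValueError:
            raise SchemaParseException('Error parsing JSON: %s' % json_string)
        return self._resolve(json_data, None)

    def _resolve(self, node, namespace):
        if isinstance(node, dict):
            if node.get('type') != 'record':
                return node  # other complex types are passed through as-is
            namespace = node.get('namespace', namespace)
            name = node['name']
            fullname = '%s.%s' % (namespace, name) if namespace else name
            record = {'type': 'record', 'name': fullname, 'fields': []}
            # Register before resolving fields so later schemas can refer to it.
            self.names[fullname] = record
            for field in node['fields']:
                record['fields'].append({
                    'name': field['name'],
                    'type': self._resolve(field['type'], namespace),
                })
            return record
        if isinstance(node, list):  # union: resolve every branch
            return [self._resolve(branch, namespace) for branch in node]
        if node in PRIMITIVES:
            return node
        # Anything else is a reference to a previously parsed named type.
        if '.' in node or not namespace:
            fullname = node
        else:
            fullname = '%s.%s' % (namespace, node)
        if fullname not in self.names:
            raise SchemaParseException('Unknown named type: %s' % fullname)
        return self.names[fullname]


HEADER = '''{ "namespace": "python.avro", "type": "record", "name": "header",
  "fields": [ { "name": "header_field", "type": "string" } ] }'''

EVENT = '''{ "namespace": "python.avro", "type": "record", "name": "event",
  "fields": [ { "name": "header", "type": "python.avro.header" },
              { "name": "event_field", "type": "string" } ] }'''

parser = SchemaParser()
parser.parse(HEADER)         # load the nested schema first
event = parser.parse(EVENT)  # "python.avro.header" now resolves
```

Because the state lives on the parser instance rather than in a module-level global, a fresh SchemaParser still starts with no names, so parsing the event schema on its own fails in this sketch.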



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)