Posted to dev@avro.apache.org by "Jakob Homan (JIRA)" <ji...@apache.org> on 2016/02/04 23:15:39 UTC

[jira] [Created] (AVRO-1795) Python2: Cannot parse nested schemas

Jakob Homan created AVRO-1795:
---------------------------------

             Summary: Python2: Cannot parse nested schemas
                 Key: AVRO-1795
                 URL: https://issues.apache.org/jira/browse/AVRO-1795
             Project: Avro
          Issue Type: Bug
          Components: python
    Affects Versions: 1.8.0
            Reporter: Jakob Homan
            Assignee: Jakob Homan


In the Java client, one can parse nested schemas by loading the nested (inner) schema before the schema that references it.

For example, a header can be defined in one file:
{code:javascript}{ "namespace": "python.avro",
      "type": "record",
      "name": "header",
      "fields": [
         { "name": "header_field", "type": "string" }
       ]
    }{code}
and then included in another schema:
{code:javascript}{ "namespace": "python.avro",
      "type": "record",
      "name": "event",
      "fields": [
         {  "name": "header", "type": "python.avro.header" },
         {  "name": "event_field", "type": "string" }
      ]
    }{code}
As long as one instantiates the Parser and loads the header first, the schemas will be reconciled and merged correctly.

However, the Python client does not support this.  The {{parse}} function in {{schema.py}} always instantiates a new {{Names}} object to hold the schemas, so named types registered by one call are not visible to the next:
{code}def parse(json_string):
  """Constructs the Schema from the JSON text."""
  # TODO(hammer): preserve stack trace from JSON parse
  # parse the JSON
  try:
    json_data = json.loads(json_string)
  except:
    raise SchemaParseException('Error parsing JSON: %s' % json_string)

  # Initialize the names object
  names = Names()

  # construct the Avro Schema object
  return make_avsc_object(json_data, names){code}

Some possible fixes for this are:
1) Create a separate Parser class to mimic the Schema.Parser Java approach, while deprecating the current parse method. 
2) Make the Names instance global to the parse method, allowing multiple parse calls to populate the same namespace.  This breaks current behavior (and at least one unit test depends on it), so it would not be backwards compatible.
3) Create a new parse method that accepts a Names instance and returns both the schema and that instance.  This keeps the code nice and functional while exposing the Names class, which previously had not been particularly public.
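
To make option 3 concrete, here is a rough, self-contained sketch.  It does not use the avro package: {{parse_with_names}} is a hypothetical function, a plain dict stands in for the Names class, and only flat records whose field types are primitives or fully qualified names are handled.

```python
import json

PRIMITIVES = frozenset(
    ["null", "boolean", "int", "long", "float", "double", "bytes", "string"])


def parse_with_names(json_string, names=None):
    """Hypothetical variant of parse(): returns (schema, names)."""
    # Copy the mapping so the caller's instance is never mutated.
    names = dict(names or {})
    data = json.loads(json_string)
    namespace = data.get("namespace", "")
    fullname = namespace + "." + data["name"] if namespace else data["name"]
    fields = []
    for field in data["fields"]:
        ftype = field["type"]
        if ftype not in PRIMITIVES:
            # A non-primitive type string must name an already-known schema.
            if ftype not in names:
                raise KeyError("Unknown named type: %s" % ftype)
            ftype = names[ftype]
        fields.append({"name": field["name"], "type": ftype})
    schema = {"type": "record", "name": fullname, "fields": fields}
    names[fullname] = schema
    return schema, names


HEADER = """{ "namespace": "python.avro", "type": "record", "name": "header",
  "fields": [ { "name": "header_field", "type": "string" } ] }"""
EVENT = """{ "namespace": "python.avro", "type": "record", "name": "event",
  "fields": [ { "name": "header", "type": "python.avro.header" },
              { "name": "event_field", "type": "string" } ] }"""

# Thread the names mapping from the first call into the second.
header, names = parse_with_names(HEADER)
event, names_after_event = parse_with_names(EVENT, names)
```

Each call returns a new mapping, so the caller decides which schemas share a namespace; parsing the event without passing the header's names fails in this sketch with a KeyError.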

I like the first approach.
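
A minimal sketch of what the first approach could look like, to show the intended behavior rather than a proposed patch.  Everything here is an assumption for illustration: the {{Parser}} and {{SchemaParseError}} names are hypothetical, a dict replaces the real Names class, and only records with primitive or named field types are handled.

```python
import json

try:
    STRING_TYPES = (basestring,)  # Python 2
except NameError:
    STRING_TYPES = (str,)  # Python 3

PRIMITIVES = frozenset(
    ["null", "boolean", "int", "long", "float", "double", "bytes", "string"])


class SchemaParseError(Exception):
    """Hypothetical stand-in for avro's SchemaParseException."""


class Parser(object):
    """Hypothetical parser that keeps named types across parse() calls."""

    def __init__(self):
        # Shared registry: full name -> parsed record.
        self.names = {}

    def parse(self, json_string):
        try:
            json_data = json.loads(json_string)
        except ValueError:
            raise SchemaParseError("Error parsing JSON: %s" % json_string)
        return self._resolve(json_data, "")

    def _resolve(self, node, namespace):
        if isinstance(node, STRING_TYPES):
            if node in PRIMITIVES:
                return node
            # Any other string is a reference to a previously parsed type.
            fullname = self._fullname(node, namespace)
            if fullname not in self.names:
                raise SchemaParseError("Unknown named type: %s" % fullname)
            return self.names[fullname]
        if isinstance(node, dict) and node.get("type") == "record":
            namespace = node.get("namespace", namespace)
            record = {"type": "record",
                      "name": self._fullname(node["name"], namespace),
                      "fields": []}
            # Register the record before resolving its fields.
            self.names[record["name"]] = record
            for field in node["fields"]:
                record["fields"].append(
                    {"name": field["name"],
                     "type": self._resolve(field["type"], namespace)})
            return record
        raise SchemaParseError("Unsupported schema: %r" % (node,))

    @staticmethod
    def _fullname(name, namespace):
        if "." in name or not namespace:
            return name
        return namespace + "." + name


HEADER = """{ "namespace": "python.avro", "type": "record", "name": "header",
  "fields": [ { "name": "header_field", "type": "string" } ] }"""
EVENT = """{ "namespace": "python.avro", "type": "record", "name": "event",
  "fields": [ { "name": "header", "type": "python.avro.header" },
              { "name": "event_field", "type": "string" } ] }"""

# One Parser instance, header loaded first, as with the Java client.
parser = Parser()
header = parser.parse(HEADER)
event = parser.parse(EVENT)
```

With one instance the event's header field resolves to the record parsed earlier, while a fresh instance given only the event schema raises SchemaParseError, which is the same failure mode the current module-level parse shows.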



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)