
Posted to dev@avro.apache.org by "Philip Zeyliger (JIRA)" <ji...@apache.org> on 2010/08/20 19:14:15 UTC

[jira] Created: (AVRO-620) Python implementation doesn't stringify sub-schemas correctly

Python implementation doesn't stringify sub-schemas correctly
-------------------------------------------------------------

                 Key: AVRO-620
                 URL: https://issues.apache.org/jira/browse/AVRO-620
             Project: Avro
          Issue Type: Bug
          Components: python
            Reporter: Philip Zeyliger


{noformat}

In [9]: import avro.schema

In [10]: s = avro.schema.parse('{"type": "record", "name": "X", "fields": [{"name": "y", "type": {"type": "record", "name": "Y", "fields": [{"name": "Z", "type": "X"}]}}]}')

In [11]: str(s.fields[0].type)
Out[11]: '{"fields": [{"type": "X", "name": "Z"}], "type": "record", "name": "Y"}'
{noformat}

Avro data files record the schema by calling str(schema).  In the case above, the serialized schema for Y refers to X only by name, so a reader given that string alone cannot resolve X.  When we serialize the schema for Y, we should also serialize the schema for X, since Y needs it.

I ran smack into this when using a schema from a protocol to write a data file: many of the types were left undefined in the generated Avro data file.
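
To make the problem concrete, here is a minimal sketch that checks whether a schema's JSON is self-contained. It uses only the standard library, not the avro package; the helper name undefined_names and the PRIMITIVES set are made up for illustration, and namespaces are ignored for simplicity.

```python
import json

# Avro primitive type names; anything else that appears as a bare string
# is treated as a reference to a named type.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def undefined_names(schema, defined=None):
    """Return names that are referenced before (or without) being defined.

    Hypothetical helper: walks the parsed JSON in document order and
    ignores namespaces.
    """
    if defined is None:
        defined = set()
    missing = set()
    if isinstance(schema, str):
        if schema not in PRIMITIVES and schema not in defined:
            missing.add(schema)
    elif isinstance(schema, list):  # a union: check every branch
        for branch in schema:
            missing |= undefined_names(branch, defined)
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind in ("record", "error"):
            # Register the name first so recursive references resolve.
            defined.add(schema["name"])
            for field in schema.get("fields", []):
                missing |= undefined_names(field["type"], defined)
        elif kind in ("enum", "fixed"):
            defined.add(schema["name"])
        elif kind == "array":
            missing |= undefined_names(schema["items"], defined)
        elif kind == "map":
            missing |= undefined_names(schema["values"], defined)
        else:
            # e.g. {"type": "string"} or {"type": "SomeName"}
            missing |= undefined_names(kind, defined)
    return missing


# The full schema from the report, and the string str() produced for Y.
full = json.loads(
    '{"type": "record", "name": "X", "fields": [{"name": "y", "type": '
    '{"type": "record", "name": "Y", "fields": [{"name": "Z", "type": "X"}]}}]}'
)
y_alone = json.loads(
    '{"fields": [{"type": "X", "name": "Z"}], "type": "record", "name": "Y"}'
)

print(undefined_names(full))     # nothing missing
print(undefined_names(y_alone))  # X is referenced but never defined
```

The full schema resolves every name it uses, while the string produced for Y alone leaves X dangling, which is what a reader of the data file would trip over.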

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Resolved: (AVRO-620) Python implementation doesn't stringify sub-schemas correctly

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/AVRO-620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger resolved AVRO-620.
----------------------------------

     Hadoop Flags: [Reviewed]
         Assignee: Philip Zeyliger
    Fix Version/s: 1.4.0
       Resolution: Fixed

I committed this.  Jeff reviewed it:

{noformat}
Ship it!


Okay.

- Jeff
{noformat}



[jira] Updated: (AVRO-620) Python implementation doesn't stringify sub-schemas correctly

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/AVRO-620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated AVRO-620:
---------------------------------

    Attachment: AVRO-620.patch.txt

I believe I've fixed this.  I implemented a Schema.to_json(names) method, which recursively serializes schema objects to JSON-compatible structures, avoiding re-serializing schemas which we've already seen.  (This also means avoiding serializing JSON just to deserialize it again.)  I was able to get rid of some variables which tracked how the schema was originally defined, because this recursion is taking care of noticing that.
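
The idea can be sketched roughly as follows. This is not the code from the patch: the class names PrimitiveSchema and RecordSchema are stand-ins invented for the example, and only records and primitives are modelled. The point is just that to_json(names) threads a set of already-emitted names through the recursion and falls back to a bare name reference on the second visit.

```python
import json


class PrimitiveSchema:
    """Stand-in for a primitive schema such as "string" (hypothetical class)."""

    def __init__(self, type_name):
        self.type_name = type_name

    def to_json(self, names):
        # Primitives are always written as their bare type name.
        return self.type_name


class RecordSchema:
    """Stand-in for a named record schema (hypothetical class)."""

    def __init__(self, name):
        self.name = name
        # (field_name, schema) pairs; filled in after construction so that
        # two records can refer to each other.
        self.fields = []

    def to_json(self, names):
        if self.name in names:
            # Already emitted in this output: refer to it by name only.
            return self.name
        names.add(self.name)
        return {
            "type": "record",
            "name": self.name,
            "fields": [
                {"name": field_name, "type": schema.to_json(names)}
                for field_name, schema in self.fields
            ],
        }

    def __str__(self):
        # Every top-level stringification starts with a fresh set, so the
        # result is self-contained no matter where this schema was defined.
        return json.dumps(self.to_json(set()))


# Rebuild the X/Y cycle from the report.
x = RecordSchema("X")
y = RecordSchema("Y")
x.fields.append(("y", y))
y.fields.append(("Z", x))

print(str(y))
```

With this approach str(y) carries the full definition of X inline, and the inner reference back to Y is written as just "Y" because Y is already in the set by then.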

Along the way, I removed some verbosity from the tests and removed some exception handling.  It's very unhelpful when Python tests catch exceptions, because doing so makes it that much harder to track down the exact point of the failure.  (An exception that propagates through a test is a test failure, so there's no need to separately mark the test as failed.)  Printing extra information about which tests are running distracts from where the failures are occurring.  I recommend the nose test runner (with flags --pdb --pdb-failures) for running the tests.

I've added a test for the case that triggered this in the first place.



[jira] Commented: (AVRO-620) Python implementation doesn't stringify sub-schemas correctly

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/AVRO-620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901270#action_12901270 ] 

Philip Zeyliger commented on AVRO-620:
--------------------------------------

On reviewboard at https://review.cloudera.org/r/706/



[jira] Commented: (AVRO-620) Python implementation doesn't stringify sub-schemas correctly

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/AVRO-620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900943#action_12900943 ] 

Philip Zeyliger commented on AVRO-620:
--------------------------------------

I'm working on this.

