Posted to dev@avro.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2009/08/14 21:46:14 UTC

[jira] Created: (AVRO-91) add json codec in python

add json codec in python
------------------------

                 Key: AVRO-91
                 URL: https://issues.apache.org/jira/browse/AVRO-91
             Project: Avro
          Issue Type: New Feature
          Components: python
            Reporter: Doug Cutting


Now that AVRO-50 is complete, it would be good to have a JSON encoder and decoder in Python.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AVRO-91) add json codec in python

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749380#action_12749380 ] 

Sharad Agarwal commented on AVRO-91:
------------------------------------

> In particular note the addition of {read,write}{Record,Union}{Start,End}, etc. methods in Encoder.java and Decoder.java.

I think this is a reasonable trade-off, as it avoids the complexity of a parser.

> add json codec in python
> ------------------------
>
>                 Key: AVRO-91
>                 URL: https://issues.apache.org/jira/browse/AVRO-91
>             Project: Avro
>          Issue Type: New Feature
>          Components: python
>            Reporter: Doug Cutting
>            Assignee: Ravi Gummadi
>
> Now that AVRO-50 is complete, it would be good to have a JSON encoder and decoder in Python.


[jira] Commented: (AVRO-91) add json codec in python

Posted by "Thiruvalluvan M. G. (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749545#action_12749545 ] 

Thiruvalluvan M. G. commented on AVRO-91:
-----------------------------------------

One drawback of having {read,write}{Record,Union}{Start,End} methods is that every client that uses the decoder/encoder will have to generate these calls. This could be cumbersome for the clients and/or have a performance impact.

Here is an approach that is less complicated than the Java implementation of the parser. It is also less efficient than the Java parser, but I guess performance is not vital for a JSON encoder/decoder, since its main purpose is diagnostics and debugging.

Here I describe the Encoder, but the idea can be implemented for the decoder as well.

The JSON encoder has a stack of "Markers". Markers are of these types: SCHEMA, RECORD_START, RECORD_END, FIELD, ARRAY_START, ARRAY_END, MAP_START, MAP_END, REPEATER, etc. The SCHEMA marker has a schema object associated with it. The REPEATER marker has one or two schema objects associated with it. The FIELD marker has the field name and the field number associated with it.
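
A minimal sketch of the marker stack described above, written in Python. The names Kind, Marker, and key_schema are assumptions made for illustration and do not come from any existing Avro code; schemas are represented here by plain strings.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Any, Optional


class Kind(Enum):
    """Marker types named in the description (hypothetical enum)."""
    SCHEMA = auto()
    RECORD_START = auto()
    RECORD_END = auto()
    FIELD = auto()
    ARRAY_START = auto()
    ARRAY_END = auto()
    MAP_START = auto()
    MAP_END = auto()
    REPEATER = auto()


@dataclass
class Marker:
    kind: Kind
    schema: Any = None                  # SCHEMA: its schema; REPEATER: item/value schema
    key_schema: Any = None              # REPEATER for maps only: the key schema
    field_name: Optional[str] = None    # FIELD only
    field_number: Optional[int] = None  # FIELD only


# The stack is a plain list whose last element is the top.
# This is what a record with a single int field "age" looks like once expanded.
stack = [
    Marker(Kind.RECORD_END),
    Marker(Kind.SCHEMA, schema="int"),
    Marker(Kind.FIELD, field_name="age", field_number=0),
    Marker(Kind.RECORD_START),
]
```

Keeping the per-type data in optional fields of one class keeps the stack homogeneous; separate classes per marker type would work equally well.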

The method writeBoolean() will call advance(schema.BOOLEAN) before writing "true" or "false" into the underlying stream. Similarly writeInt() will call advance(schema.INT) before writing the decimal string corresponding to the int into the underlying stream. Other write() methods for primitive types call advance() with an appropriate schema type.

The advance() method looks at the top of the stack. If the top of the stack is a SCHEMA marker and the schema matches the type passed to advance(), then it simply pops the top element off the stack and returns. If the top of the stack is a SCHEMA marker but the schema is a compound type (such as a record, map or array), then it "expands" the top element (see below). If the top element is a SCHEMA marker whose schema is a non-compound type that does not match the argument of advance(), it is an error. If the top element is not a SCHEMA marker, it inserts the appropriate text into the output stream. For example, if it is a RECORD_START or MAP_START, an open brace is written. Similarly, if it is an ARRAY_START, an open square bracket is written. If it is a FIELD marker, the field name associated with that field is written, followed by a colon.
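
A rough sketch of advance() for primitive types only, under several assumptions: markers are (kind, payload) tuples, the stack starts out already expanded, structural markers are popped once their text is written, commas are left out, and a flush() helper (not part of the description) drains the trailing markers. SketchJsonEncoder and its method names are hypothetical.

```python
import io
import json

PRIMITIVES = {"boolean", "int", "string"}

# Text written for structural markers (commas are not handled here).
STRUCTURAL_TEXT = {
    "RECORD_START": "{", "RECORD_END": "}",
    "MAP_START": "{", "MAP_END": "}",
    "ARRAY_START": "[", "ARRAY_END": "]",
}


class SketchJsonEncoder:
    def __init__(self, out, markers):
        self.out = out
        # Markers arrive in output order; the last list element is the top.
        self.stack = list(reversed(markers))

    def advance(self, expected):
        while self.stack:
            kind, payload = self.stack[-1]
            if kind == "SCHEMA":
                if payload not in PRIMITIVES:
                    raise NotImplementedError("compound schemas need expand()")
                if payload != expected:
                    raise TypeError(f"schema is {payload}, got {expected}")
                self.stack.pop()          # matching primitive: pop and return
                return
            self.stack.pop()              # structural marker: write its text
            self._emit(kind, payload)
        raise ValueError("nothing left to write")

    def _emit(self, kind, payload):
        if kind == "FIELD":
            name, _number = payload
            self.out.write(json.dumps(name) + ":")
        else:
            self.out.write(STRUCTURAL_TEXT[kind])

    def write_boolean(self, value):
        self.advance("boolean")
        self.out.write("true" if value else "false")

    def write_int(self, value):
        self.advance("int")
        self.out.write(str(value))

    def flush(self):
        # Assumption of this sketch: write any trailing structural markers.
        while self.stack and self.stack[-1][0] != "SCHEMA":
            self._emit(*self.stack.pop())


out = io.StringIO()
encoder = SketchJsonEncoder(out, [
    ("RECORD_START", None),
    ("FIELD", ("flag", 0)),
    ("SCHEMA", "boolean"),
    ("RECORD_END", None),
])
encoder.write_boolean(True)
encoder.flush()
result = out.getvalue()
```

The caller only ever invokes the primitive write methods; all of the structural text falls out of the markers on the stack.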

The expand() operation pops the top of the stack and replaces it with the expansion of that marker. Only SCHEMA markers with compound schema types or REPEATER markers get expanded. A record SCHEMA marker gets expanded to the sequence {RECORD_START, <FIELD, SCHEMA>*, RECORD_END}. The number of <FIELD, SCHEMA> pairs is the same as the number of fields of the record. The expanded sequence is pushed in reverse order; that is, RECORD_START will be at the top of the stack after expansion. An array SCHEMA marker gets expanded to {ARRAY_START, REPEATER, ARRAY_END}. The REPEATER has the schema of the element type of the array. A map SCHEMA marker gets expanded to {MAP_START, REPEATER, MAP_END}; the REPEATER will have a string schema (for the key) and a schema for the value of the map.
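
The expansion rules for records, arrays, and maps could be sketched roughly as follows. This assumes schemas are the dicts obtained by parsing Avro schema JSON (with "fields", "items", and "values" keys) and that markers are (kind, payload) tuples; the function names expansion and expand are illustrative only.

```python
def expansion(schema):
    """Return the markers that replace a compound SCHEMA marker, in output order."""
    kind = schema["type"]
    if kind == "record":
        markers = [("RECORD_START", None)]
        for number, field in enumerate(schema["fields"]):
            markers.append(("FIELD", (field["name"], number)))
            markers.append(("SCHEMA", field["type"]))
        markers.append(("RECORD_END", None))
        return markers
    if kind == "array":
        # The REPEATER carries one schema: the element type.
        return [("ARRAY_START", None),
                ("REPEATER", (schema["items"],)),
                ("ARRAY_END", None)]
    if kind == "map":
        # The REPEATER carries two schemas: string keys and the value type.
        return [("MAP_START", None),
                ("REPEATER", ("string", schema["values"])),
                ("MAP_END", None)]
    raise ValueError(f"not a compound schema: {kind!r}")


def expand(stack):
    """Pop the SCHEMA marker on top and push its expansion in reverse order."""
    kind, schema = stack.pop()
    if kind != "SCHEMA":
        raise ValueError("only SCHEMA markers are expanded in this sketch")
    stack.extend(reversed(expansion(schema)))


point = {
    "type": "record",
    "name": "Point",
    "fields": [{"name": "x", "type": "int"}, {"name": "y", "type": "int"}],
}
stack = [("SCHEMA", point)]
expand(stack)
# The last list element is the top of the stack, so RECORD_START is now on top.
```

Pushing in reverse order is what lets the encoder consume markers in output order with ordinary pops.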

Expanding a union is somewhat different. It replaces the union SCHEMA marker with a SCHEMA marker for the appropriate branch. A REPEATER marker is expanded to {SCHEMA, REPEATER} or {SCHEMA, SCHEMA, REPEATER}, where the SCHEMAs are the contents of the REPEATER. On reaching the end of the array/map, the REPEATER marker at the top of the stack gets discarded.

The above should take care of all aspects of JSON encoding except the commas that should appear between fields in a record or between elements in an array/map. The field number of the FIELD marker can be used to decide whether a comma needs to be inserted. Some additional information can be kept in the REPEATER to decide whether a comma is needed in arrays/maps.
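
One possible way to handle REPEATER expansion and the commas is sketched below. It assumes the "additional information" kept in the REPEATER is a simple item counter, and that the caller signals the start of each item and the end of the collection; field_text, Repeater, start_item, and end_items are hypothetical names.

```python
import json


def field_text(name, number):
    """Text for a FIELD marker: every field after the first gets a leading comma."""
    prefix = "," if number > 0 else ""
    return prefix + json.dumps(name) + ":"


class Repeater:
    """REPEATER marker that also counts the items started so far."""

    def __init__(self, *schemas):
        self.schemas = schemas   # one schema for arrays, (key, value) for maps
        self.count = 0


def start_item(stack):
    """Expand the REPEATER on top for one more item; return the separator text."""
    repeater = stack[-1]
    if not isinstance(repeater, Repeater):
        raise ValueError("REPEATER expected on top of the stack")
    separator = "," if repeater.count > 0 else ""
    repeater.count += 1
    # Leave the REPEATER in place and push its schemas above it, first on top.
    stack.extend(("SCHEMA", s) for s in reversed(repeater.schemas))
    return separator


def end_items(stack):
    """Discard the REPEATER once the array or map is finished."""
    if not isinstance(stack[-1], Repeater):
        raise ValueError("REPEATER expected on top of the stack")
    stack.pop()


# Array of ints with two items; popping the SCHEMA stands in for writing the value.
stack = [("ARRAY_END", None), Repeater("int")]
first = start_item(stack)
stack.pop()
second = start_item(stack)
stack.pop()
end_items(stack)
```

Leaving the REPEATER on the stack and pushing its schemas above it has the same effect as popping it and pushing {SCHEMA, REPEATER} back.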


[jira] Assigned: (AVRO-91) add json codec in python

Posted by "Ravi Gummadi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-91?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Gummadi reassigned AVRO-91:
--------------------------------

    Assignee: Ravi Gummadi


[jira] Commented: (AVRO-91) add json codec in python

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750022#action_12750022 ] 

Doug Cutting commented on AVRO-91:
----------------------------------

> One drawback of having {read,write}{Record,Union}{Start,End} methods is that all clients that use decoder/encoder
> will have to generate these calls. This could be cumbersome for the clients and/or have performance impact.

There are not many clients.  Mostly it's just generic, since specific inherits from that, no?  So this is only a problem if we expect applications to code directly to the Encoder/Decoder API.  The Java parser was implemented with that in mind, so that folks could, e.g., write data in a streaming manner, without ever building objects.  Do we expect folks to do this much in Python?

As for performance, I would not expect two no-op method calls per record and union to impact things much.



[jira] Commented: (AVRO-91) add json codec in python

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744573#action_12744573 ] 

Doug Cutting commented on AVRO-91:
----------------------------------

One can implement a JSON codec without implementing a parser.  We do not want to force every Avro implementation to implement a parser.

You will need to add some methods to your encoder/decoder API.  Please look at my original patch of 6/26 for AVRO-50, where I implemented a JSON codec w/o a parser.  In particular note the addition of {read,write}{Record,Union}{Start,End}, etc. methods in Encoder.java and Decoder.java.



[jira] Commented: (AVRO-91) add json codec in python

Posted by "Ravi Gummadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744405#action_12744405 ] 

Ravi Gummadi commented on AVRO-91:
----------------------------------

Planning to incorporate changes similar to AVRO-50 (and AVRO-90) in Python.

Is there a better/simpler way of doing the same in Python?


[jira] Commented: (AVRO-91) add json codec in python

Posted by "Thiruvalluvan M. G. (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751080#action_12751080 ] 

Thiruvalluvan M. G. commented on AVRO-91:
-----------------------------------------

Actually, in addition to {read,write}{Record,Union}{Start,End}, we need to introduce a method to write the field name for each field of a record, and add an additional parameter when writing enums, the enum schema when reading enums, and the union schema when reading or writing the branch. So the additional overhead will be something like one no-op call per field.
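
To illustrate the parser-free alternative and where the no-op calls come from, here is a rough sketch in which a generic writer drives two encoders through the same structural hooks. Everything here is an assumption for illustration: the snake_case method names, the SketchBinaryEncoder stand-in (which only collects values instead of producing real binary output), and the restriction to flat records of int and string fields. Unions and enums are left out.

```python
import io
import json


class SketchBinaryEncoder:
    """Stand-in for a binary encoder: the structural hooks are no-ops."""

    def __init__(self):
        self.values = []

    def write_record_start(self):
        pass

    def write_record_end(self):
        pass

    def write_field_name(self, name, number):
        pass

    def write_int(self, value):
        self.values.append(value)

    def write_string(self, value):
        self.values.append(value)


class SketchJsonEncoder:
    """JSON encoder that needs no parser: the hooks carry the structure."""

    def __init__(self, out):
        self.out = out

    def write_record_start(self):
        self.out.write("{")

    def write_record_end(self):
        self.out.write("}")

    def write_field_name(self, name, number):
        if number > 0:
            self.out.write(",")
        self.out.write(json.dumps(name) + ":")

    def write_int(self, value):
        self.out.write(str(value))

    def write_string(self, value):
        self.out.write(json.dumps(value))


def write_record(schema, datum, encoder):
    """Generic writer: walks the schema and drives whichever encoder it is given."""
    encoder.write_record_start()
    for number, field in enumerate(schema["fields"]):
        encoder.write_field_name(field["name"], number)
        write_value = getattr(encoder, "write_" + field["type"])
        write_value(datum[field["name"]])
    encoder.write_record_end()


user_schema = {
    "type": "record",
    "name": "User",
    "fields": [{"name": "name", "type": "string"}, {"name": "age", "type": "int"}],
}
user = {"name": "Ann", "age": 30}

out = io.StringIO()
write_record(user_schema, user, SketchJsonEncoder(out))
json_text = out.getvalue()

binary = SketchBinaryEncoder()
write_record(user_schema, user, binary)
```

With this shape, the binary path pays two no-op calls per record plus one per field, which is the overhead being discussed in this thread.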
