Posted to dev@avro.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2013/08/13 20:02:48 UTC

[jira] [Commented] (AVRO-1345) Python Codegen

    [ https://issues.apache.org/jira/browse/AVRO-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738579#comment-13738579 ] 

Doug Cutting commented on AVRO-1345:
------------------------------------

This looks great.  Tests pass for me.

A few concerns I have before we commit this:
 - Can we please avoid incompatibly changing the command line for SpecificCompilerTool?  Perhaps we should add a -lang option that defaults to 'java'.
 - Shouldn't we add some tests of the generated Python code?  We might just add tests in the python tree that refer to ../../lang/java/tools/src/test/compiler/output/*.py, or we could move these into share/test.
 - We might add something to the Python "getting started" guide about this.
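
To make the first point concrete, here is a rough sketch of a command line that stays compatible with existing invocations. It is written in Python purely for illustration (SpecificCompilerTool itself is Java), and the function name parse_compile_args and the returned tuple layout are made up for this example. The -lang option is optional and defaults to 'java', so old invocations keep working unchanged.

```python
import argparse


def parse_compile_args(argv):
    """Parse '[-string] [-lang LANG] (schema|protocol) input... outputdir'.

    Illustrative only: this is not the real SpecificCompilerTool parser.
    """
    parser = argparse.ArgumentParser(prog="compile")
    parser.add_argument("-string", action="store_true", dest="string_type")
    # Assumption: the new option is optional and falls back to 'java',
    # so command lines written before the option existed still parse.
    parser.add_argument("-lang", choices=["java", "python"], default="java")
    parser.add_argument("kind", choices=["schema", "protocol"])
    parser.add_argument("paths", nargs="+", help="input... outputdir")
    args = parser.parse_args(argv)
    if len(args.paths) < 2:
        parser.error("expected at least one input and an outputdir")
    # The last positional argument is the output directory.
    inputs, outputdir = args.paths[:-1], args.paths[-1]
    return args.lang, args.kind, args.string_type, inputs, outputdir
```

With this shape, "schema user.avsc out" still means Java output, and Python output is requested explicitly with "-lang python schema user.avsc out".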
                
> Python Codegen
> --------------
>
>                 Key: AVRO-1345
>                 URL: https://issues.apache.org/jira/browse/AVRO-1345
>             Project: Avro
>          Issue Type: New Feature
>          Components: java, python
>            Reporter: Tal Levy
>         Attachments: AVRO-1345.patch
>
>
> I recently started using Avro at work, and we found it difficult to keep
> track of which Python dict matched which schema. Instead of populating
> arbitrary dicts and then attempting to serialize them to Avro, I thought
> it would be more readable and less error-prone to generate typed wrapper classes
> for these dicts. These classes are type checked field by field. Although this does not
> have the advantage of compile-time type checking like the Java codegen, it is a
> friendly wrapper around Python dicts representing Avro records to be serialized.
> Let me know what you think about this; I am still tweaking how it behaves.
> I understand it is a bit unpythonic to enforce types in this way, but the readability
> is worth it nonetheless.
> Here is an example record:
> https://gist.github.com/talevy/5696236
> I extended the Avro compiler/tools to provide both Java and Python codegen functionality,
> so if this sounds like something others would use, maybe it makes sense to include it
> in the main repo.
> Here are the changes:
> https://github.com/talevy/avro/tree/python-codegen
> A few caveats and thoughts about my current version:
> 1. I do not know how best to handle constructors, because some fields are not allowed to be null. Maybe a builder pattern would work here, but it's kind of odd in Python.
> 2. I copied and pasted a lot of the code from SpecificCompiler to make the PythonCompiler. Some renaming and code reuse via inheritance would make it read better.
> 3. I wanted to reuse the validate methods already provided in Avro to verify the record, but doing so loses some of the class type correctness for nested records and such.
> 4. I do not know the best way to output multiple files. I currently use the same packaging as the Java classes, writing them into their namespace directories.
> 5. I am not familiar with the Avro protocol format, so I only implemented enums and records.
> I updated the SpecificCompilerTool to have the following usage:
> ```
> "Usage: [-string] (schema|protocol) (python|java) input... outputdir"
> ```
> So generating the Python classes is as easy as generating the Java ones.
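
For readers who cannot open the gist, the field-by-field type checking described in the quoted issue might look roughly like the sketch below. This is not the code the patch generates: the TypeCheckedRecord base class, the FIELDS mapping, and the User example are all hypothetical names chosen for illustration, and only flat records with simple Python types are handled.

```python
class TypeCheckedRecord(object):
    """Hypothetical base class wrapping a dict with per-field type checks."""

    # Maps field name -> accepted Python type (or tuple of types).
    FIELDS = {}

    def __init__(self, **values):
        self._data = {}
        for name, value in values.items():
            self.set(name, value)

    def set(self, name, value):
        """Store a value after checking the field name and the value's type."""
        if name not in self.FIELDS:
            raise KeyError("unknown field: %s" % name)
        if not isinstance(value, self.FIELDS[name]):
            raise TypeError(
                "field %s expects %r, got %r"
                % (name, self.FIELDS[name], type(value))
            )
        self._data[name] = value

    def to_dict(self):
        """Return a plain dict copy that could be handed to a serializer."""
        return dict(self._data)


class User(TypeCheckedRecord):
    # Assumed example schema: a required string and a nullable int.
    FIELDS = {"name": str, "favorite_number": (int, type(None))}
```

A call such as User(name="Ada", favorite_number=7).to_dict() yields an ordinary dict, while User(name=42) raises TypeError at assignment time rather than at serialization time. The sketch does not address the constructor question from caveat 1: nothing here forces required fields to be set.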

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira