Posted to dev@avro.apache.org by Tal Levy <ju...@gmail.com> on 2013/06/03 07:28:57 UTC

Python-Avro Codegen Proposal

Hi,

I recently started using Avro at work, and we found it difficult to keep
track of which Python dict matched which schema. Instead of populating
arbitrary dicts and then attempting to serialize them to Avro, I thought it
would be more readable and less error-prone to generate Python classes
wrapping those dicts for developers. These classes are type-checked field by
field. Although this lacks the compile-time type checking of the Java
codegen, it provides a friendly wrapper around the Python dicts that
represent Avro records to be serialized.
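A minimal sketch of what such a generated class could look like, assuming a hypothetical `User` record schema with fields `name` (string), `age` (int) and `email` (["null", "string"]). The `_field` helper, the `_FIELDS` table and the `to_dict()` method are illustrative assumptions, not the code in the gist or the branch; the idea shown is that every assignment is type-checked and the wrapped dict is what would be handed to the Avro writer. Written for Python 3.

```python
def _field(name):
    """Build a property that type-checks every assignment to one field."""

    def getter(self):
        return self._data.get(name)

    def setter(self, value):
        allowed = self._FIELDS[name]
        # bool is a subclass of int, so True/False would otherwise pass
        # as an Avro int; reject it unless bool is explicitly allowed.
        if isinstance(value, bool) and bool not in allowed:
            raise TypeError("%s: got bool" % name)
        if not isinstance(value, allowed):
            raise TypeError("%s: expected %s, got %s" % (
                name, "/".join(t.__name__ for t in allowed),
                type(value).__name__))
        self._data[name] = value

    return property(getter, setter)


class User(object):
    """Hypothetical generated wrapper for a 'User' record schema."""

    # Allowed Python types per field; NoneType marks a ["null", ...] union.
    _FIELDS = {
        "name": (str,),
        "age": (int,),
        "email": (type(None), str),
    }

    name = _field("name")
    age = _field("age")
    email = _field("email")

    def __init__(self):
        # The plain dict that would be passed to an Avro datum writer.
        self._data = {}

    def to_dict(self):
        """Return a copy of the underlying record dict."""
        return dict(self._data)


if __name__ == "__main__":
    u = User()
    u.name = "alice"
    u.age = 30
    u.email = None
    print(u.to_dict())  # {'name': 'alice', 'age': 30, 'email': None}
    try:
        u.age = "thirty"  # rejected by the age setter
    except TypeError as err:
        print(err)
```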

Let me know what you think about this; I am still tweaking how it behaves.
I understand it is a bit unpythonic to enforce types this way, but the
readability is worth it nonetheless.

Here is an example record:
https://gist.github.com/talevy/5696236

I extended the Avro compiler/tools to provide both Java and Python codegen
functionality, so if this sounds like something others would use, maybe it
makes sense to include it in the main repo.


Thanks,
Tal

Re: Python-Avro Codegen Proposal

Posted by Tal Levy <ju...@gmail.com>.
Yes, SpecificCompiler is template-driven, but a few things are specific to
the language, such as addStringType, the language's reserved words, and how
Avro types map to language-specific types.

The changes are pretty minimal, but in my opinion they are enough to confuse
or overload the class if it were made to handle both Python and Java.
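To illustrate the language-specific pieces listed above, a Python backend needs its own answers to two questions: which Python type each Avro primitive maps to, and how to rename fields that collide with Python keywords. The sketch below is an assumption about how that could look; the `PRIMITIVE_TYPES` table and the trailing-underscore convention in `mangle` are hypothetical choices, not what the branch does. It is written in Python for readability, although the compiler itself is Java.

```python
import keyword

# Hypothetical mapping from Avro primitive type names to Python 3 types.
PRIMITIVE_TYPES = {
    "null": type(None),
    "boolean": bool,
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "bytes": bytes,
    "string": str,
}


def python_type(avro_type):
    """Return the Python type for an Avro primitive type name."""
    try:
        return PRIMITIVE_TYPES[avro_type]
    except KeyError:
        raise ValueError("not an Avro primitive type: %r" % (avro_type,))


def mangle(name):
    """Append an underscore to field names that are Python keywords."""
    return name + "_" if keyword.iskeyword(name) else name
```

For example, `mangle("class")` gives `"class_"` while `mangle("user_id")` is unchanged. Note that `keyword.iskeyword` reflects the running interpreter (`print` was a keyword in Python 2 but is not in Python 3), so a generator targeting several Python versions would likely want a fixed word list instead.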

Thanks,
Tal


On Tue, Jun 4, 2013 at 5:16 PM, Philip Zeyliger <ph...@cloudera.com> wrote:


Re: Python-Avro Codegen Proposal

Posted by Philip Zeyliger <ph...@cloudera.com>.
Thanks!  Please do file the JIRA.
https://cwiki.apache.org/AVRO/how-to-contribute.html is a guide on
contributing.

SpecificCompiler is template-driven (or was, at some point).  I wonder if
you could have implemented it by just substituting a different set of
templates.
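Philip's suggestion amounts to one compiler with a template set per target language. The toy sketch below illustrates that idea with Python's `string.Template` standing in for the compiler's real template engine; the templates, the `TYPE_NAMES` table and the `render_record` helper are all hypothetical.

```python
from string import Template

# Hypothetical per-language templates for a record and its fields.
TEMPLATES = {
    "java": {
        "record": Template("public class $name {\n$fields}\n"),
        "field": Template("  public $type $field;\n"),
    },
    "python": {
        "record": Template("class $name(object):\n$fields"),
        "field": Template("    $field = None  # $type\n"),
    },
}

# Hypothetical per-language names for a few Avro primitive types.
TYPE_NAMES = {
    "java": {"string": "CharSequence", "int": "int", "long": "long"},
    "python": {"string": "str", "int": "int", "long": "int"},
}


def render_record(lang, name, fields):
    """Render one record using the template set for `lang`.

    `fields` is a list of (field_name, avro_primitive_type) pairs.
    """
    tmpl = TEMPLATES[lang]
    types = TYPE_NAMES[lang]
    body = "".join(
        tmpl["field"].substitute(field=f, type=types[t]) for f, t in fields)
    return tmpl["record"].substitute(name=name, fields=body)


if __name__ == "__main__":
    fields = [("name", "string"), ("age", "int")]
    print(render_record("python", "User", fields))
    print(render_record("java", "User", fields))
```

Even in this toy, the type-name table has to differ per language, which is the kind of language-specific piece Tal raises in his reply (string type handling, reserved words, type mapping).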

Cheers,

-- Philip


On Tue, Jun 4, 2013 at 1:54 PM, Tal Levy <ju...@gmail.com> wrote:


Re: Python-Avro Codegen Proposal

Posted by Tal Levy <ju...@gmail.com>.
I will try to clean it up and submit a patch via JIRA.

Here are the changes:

https://github.com/talevy/avro/tree/python-codegen

A few caveats and thoughts about my current version:

1. I do not know how best to handle constructors, because some fields are
not allowed to be null... maybe a builder pattern would work here, but it's
kind of weird in Python.
2. I copied and pasted a lot of code from SpecificCompiler to make the
PythonCompiler... some renaming and code reuse via inheritance would make
it read better.
3. I wanted to reuse the validate methods Avro already provides to verify
the record, but that takes away some of the class type correctness for
nested records and such.
4. I do not know the best way to output multiple files; I currently use the
same packaging as the Java classes, writing them into their namespace
directories.
5. I am not familiar with the Avro protocol format, so I only implemented
enums and records.
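For caveat 1, one Pythonic alternative to a builder is to make non-nullable fields required keyword-only arguments and give nullable (union with "null") fields a default of `None`. The sketch below shows that, plus a check that a nested record field holds an instance of the generated class rather than an arbitrary dict (caveat 3). The `Address` and `Person` records and their fields are hypothetical, and keyword-only arguments require Python 3.

```python
class Address(object):
    """Hypothetical record: both fields are non-nullable strings."""

    def __init__(self, *, street, city):
        if not isinstance(street, str) or not isinstance(city, str):
            raise TypeError("street and city must be str")
        self.street = street
        self.city = city

    def to_dict(self):
        return {"street": self.street, "city": self.city}


class Person(object):
    """Hypothetical record with a required nested record field."""

    def __init__(self, *, name, address, nickname=None):
        # name: "string" (required), address: Address (required),
        # nickname: ["null", "string"] (optional, defaults to None).
        if not isinstance(name, str):
            raise TypeError("name must be str")
        # Require the generated class, not a bare dict, for nested records.
        if not isinstance(address, Address):
            raise TypeError("address must be an Address")
        if nickname is not None and not isinstance(nickname, str):
            raise TypeError("nickname must be str or None")
        self.name = name
        self.address = address
        self.nickname = nickname

    def to_dict(self):
        # Nested records are converted recursively into plain dicts.
        return {"name": self.name,
                "address": self.address.to_dict(),
                "nickname": self.nickname}
```

Omitting a required field, e.g. `Person(name="bob")`, fails immediately with Python's own `TypeError` for a missing keyword argument, so no separate builder step is needed to enforce non-null fields.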

I updated the SpecificCompilerTool to have the following usage:

```
"Usage: [-string] (schema|protocol) (python|java) input... outputdir"
```

So generating the Python classes is as easy as generating the Java ones.
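Regarding caveat 4, mirroring the Java layout in Python means each namespace segment becomes a package directory, and each directory needs an `__init__.py` to be importable as a regular package. A hypothetical helper showing that mapping, assuming one module per record named after the record:

```python
import os


def output_paths(full_name, outdir):
    """Map a fully qualified Avro name to (module_path, init_files).

    Hypothetical helper: 'com.example.User' becomes
    outdir/com/example/User.py, plus an __init__.py in each
    namespace directory so the package is importable.
    """
    parts = full_name.split(".")
    namespace, name = parts[:-1], parts[-1]
    package_dir = os.path.join(outdir, *namespace)
    inits = [os.path.join(outdir, *namespace[:i + 1], "__init__.py")
             for i in range(len(namespace))]
    return os.path.join(package_dir, name + ".py"), inits
```

An alternative design would write every record of a namespace into a single module, which avoids one file per class but makes the layout diverge from the Java output.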


Let me know what you think.


On Tue, Jun 4, 2013 at 11:24 AM, Philip Zeyliger <ph...@cloudera.com> wrote:


Re: Python-Avro Codegen Proposal

Posted by Philip Zeyliger <ph...@cloudera.com>.
Hi Tal,

I would encourage you to file a JIRA and contribute your changes!

I agree that the generated code approach is a great one for things like RPC
usage, where the schema changes slowly and the extra checking is super
handy.

-- Philip


On Mon, Jun 3, 2013 at 6:28 PM, Stefan Krawczyk <st...@nextdoor.com> wrote:


Re: Python-Avro Codegen Proposal

Posted by Stefan Krawczyk <st...@nextdoor.com>.
Hi Tal,

I'm interested in using Avro + Python. That sounds useful; any chance
I could have a play with what you've done?

Cheers,

Stefan


On Sun, Jun 2, 2013 at 10:28 PM, Tal Levy <ju...@gmail.com> wrote:
