Posted to dev@avro.apache.org by Ryan Blue <bl...@cloudera.com> on 2015/10/28 19:14:37 UTC
Python Avro implementations
Hi everyone,
Right now, we have two Python implementations: py and py3. There is
also fastavro [1], which is popular because it is fast and more
Pythonic. It also works with Python 2.7, Python 3.x, and PyPy, and can
be sped up by Cython.
I had a recent e-mail exchange with Miki Tebeka, the creator and
maintainer of fastavro, about the current python Avro implementations
and he's interested in working with the Apache community to merge the
existing implementations into one. I'm really excited about it, since
this is a great opportunity to grow the Avro community and consolidate
the python implementations.
I'd like to start a discussion from this thread about next steps. I
think the best way forward is to bring fastavro in, and then work on
building compatibility with the current APIs where we need to so that we
can deprecate the existing py and py3 projects.
Does that sound reasonable?
rb
[1]: https://github.com/tebeka/fastavro
--
Ryan Blue
Software Engineer
Cloudera, Inc.
Re: Python Avro implementations
Posted by Marius Dieckmann <m....@googlemail.com>.
Hi,
I recently evaluated the performance of various Python Avro
implementations. Besides the official Python implementation and fastavro
there is a fourth implementation called pyavroc [1]. pyavroc seems to be
even faster than fastavro in terms of parsing performance, but it uses
the Avro C library with Python bindings rather than pure Python. I am
not sure if this is desired, but it might be a good option to develop
fastavro so that the Avro C library can be integrated into the code to
improve performance. (I am also not sure whether optimizing the code for
Cython could bring the performance to a similar level.) In addition,
pyavroc does not seem to have much API compatibility, so I am not sure
what the focus should be: API compatibility or performance.
In terms of parsing performance I found the following (normalized
against the official Python avro 1.7.7; lower is faster):
avro_python: 1
fastavro: 0.2717 (I am not sure if I used Cython correctly)
pyavroc: 0.0285 (using only built-in Python functions, i.e. no numpy
or something similar)
The results were more or less stable across various tests and files.
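To make the normalized figures easier to compare, here is a minimal
sketch that converts them into speedup factors relative to the official
implementation. It assumes the numbers are relative parse times (which
matches pyavroc being described as the fastest); the dictionary and the
function name `speedups` are illustrative and not part of any of the
libraries discussed.

```python
# Relative figures as reported in this message (official avro 1.7.7 = 1.0).
# Assumption: these are relative parse times, so smaller means faster.
relative_time = {
    "avro_python": 1.0,
    "fastavro": 0.2717,
    "pyavroc": 0.0285,
}


def speedups(times, baseline="avro_python"):
    """Return how many times faster each entry is than the baseline."""
    base = times[baseline]
    return {name: base / t for name, t in times.items()}


result = speedups(relative_time)
for name, factor in result.items():
    print(f"{name}: {factor:.1f}x")
```

Under that assumption, fastavro comes out at roughly 3.7x and pyavroc at
roughly 35x the speed of the official implementation.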
Cheers
[1] https://github.com/Byhiras/pyavroc
On 29.10.2015 at 14:23, Sean Busbey wrote:
> sounds great to me.
Re: Python Avro implementations
Posted by Ryan Blue <bl...@cloudera.com>.
Hey everyone,
Sorry it took so long; I forgot I had promised to open an issue for
this. It is here:
https://issues.apache.org/jira/browse/AVRO-1756
Next step is to get a patch together!
rb
On 11/05/2015 09:06 AM, Ryan Blue wrote:
> Thanks, Miki! This sounds great. I'll open up an issue in Avro's tracker
> for this.
>
> You might also want to have a look at the ongoing import of Matthieu's
> js implementation for an idea about the steps:
>
> https://issues.apache.org/jira/browse/AVRO-1747
>
> Please let us know what we can do to help the process along. If you want
> to put together a patch that adds fastavro as lang/python that would be
> a great start so we can start looking at it. It sounds like another item
> for us to follow up on is the repository structure and release policies
> in the other thread.
>
> rb
--
Ryan Blue
Software Engineer
Cloudera, Inc.
Re: Python Avro implementations
Posted by Ryan Blue <bl...@cloudera.com>.
Thanks, Miki! This sounds great. I'll open up an issue in Avro's tracker
for this.
You might also want to have a look at the ongoing import of Matthieu's
js implementation for an idea about the steps:
https://issues.apache.org/jira/browse/AVRO-1747
Please let us know what we can do to help the process along. If you want
to put together a patch that adds fastavro as lang/python that would be
a great start so we can start looking at it. It sounds like another item
for us to follow up on is the repository structure and release policies
in the other thread.
rb
On 10/31/2015 01:15 AM, Miki Tebeka wrote:
> I'd love to have the project hosted under the official avro repository
> and gain help from people who know Avro far better than me.
>
> I'll take some time to re-learn the existing avro API and try to guesstimate
> the effort involved in wrapping the current fastavro codebase with it.
> However I have a hunch we won't be 100% backward compatible and will need
> some phase-out period (of course - I might be wrong :)
--
Ryan Blue
Software Engineer
Cloudera, Inc.
Re: Python Avro implementations
Posted by Miki Tebeka <mi...@gmail.com>.
I'd love to have the project hosted under the official avro repository
and gain help from people who know Avro far better than me.
I'll take some time to re-learn the existing avro API and try to guesstimate
the effort involved in wrapping the current fastavro codebase with it.
However, I have a hunch we won't be 100% backward compatible and will need
some phase-out period (of course - I might be wrong :)
On Thu, Oct 29, 2015 at 3:23 PM, Sean Busbey <bu...@cloudera.com> wrote:
> sounds great to me.
Re: Python Avro implementations
Posted by Sean Busbey <bu...@cloudera.com>.
sounds great to me.
--
Sean