Posted to user@avro.apache.org by Miki Tebeka <mi...@adconion.com> on 2011/05/06 19:34:01 UTC

Avro Python package slowness

Greetings,

I'm using the avro python package (1.5.0), and it is slow.
It takes about 1 min to process a 33K-record file. For comparison, the
Java package processes the same file in 1 sec.

Any ideas on how to speed that up?

All the best,
--
Miki

Re: Avro Python package slowness

Posted by Doug Cutting <cu...@apache.org>.
On 05/06/2011 04:20 PM, Miki Tebeka wrote:
>>  It should be possible to determine the union
>> branch to write much more efficiently.
> Can you elaborate on how? I'll try to code this and patch.
> Also, I'm talking about reading the avro file, not writing to it.

The optimization I was speaking of is for writing, not reading.

Doug

Re: Avro Python package slowness

Posted by Miki Tebeka <mi...@adconion.com>.
>> I'm using the avro python package (1.5.0), and it is slow.
> Does the schema have unions?  Last I checked, python recursively
> validates data in order to determine which branch of a union should be
> written.  In the worst case (nested unions) this can lead to quadratic
> serialization times.
There are many unions, but not nested ones.

>  It should be possible to determine the union
> branch to write much more efficiently.
Can you elaborate on how? I'll try to code this and patch.
Also, I'm talking about reading the avro file, not writing to it.

All the best,
--
Miki
[I don't suffer from insanity, I enjoy every minute of it]

Re: Avro Python package slowness

Posted by Doug Cutting <cu...@apache.org>.
On 05/06/2011 10:34 AM, Miki Tebeka wrote:
> I'm using the avro python package (1.5.0), and it is slow.
> It takes about 1 min to process a 33K-record file. For comparison, the
> Java package processes the same file in 1 sec.
> 
> Any ideas on how to speed that up?

Does the schema have unions?  Last I checked, python recursively
validates data in order to determine which branch of a union should be
written.  In the worst case (nested unions) this can lead to quadratic
serialization times.  It should be possible to determine the union
branch to write much more efficiently.
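
A minimal Python 3 sketch of the difference described above, assuming a toy subset of schemas written as plain Python values (primitive names as strings, records as dicts, unions as lists). `validate`, `resolve_branch_naive`, `shallow_match` and `resolve_branch_fast` are hypothetical helpers, not functions from the avro package. The naive resolver re-validates the whole datum against each branch, so a writer that calls it at every union it meets re-walks the subtree below each nested union, which is where the quadratic cost comes from. The faster one looks only at the datum's top-level Python type and falls back to full validation only when more than one branch is still possible.

```python
# Hypothetical sketch, not the avro package's implementation.
# Schemas: "null" | "boolean" | "int" | "long" | "double" | "string",
# {"type": "record", "fields": [{"name": ..., "type": ...}, ...]}, or a list (union).

PRIMITIVE_CHECKS = {
    "null": lambda d: d is None,
    "boolean": lambda d: isinstance(d, bool),
    "int": lambda d: isinstance(d, int) and not isinstance(d, bool) and -2**31 <= d < 2**31,
    "long": lambda d: isinstance(d, int) and not isinstance(d, bool) and -2**63 <= d < 2**63,
    "double": lambda d: isinstance(d, (int, float)) and not isinstance(d, bool),
    "string": lambda d: isinstance(d, str),
}


def validate(schema, datum):
    """Full recursive check: walks every nested value of datum."""
    if isinstance(schema, list):  # union: datum must match some branch
        return any(validate(branch, datum) for branch in schema)
    if isinstance(schema, dict):  # record: every field must match
        return isinstance(datum, dict) and all(
            validate(field["type"], datum.get(field["name"]))
            for field in schema["fields"]
        )
    return PRIMITIVE_CHECKS[schema](datum)


def resolve_branch_naive(union, datum):
    """Pick the first branch the whole datum validates against (cost grows with datum size)."""
    for index, branch in enumerate(union):
        if validate(branch, datum):
            return index
    raise ValueError("datum matches no union branch")


def shallow_match(branch, datum):
    """Constant-time check of the top-level Python type only."""
    if isinstance(branch, dict):  # record branch: do not descend into fields
        return isinstance(datum, dict)
    return PRIMITIVE_CHECKS[branch](datum)


def resolve_branch_fast(union, datum):
    """Use the shallow check; fall back to full validation only if it is ambiguous."""
    candidates = [i for i, branch in enumerate(union) if shallow_match(branch, datum)]
    if len(candidates) == 1:
        return candidates[0]
    for index in candidates:  # e.g. two record branches, or "int" vs "long"
        if validate(union[index], datum):
            return index
    raise ValueError("datum matches no union branch")
```

Because Avro does not allow a union directly inside another union, a branch is never itself a list, which is what lets the shallow check stop at one level. The trade-off is that a datum which passes the shallow check but is malformed deeper down is not rejected here; it would only be caught later, when the writer encodes its fields.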

It would be great to have some performance benchmarks for Python, as we
do for Java.

Doug
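
On the benchmark point, a starting harness could be as simple as timing a per-record function over a fixed list of records and keeping the best of a few runs. `benchmark` is a hypothetical helper, and the lambda in the `__main__` section is only a stand-in workload, assumed to be replaced by the avro read or write call being measured (for instance a `DatumReader` or `DatumWriter` loop from `avro.io`).

```python
import time


def benchmark(process, records, repeat=3):
    """Run process(record) over all records `repeat` times.

    Returns (best_seconds, records_per_second) for the fastest run; taking the
    minimum reduces noise from other activity on the machine.
    """
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        for record in records:
            process(record)
        best = min(best, time.perf_counter() - start)
    rate = len(records) / best if best > 0 else float("inf")
    return best, rate


if __name__ == "__main__":
    # Stand-in data and workload; replace with real records and the avro call under test.
    sample = [{"id": i, "name": "user%d" % i} for i in range(33000)]
    seconds, per_second = benchmark(lambda r: dict(r), sample)
    print("best: %.3f s, %.0f records/s" % (seconds, per_second))
```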

Re: Avro Python package slowness

Posted by Miki Tebeka <mi...@adconion.com>.
Greetings,

>> BTW: When is 1.5.1 coming out?
> It's out today!
Great, thanks!

All the best,
--
Miki
[I don't suffer from insanity, I enjoy every minute of it]

Re: Avro Python package slowness

Posted by Doug Cutting <cu...@apache.org>.
On 05/06/2011 12:31 PM, Miki Tebeka wrote:
> BTW: When is 1.5.1 coming out?

It's out today!

Doug

Re: Avro Python package slowness

Posted by Miki Tebeka <mi...@adconion.com>.
Greetings,

>> BTW: It'll be nice to have a __version__ in avro/__init__.py
> Please file an issue in Jira and submit a patch, if you are able.
Done - https://issues.apache.org/jira/browse/AVRO-817

BTW: When is 1.5.1 coming out?

All the best,
--
Miki
[I don't suffer from insanity, I enjoy every minute of it]

Re: Avro Python package slowness

Posted by Doug Cutting <cu...@apache.org>.
On 05/06/2011 11:18 AM, Miki Tebeka wrote:
> BTW: It'll be nice to have a __version__ in avro/__init__.py

Please file an issue in Jira and submit a patch, if you are able.

https://issues.apache.org/jira/browse/AVRO

Thanks,

Doug
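
Until a `__version__` attribute exists, one hedged option for code that needs the version is to ask the installed distribution's metadata first and only then look for a module-level `__version__`. This assumes Python 3.8+ (for `importlib.metadata`) and that the package is installed under the distribution name `avro`; `installed_version` is a hypothetical helper and does not reflect how AVRO-817 was resolved.

```python
import importlib.metadata


def installed_version(dist_name, module=None):
    """Best-effort version lookup.

    1. Installed package metadata (works even if the module has no __version__).
    2. A module-level __version__ attribute, if a module object is given.
    Returns None when neither source is available.
    """
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        pass
    if module is not None:
        return getattr(module, "__version__", None)
    return None


if __name__ == "__main__":
    print(installed_version("avro"))  # None if no "avro" distribution is installed
```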

Re: Avro Python package slowness

Posted by Miki Tebeka <mi...@adconion.com>.
Greetings,

>>I'm using the avro python package (1.5.0), and it is slow.
> Can you try the 1.5.1 release candidate?
> http://people.apache.org/~cutting/avro-1.5.1-rc0/
This trimmed it down to 30 sec. Nice!

BTW: It'll be nice to have a __version__ in avro/__init__.py

All the best,
--
Miki
[I don't suffer from insanity, I enjoy every minute of it]
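
Putting the approximate timings from this thread side by side (about 60 s with 1.5.0, about 30 s with the 1.5.1 release candidate, about 1 s with Java, all for the same 33K-record file) shows what the improvement means in throughput terms. The figures are the thread's rough numbers, not new measurements.

```python
# Approximate timings reported in this thread for the same ~33K-record file.
RECORDS = 33_000
timings_s = {
    "python avro 1.5.0": 60.0,
    "python avro 1.5.1-rc0": 30.0,
    "java": 1.0,
}


def throughput(records, seconds):
    """Records processed per second."""
    return records / seconds


for name, seconds in timings_s.items():
    ratio = seconds / timings_s["java"]
    print(f"{name}: {throughput(RECORDS, seconds):,.0f} records/s ({ratio:.0f}x the Java time)")
```

That is roughly 550 records/s for 1.5.0, 1,100 records/s for the release candidate and 33,000 records/s for Java: the release candidate doubles throughput, but on this file the Python package is still about 30 times slower than Java.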

Re: Avro Python package slowness

Posted by Scott Carey <sc...@richrelevance.com>.
Can you try the 1.5.1 release candidate?

http://people.apache.org/~cutting/avro-1.5.1-rc0/


It should be faster than 1.5.0, but it's very unlikely to match Java.

On 5/6/11 10:34 AM, "Miki Tebeka" <mi...@adconion.com> wrote:

>Greetings,
>
>I'm using the avro python package (1.5.0), and it is slow.
>It takes about 1 min to process a 33K-record file. For comparison, the
>Java package processes the same file in 1 sec.
>
>Any ideas on how to speed that up?
>
>All the best,
>--
>Miki