Posted to user@thrift.apache.org by Ted Zlatanov <tz...@lifelogs.com> on 2010/05/10 16:10:36 UTC

poor Perl vs. Java Thrift performance in Cassandra

Apologies if this has been discussed before but I didn't see it in the
archives.

Perl code talking to Cassandra performs poorly compared to Java.  I
generally clock Perl at 5-20x slower using the raw Thrift API, depending
on the number of structures that need to be serialized/deserialized.
This is with Perl 5.10 vs. the latest Sun JVM.

I maintain the Net::Cassandra::Easy Perl module that uses this interface
so I'd like to make it faster.  I think any performance improvements
would be good for all Thrift users so I am posting here in the hopes of
getting some feedback.

It seems to me that one of the problems is the large number of OO method
calls, which in Perl are slower than plain function calls.  Another is
that pack()/unpack() is probably the fastest way to serialize/deserialize
data in Perl, but the Thrift Perl code doesn't use it much.  Instead I
see values accumulated one at a time from the source data.  In Java this
makes perfect sense but in Perl it drags performance down.

Perhaps a good optimization would be to generate the pack/unpack format
strings at compilation time, combine them with static function wrappers,
and use that instead of multiple OO calls?  Although I am comfortable
with Perl, I don't know Thrift well enough to recommend the best
approach there.  I hope to be helpful with benchmarks and specific
optimizations, though.
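
To make the idea concrete, here is a rough sketch.  It is in Python
rather than Perl, because Python's struct.pack is a close analogue of
Perl's pack() and the contrast is the same.  The two-field struct (an
i32 "count" with field id 1 and an i64 "timestamp" with field id 2),
the class, and the function names are all made up for illustration;
this is not code from the Thrift Perl library, and it assumes the
binary protocol's usual field layout (type byte, 16-bit field id,
big-endian value, stop byte).

```python
import struct

# Thrift binary-protocol type codes (assumed here: STOP=0, I32=8, I64=10).
T_STOP, T_I32, T_I64 = 0, 8, 10


class StepwiseWriter:
    """Hypothetical writer: one method call and one pack per value."""

    def __init__(self):
        self.parts = []

    def write_byte(self, v):
        self.parts.append(struct.pack(">b", v))

    def write_i16(self, v):
        self.parts.append(struct.pack(">h", v))

    def write_i32(self, v):
        self.parts.append(struct.pack(">i", v))

    def write_i64(self, v):
        self.parts.append(struct.pack(">q", v))

    def getvalue(self):
        return b"".join(self.parts)


def write_stepwise(count, timestamp):
    """Seven method calls to serialize two fields plus the stop byte."""
    w = StepwiseWriter()
    w.write_byte(T_I32)
    w.write_i16(1)
    w.write_i32(count)
    w.write_byte(T_I64)
    w.write_i16(2)
    w.write_i64(timestamp)
    w.write_byte(T_STOP)
    return w.getvalue()


# Format string worked out once, as a code generator could do at
# compilation time: (type, id, i32), (type, id, i64), stop byte.
RECORD = struct.Struct(">bhibhqb")


def write_packed(count, timestamp):
    """One plain function call and one pack for the whole struct."""
    return RECORD.pack(T_I32, 1, count, T_I64, 2, timestamp, T_STOP)


def read_packed(data):
    """One unpack for the whole struct; returns (count, timestamp)."""
    fields = RECORD.unpack(data)
    return fields[2], fields[5]
```

Both writers produce the same bytes; the second just gets there with
one call instead of seven.  This only works directly for structs whose
fields are all fixed-width, so strings and containers would still need
some per-field handling.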

Thanks
Ted

Re: poor Perl vs. Java Thrift performance in Cassandra

Posted by Ted Zlatanov <tz...@lifelogs.com>.

On Mon, 10 May 2010 09:23:57 -0700 Bryan Duxbury <br...@rapleaf.com> wrote: 

BD> I can't really speak to the Perl library, but I personally have spent *lots*
BD> of time optimizing the Java libraries. I suspect that with some time put in,
BD> you'll be able to find a lot of room for improvement.

BD> The next step is probably to create a JIRA ticket for Perl performance
BD> enhancements, and then get crackin'. Hopefully those with more Perl
BD> experience will show up to review and comment on any patches you create.

http://issues.apache.org/jira/browse/THRIFT-775

The current maintainer of the Perl support is probably the best person
to lead this effort.  My suggestions probably lack the background
necessary to optimize the code properly.  I also can't dedicate enough
time to finish this work quickly by myself, since I don't know Thrift
well from the inside, so it would be very helpful if I could at least
work with someone who can help me get started.

Thanks
Ted

Re: poor Perl vs. Java Thrift performance in Cassandra

Posted by Bryan Duxbury <br...@rapleaf.com>.

Ted -

I can't really speak to the Perl library, but I personally have spent *lots*
of time optimizing the Java libraries. I suspect that with some time put in,
you'll be able to find a lot of room for improvement.

The next step is probably to create a JIRA ticket for Perl performance
enhancements, and then get crackin'. Hopefully those with more Perl
experience will show up to review and comment on any patches you create.

-Bryan
