Posted to dev@avro.apache.org by Doug Cutting <cu...@apache.org> on 2010/04/22 17:51:12 UTC

[Fwd: New benchmarking page.]

Avro seems to be sliding a bit in this benchmark.  The poor "create" 
time has always been a problem for Avro, although I'm not sure why. 
This isn't a great benchmark, but lots of folks look at it, so it'd be 
nice if we did well there.

Doug

-------- Original Message --------
Subject: New benchmarking page.
Date: Thu, 22 Apr 2010 04:34:04 -0700
From: Kannan Goundan <ka...@cakoose.com>
Reply-To: java-serialization-benchmarking@googlegroups.com
To: java-serialization-benchmarking@googlegroups.com

I've created a "version 2" of the Benchmarking page.

    http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2

These measurements were generated using the new code I've been adding
over the past month or so.  One advantage of the new code is that I've
actually tried to make the various serializers do the same amount of
work (previously, many serializers were specialized to the exact data
value being tested).

A couple notes:

1. The timing measurements aren't very precise.  I tried taking the
best of 100 trials (instead of the default best of 20) and the numbers
still won't stabilize.  I sometimes get a 20% difference between runs.
A side effect of this is that we sometimes end up with weird results like
the "deserialize" time being greater than the "deserialize and access
fields" time.

2. The Scala object creation times are higher than before.  I think
this may be because I rewrote the Scala code and used "Option[T]" in a
couple places that were previously just "T".  Most of the Java-based
tools just use "null" for optional values, which is more efficient.

3. The "java (externalizable)" test didn't look like it was using the
Externalizable feature of Java at all.  I renamed it "java-manual"
since all it does is manually serialize using a DataInputStream and
DataOutputStream (which is basically what Kryo does with runtime code
generation).

4. Do you think it's useful to have the XML/JSON tests that use
abbreviated field names?  You'd think someone would use XML/JSON over
a binary format for readability; using abbreviated field names negates
that advantage (partially).

5. I use a slightly different data value (intended to be a bit more
realistic).  See the wiki page: DataStructuresV2.

-- Kannan


Re: [Fwd: New benchmarking page.]

Posted by Scott Carey <sc...@richrelevance.com>.

The create time can be improved.  I think the issue is that the benchmark forces Avro to create a lot more objects.

Others can take a String and encode it directly into the resulting byte output.  We have to take a String, encode it into a byte[] (in a Utf8), copy that to the output, and then throw away the Utf8.  We could recycle the byte[] buffers from Utf8s (say, with a thread-local byte[] buffer cache like the one Jackson uses), or allow Strings to be written and read directly by the encoder and decoder, alongside Utf8s.  Our challenge will be that we must write the length of the string before its bytes, and that length is not available until the string has been converted to UTF-8.
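
A rough sketch of the second option, writing a String straight to the output.  DirectStringEncoder and its methods are hypothetical names, not Avro API.  It assumes the length can be obtained with a counting pass over the chars, after which the zig-zag varint length and the UTF-8 bytes are written with no intermediate Utf8 or byte[].  Unpaired surrogates are written as '?', which is also an assumption.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper for illustration only; not part of Avro.
final class DirectStringEncoder {

    private static boolean isPair(String s, int i) {
        return Character.isHighSurrogate(s.charAt(i)) && i + 1 < s.length()
                && Character.isLowSurrogate(s.charAt(i + 1));
    }

    // Counts the UTF-8 bytes needed for s without allocating a byte[].
    static int utf8Length(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                n += 1;
            } else if (c < 0x800) {
                n += 2;
            } else if (Character.isSurrogate(c)) {
                if (isPair(s, i)) {
                    n += 4;
                    i++; // skip the low surrogate
                } else {
                    n += 1; // unpaired surrogate is written as '?'
                }
            } else {
                n += 3;
            }
        }
        return n;
    }

    // Zig-zag, variable-length encoding of a long (used here for the length prefix).
    static void writeLong(long v, ByteArrayOutputStream out) {
        long z = (v << 1) ^ (v >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Writes the byte length first, then encodes the chars directly to the output.
    static void writeString(String s, ByteArrayOutputStream out) {
        writeLong(utf8Length(s), out);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out.write(c);
            } else if (c < 0x800) {
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else if (Character.isSurrogate(c)) {
                if (isPair(s, i)) {
                    int cp = Character.toCodePoint(c, s.charAt(++i));
                    out.write(0xF0 | (cp >> 18));
                    out.write(0x80 | ((cp >> 12) & 0x3F));
                    out.write(0x80 | ((cp >> 6) & 0x3F));
                    out.write(0x80 | (cp & 0x3F));
                } else {
                    out.write('?');
                }
            } else {
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
    }

    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(s, out);
        return out.toByteArray();
    }
}
```

The trade-off is two passes over the chars (one to count, one to encode) in exchange for not allocating a Utf8 per string.  Whether that is a net win would need measuring.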

Because of the way the test is partitioned, some of our serialize time ends up in the create time -- others do the UTF-16 to UTF-8 conversion while serializing, whereas we do it in the 'create' phase.

Furthermore, on the Java side I think there is a lot of room for further improvement in raw serialization and deserialization, but not much of it is easy, and most of it has to do with more complicated schemas.

The benchmark setup is suspect -- last I checked it used an inappropriate heap size and the code comments around its 'warmup' process were misguided.

-Scott
