You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@thrift.apache.org by Chad Walters <Ch...@microsoft.com> on 2010/01/13 07:54:10 UTC

Perf regression in Java bindings?

I was playing around with upgrading some benchmarks to Thrift 0.2 and I think that THRIFT-623 <https://issues.apache.org/jira/browse/THRIFT-623> may have introduced a performance regression.

Struct deserialization now does a map lookup to go from an integer to an enumeration, checks for null in case the integer was not known, and then does a switch on the value of the enumeration.

But it appears to me that this mapping is entirely fixed at compile time. Furthermore the enumeration does not appear to be used for anything other than performing the switch (in the struct reading routine at least).

This leads me to ask:
1. Why not just switch on the integer and have a default case for unknown values?
2. Why have a dynamic map for a mapping that is known at compile time? Wouldn't it be faster to just provide the mapping via a switch (when it is needed at all)?

Perhaps I am missing some subtlety but I thought I'd ask...

Chad



Re: Perf regression in Java bindings?

Posted by Bryan Duxbury <br...@rapleaf.com>.
Thanks for looking into this. I will investigate what it would take to solve
these performance regressions.

-Bryan

On Wed, Jan 13, 2010 at 9:12 PM, Chad Walters <Ch...@microsoft.com>wrote:

> I gathered the following data using the thrift-protobuf-compare benchmarks
> (multiple runs with each change). It's probably not very representative of
> any real world workload, but it's what I have to work with at the moment.
>
> Original (some pre-0.1 version of Thrift)
>                        ,   Object create,       Serialize,  /w Same Object,
>     Deserialize, and Check Media,   and Check All,      Total Time,
> Serialized Size
> thrift                  ,       318.93000,      4675.50000,
>  6708.00000,      5040.00000,             NaN,             NaN,
>  9715.50000,        353
> thrift                  ,       352.25500,      4556.00000,
>  6658.50000,      5086.00000,             NaN,             NaN,
>  9642.00000,        353
> thrift                  ,       350.59000,      4572.00000,
>  6748.50000,      5067.00000,             NaN,             NaN,
>  9639.00000,        353
>
> Thrift 0.2
>                        ,   Object create,       Serialize,  /w Same Object,
>     Deserialize, and Check Media,   and Check All,      Total Time,
> Serialized Size
> thrift                  ,       352.36500,      4506.00000,
>  6686.00000,      5783.00000,             NaN,             NaN,
> 10641.36500,        353
> thrift                  ,       329.13000,      4462.50000,
>  6757.00000,      5770.50000,             NaN,             NaN,
> 10562.13000,        353
> thrift                  ,       331.42000,      4503.00000,
>  6803.00000,      5768.50000,             NaN,             NaN,
> 10602.92000,        353
>
> Thrift 0.2 w/TCompactProtocol
>                        ,   Object create,       Serialize,  /w Same Object,
>     Deserialize, and Check Media,   and Check All,      Total Time,
> Serialized Size
> thrift                  ,       324.50500,      4467.00000,
>  6426.00000,      5703.00000,             NaN,             NaN,
> 10494.50500,        234
> thrift                  ,       323.00500,      4418.00000,
>  6423.00000,      5703.00000,             NaN,             NaN,
> 10444.00500,        234
> thrift                  ,       330.57000,      4388.50000,
>  5244.50000,      5704.00000,             NaN,             NaN,
> 10423.07000,        234
>
> Thrift 0.2 w/TCompactProtocol + read() fix
>                        ,   Object create,       Serialize,  /w Same Object,
>     Deserialize, and Check Media,   and Check All,      Total Time,
> Serialized Size
> thrift                  ,       327.55000,      4201.00000,
>  6163.50000,      4988.50000,             NaN,             NaN,
>  9517.05000,        234
> thrift                  ,       326.82500,      4194.00000,
>  5223.00000,      4975.50000,             NaN,             NaN,
>  9496.32500,        234
> thrift                  ,       324.36000,      4191.00000,
>  6217.00000,      4978.50000,             NaN,             NaN,
>  9493.86000,        234
>
> Thrift 0.2 w/TCompactProtocol + read() fix + enum fix
>                        ,   Object create,       Serialize,  /w Same Object,
>     Deserialize, and Check Media,   and Check All,      Total Time,
> Serialized Size
> thrift                  ,       346.90500,      4130.00000,
>  5311.00000,      4637.50000,             NaN,             NaN,
>  9114.40500,        234
> thrift                  ,       310.77500,      4168.50000,
>  5448.00000,      4718.00000,             NaN,             NaN,
>  9197.27500,        234
> thrift                  ,       314.56000,      4147.50000,
>  5540.50000,      4645.50000,             NaN,             NaN,
>  9107.56000,        234
>
>
> The "read() fix" reverts read() to use a switch on the field.id int rather
> than mapping to the enum and switching on that. The "enum fix" gets rid of
> the map in the enum classes and implements findByValue() as switch. I just
> applied these fixes by hand in the generated files.
>
> I'll go ahead and open issues on both of these items.
>
> Chad
>
> -----Original Message-----
> From: Bryan Duxbury [mailto:bryan@rapleaf.com]
> Sent: Tuesday, January 12, 2010 11:05 PM
> To: thrift-dev@incubator.apache.org
> Subject: Re: Perf regression in Java bindings?
>
> Chad -
>
> This is an interesting observation. Do you have a sense of how much of a
> difference it makes?
>
> I think you should open a ticket for this regression. Your proposed
> solutions sound viable and easy, so I can probably whip something up prett
> quickly.
>
> -Bryan
>
> On Tue, Jan 12, 2010 at 10:54 PM, Chad Walters
> <Ch...@microsoft.com>wrote:
>
> > I was playing around with upgrading some benchmarks to Thrift 0.2 and I
> > think that THRIFT-623 <https://issues.apache.org/jira/browse/THRIFT-623>
> > may have introduced a performance regression.
> >
> > Struct deserialization now does a map lookup to go from an integer to an
> > enumeration, checks for null in case the integer was not known, and then
> > does a switch on the value of the enumeration.
> >
> > But it appears to me that this mapping is entirely fixed at compile time.
> > Furthermore the enumeration does not appear to be used for anything other
> > than performing the switch (in the struct reading routine at least).
> >
> > This leads me to ask:
> > 1. Why not just switch on the integer and have a default case for unknown
> > values?
> > 2. Why have a dynamic map for a mapping that is known at compile time?
> > Wouldn't it be faster to just provide the mapping via a switch (when it
> is
> > needed at all)?
> >
> > Perhaps I am missing some subtlety but I thought I'd ask...
> >
> > Chad
> >
> >
> >
>

RE: Perf regression in Java bindings?

Posted by Chad Walters <Ch...@microsoft.com>.
I gathered the following data using the thrift-protobuf-compare benchmarks (multiple runs with each change). It's probably not very representative of any real world workload, but it's what I have to work with at the moment.

Original (some pre-0.1 version of Thrift)
                        ,   Object create,       Serialize,  /w Same Object,     Deserialize, and Check Media,   and Check All,      Total Time, Serialized Size
thrift                  ,       318.93000,      4675.50000,      6708.00000,      5040.00000,             NaN,             NaN,      9715.50000,        353
thrift                  ,       352.25500,      4556.00000,      6658.50000,      5086.00000,             NaN,             NaN,      9642.00000,        353
thrift                  ,       350.59000,      4572.00000,      6748.50000,      5067.00000,             NaN,             NaN,      9639.00000,        353

Thrift 0.2
                        ,   Object create,       Serialize,  /w Same Object,     Deserialize, and Check Media,   and Check All,      Total Time, Serialized Size
thrift                  ,       352.36500,      4506.00000,      6686.00000,      5783.00000,             NaN,             NaN,     10641.36500,        353
thrift                  ,       329.13000,      4462.50000,      6757.00000,      5770.50000,             NaN,             NaN,     10562.13000,        353
thrift                  ,       331.42000,      4503.00000,      6803.00000,      5768.50000,             NaN,             NaN,     10602.92000,        353

Thrift 0.2 w/TCompactProtocol
                        ,   Object create,       Serialize,  /w Same Object,     Deserialize, and Check Media,   and Check All,      Total Time, Serialized Size
thrift                  ,       324.50500,      4467.00000,      6426.00000,      5703.00000,             NaN,             NaN,     10494.50500,        234
thrift                  ,       323.00500,      4418.00000,      6423.00000,      5703.00000,             NaN,             NaN,     10444.00500,        234
thrift                  ,       330.57000,      4388.50000,      5244.50000,      5704.00000,             NaN,             NaN,     10423.07000,        234

Thrift 0.2 w/TCompactProtocol + read() fix 
                        ,   Object create,       Serialize,  /w Same Object,     Deserialize, and Check Media,   and Check All,      Total Time, Serialized Size
thrift                  ,       327.55000,      4201.00000,      6163.50000,      4988.50000,             NaN,             NaN,      9517.05000,        234
thrift                  ,       326.82500,      4194.00000,      5223.00000,      4975.50000,             NaN,             NaN,      9496.32500,        234
thrift                  ,       324.36000,      4191.00000,      6217.00000,      4978.50000,             NaN,             NaN,      9493.86000,        234

Thrift 0.2 w/TCompactProtocol + read() fix + enum fix
                        ,   Object create,       Serialize,  /w Same Object,     Deserialize, and Check Media,   and Check All,      Total Time, Serialized Size
thrift                  ,       346.90500,      4130.00000,      5311.00000,      4637.50000,             NaN,             NaN,      9114.40500,        234
thrift                  ,       310.77500,      4168.50000,      5448.00000,      4718.00000,             NaN,             NaN,      9197.27500,        234
thrift                  ,       314.56000,      4147.50000,      5540.50000,      4645.50000,             NaN,             NaN,      9107.56000,        234


The "read() fix" reverts read() to use a switch on the field.id int rather than mapping to the enum and switching on that. The "enum fix" gets rid of the map in the enum classes and implements findByValue() as switch. I just applied these fixes by hand in the generated files.

I'll go ahead and open issues on both of these items.

Chad

-----Original Message-----
From: Bryan Duxbury [mailto:bryan@rapleaf.com] 
Sent: Tuesday, January 12, 2010 11:05 PM
To: thrift-dev@incubator.apache.org
Subject: Re: Perf regression in Java bindings?

Chad -

This is an interesting observation. Do you have a sense of how much of a
difference it makes?

I think you should open a ticket for this regression. Your proposed
solutions sound viable and easy, so I can probably whip something up prett
quickly.

-Bryan

On Tue, Jan 12, 2010 at 10:54 PM, Chad Walters
<Ch...@microsoft.com>wrote:

> I was playing around with upgrading some benchmarks to Thrift 0.2 and I
> think that THRIFT-623 <https://issues.apache.org/jira/browse/THRIFT-623>
> may have introduced a performance regression.
>
> Struct deserialization now does a map lookup to go from an integer to an
> enumeration, checks for null in case the integer was not known, and then
> does a switch on the value of the enumeration.
>
> But it appears to me that this mapping is entirely fixed at compile time.
> Furthermore the enumeration does not appear to be used for anything other
> than performing the switch (in the struct reading routine at least).
>
> This leads me to ask:
> 1. Why not just switch on the integer and have a default case for unknown
> values?
> 2. Why have a dynamic map for a mapping that is known at compile time?
> Wouldn't it be faster to just provide the mapping via a switch (when it is
> needed at all)?
>
> Perhaps I am missing some subtlety but I thought I'd ask...
>
> Chad
>
>
>

Re: Perf regression in Java bindings?

Posted by Bryan Duxbury <br...@rapleaf.com>.
Chad -

This is an interesting observation. Do you have a sense of how much of a
difference it makes?

I think you should open a ticket for this regression. Your proposed
solutions sound viable and easy, so I can probably whip something up prett
quickly.

-Bryan

On Tue, Jan 12, 2010 at 10:54 PM, Chad Walters
<Ch...@microsoft.com>wrote:

> I was playing around with upgrading some benchmarks to Thrift 0.2 and I
> think that THRIFT-623 <https://issues.apache.org/jira/browse/THRIFT-623>
> may have introduced a performance regression.
>
> Struct deserialization now does a map lookup to go from an integer to an
> enumeration, checks for null in case the integer was not known, and then
> does a switch on the value of the enumeration.
>
> But it appears to me that this mapping is entirely fixed at compile time.
> Furthermore the enumeration does not appear to be used for anything other
> than performing the switch (in the struct reading routine at least).
>
> This leads me to ask:
> 1. Why not just switch on the integer and have a default case for unknown
> values?
> 2. Why have a dynamic map for a mapping that is known at compile time?
> Wouldn't it be faster to just provide the mapping via a switch (when it is
> needed at all)?
>
> Perhaps I am missing some subtlety but I thought I'd ask...
>
> Chad
>
>
>