Posted to dev@thrift.apache.org by "Bryan Duxbury (JIRA)" <ji...@apache.org> on 2009/02/09 20:48:59 UTC

[jira] Updated: (THRIFT-318) Performance of HashSet for enumeration VALID_VALUES seems poor

     [ https://issues.apache.org/jira/browse/THRIFT-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Duxbury updated THRIFT-318:
---------------------------------

    Attachment: thrift-318.patch

This patch adds a new custom Set implementation, IntRangeSet, that collapses the values into extents of contiguous values. contains(int) then needs at most 2 * (number of extents) comparisons. This proves to be faster than HashSet, likely because it avoids the Integer.valueOf autoboxing and the Integer.hashCode call. In my tests, across a variety of value sets and query values, it's about 60% faster.
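
To illustrate the extent idea, here is a minimal sketch. It is not the code from thrift-318.patch: the class name IntRangeSketch, the flat int[] layout, and the extentCount() helper are all assumptions made for this example, and it does not implement java.util.Set as the patch's IntRangeSet does.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration of an extent-based int set.
 * Not the IntRangeSet class from thrift-318.patch.
 */
final class IntRangeSketch {
    // extents[2*i] is the inclusive start of extent i,
    // extents[2*i + 1] is its inclusive end.
    private final int[] extents;

    IntRangeSketch(int... values) {
        int[] sorted = values.clone();
        Arrays.sort(sorted);

        // Worst case: every value is its own extent (two slots each).
        int[] tmp = new int[sorted.length * 2];
        int n = 0;
        for (int v : sorted) {
            // Duplicate of, or directly adjacent to, the end of the last extent.
            if (n > 0 && (v == tmp[n - 1] || v == tmp[n - 1] + 1)) {
                tmp[n - 1] = v;   // extend the current extent
            } else {
                tmp[n++] = v;     // start a new extent
                tmp[n++] = v;
            }
        }
        extents = Arrays.copyOf(tmp, n);
    }

    /** At most two primitive comparisons per extent; no boxing, no hashing. */
    boolean contains(int value) {
        for (int i = 0; i < extents.length; i += 2) {
            if (value >= extents[i] && value <= extents[i + 1]) {
                return true;
            }
        }
        return false;
    }

    /** Number of contiguous extents the values collapsed into. */
    int extentCount() {
        return extents.length / 2;
    }
}
```

With the values 1, 2, 3, 7, 8, 10 this collapses to three extents (1-3, 7-8, 10-10), so a lookup costs at most six int comparisons. Enum values are usually dense, so a typical enum would collapse to a single extent.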

I've also amended the Java compiler to use IntRangeSet when generating enums. The struct code itself does not change.

> Performance of HashSet for enumeration VALID_VALUES seems poor
> --------------------------------------------------------------
>
>                 Key: THRIFT-318
>                 URL: https://issues.apache.org/jira/browse/THRIFT-318
>             Project: Thrift
>          Issue Type: Improvement
>          Components: Compiler (Java)
>            Reporter: Bryan Duxbury
>            Assignee: Bryan Duxbury
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: thrift-318.patch
>
>
> It looks like using a HashSet for the VALID_VALUES set we now put in enumerated types was a bad move, performance-wise. There's a fair amount of HashSet/HashMap/Integer overhead generated.
> I think that VALID_VALUES should still be a Set, but we can make a TIntRangeSet or something internal to Thrift that's more efficient for our use cases and saves some CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.