You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@thrift.apache.org by "Bryan Duxbury (JIRA)" <ji...@apache.org> on 2009/02/06 23:54:59 UTC

[jira] Created: (THRIFT-318) Performance of HashSet for enumeration VALID_VALUES seems poor

Performance of HashSet for enumeration VALID_VALUES seems poor
--------------------------------------------------------------

                 Key: THRIFT-318
                 URL: https://issues.apache.org/jira/browse/THRIFT-318
             Project: Thrift
          Issue Type: Improvement
          Components: Compiler (Java)
            Reporter: Bryan Duxbury
            Assignee: Bryan Duxbury
            Priority: Minor
             Fix For: 0.1


It looks like using a HashSet for the VALID_VALUES set we now put in enumerated types was a bad move, performance-wise. There's a fair amount of HashSet/HashMap/Integer overhead generated.

I think that the VALID_VALUES should still be a Set, but we can make a TIntRangeSet or something internal to Thrift that's more efficient for our usecases and save some CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (THRIFT-318) Performance of HashSet for enumeration VALID_VALUES seems poor

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672164#action_12672164 ] 

Bryan Duxbury commented on THRIFT-318:
--------------------------------------

It's Java 1.5 compatible. 

I considered binary searching the extents, but like you said, the common case is only one extent. I guess if you had a very disjoint set of values for a given enumeration, doing a binary search might be kind of nice. I think we should cross that bridge when someone shows up with the use case, though.

> Performance of HashSet for enumeration VALID_VALUES seems poor
> --------------------------------------------------------------
>
>                 Key: THRIFT-318
>                 URL: https://issues.apache.org/jira/browse/THRIFT-318
>             Project: Thrift
>          Issue Type: Improvement
>          Components: Compiler (Java)
>            Reporter: Bryan Duxbury
>            Assignee: Bryan Duxbury
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: thrift-318.patch
>
>
> It looks like using a HashSet for the VALID_VALUES set we now put in enumerated types was a bad move, performance-wise. There's a fair amount of HashSet/HashMap/Integer overhead generated.
> I think that the VALID_VALUES should still be a Set, but we can make a TIntRangeSet or something internal to Thrift that's more efficient for our usecases and save some CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (THRIFT-318) Performance of HashSet for enumeration VALID_VALUES seems poor

Posted by "Nathan Marz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672068#action_12672068 ] 

Nathan Marz commented on THRIFT-318:
------------------------------------

Notes:

1. Make "Extent" a static class
2. I'm not crazy about the name "contains" for Extent, since it also says where the value stands in relation to the extent. Maybe it should be called "relation" or something like that?
3. Add @Override to toString()
3. It seems like this class shouldn't implement the Set interface, since most of the methods aren't implemented. I think it would be cleaner and clearer to just have it implement Iterable<Integer>



> Performance of HashSet for enumeration VALID_VALUES seems poor
> --------------------------------------------------------------
>
>                 Key: THRIFT-318
>                 URL: https://issues.apache.org/jira/browse/THRIFT-318
>             Project: Thrift
>          Issue Type: Improvement
>          Components: Compiler (Java)
>            Reporter: Bryan Duxbury
>            Assignee: Bryan Duxbury
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: thrift-318.patch
>
>
> It looks like using a HashSet for the VALID_VALUES set we now put in enumerated types was a bad move, performance-wise. There's a fair amount of HashSet/HashMap/Integer overhead generated.
> I think that the VALID_VALUES should still be a Set, but we can make a TIntRangeSet or something internal to Thrift that's more efficient for our usecases and save some CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (THRIFT-318) Performance of HashSet for enumeration VALID_VALUES seems poor

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Duxbury updated THRIFT-318:
---------------------------------

    Attachment: thrift-318.patch

This patch adds a new custom Set implementation, IntRangeSet, that collapses the values into extents of contiguous values. Then, contains(int) does 2*num extents comparisons. This proves to be faster than HashSet, likely by avoiding the Integer.valueOf autoboxing and Integer.hashcode operation. My tests show that for a variety of different value sets and query values, it's about 60% faster. 

I've also amended the java compiler to use IntRangeSet when generating enums. The struct code itself does not change.

> Performance of HashSet for enumeration VALID_VALUES seems poor
> --------------------------------------------------------------
>
>                 Key: THRIFT-318
>                 URL: https://issues.apache.org/jira/browse/THRIFT-318
>             Project: Thrift
>          Issue Type: Improvement
>          Components: Compiler (Java)
>            Reporter: Bryan Duxbury
>            Assignee: Bryan Duxbury
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: thrift-318.patch
>
>
> It looks like using a HashSet for the VALID_VALUES set we now put in enumerated types was a bad move, performance-wise. There's a fair amount of HashSet/HashMap/Integer overhead generated.
> I think that the VALID_VALUES should still be a Set, but we can make a TIntRangeSet or something internal to Thrift that's more efficient for our usecases and save some CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (THRIFT-318) Performance of HashSet for enumeration VALID_VALUES seems poor

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Duxbury resolved THRIFT-318.
----------------------------------

    Resolution: Fixed

I just committed this.

> Performance of HashSet for enumeration VALID_VALUES seems poor
> --------------------------------------------------------------
>
>                 Key: THRIFT-318
>                 URL: https://issues.apache.org/jira/browse/THRIFT-318
>             Project: Thrift
>          Issue Type: Improvement
>          Components: Compiler (Java)
>            Reporter: Bryan Duxbury
>            Assignee: Bryan Duxbury
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: thrift-318-v2.patch, thrift-318-v3.patch, thrift-318.patch
>
>
> It looks like using a HashSet for the VALID_VALUES set we now put in enumerated types was a bad move, performance-wise. There's a fair amount of HashSet/HashMap/Integer overhead generated.
> I think that the VALID_VALUES should still be a Set, but we can make a TIntRangeSet or something internal to Thrift that's more efficient for our usecases and save some CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (THRIFT-318) Performance of HashSet for enumeration VALID_VALUES seems poor

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Duxbury updated THRIFT-318:
---------------------------------

    Attachment: thrift-318-v3.patch

Oops, forgot to add the @Override Nathan wanted.

> Performance of HashSet for enumeration VALID_VALUES seems poor
> --------------------------------------------------------------
>
>                 Key: THRIFT-318
>                 URL: https://issues.apache.org/jira/browse/THRIFT-318
>             Project: Thrift
>          Issue Type: Improvement
>          Components: Compiler (Java)
>            Reporter: Bryan Duxbury
>            Assignee: Bryan Duxbury
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: thrift-318-v2.patch, thrift-318-v3.patch, thrift-318.patch
>
>
> It looks like using a HashSet for the VALID_VALUES set we now put in enumerated types was a bad move, performance-wise. There's a fair amount of HashSet/HashMap/Integer overhead generated.
> I think that the VALID_VALUES should still be a Set, but we can make a TIntRangeSet or something internal to Thrift that's more efficient for our usecases and save some CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (THRIFT-318) Performance of HashSet for enumeration VALID_VALUES seems poor

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Duxbury updated THRIFT-318:
---------------------------------

    Attachment: thrift-318-v2.patch

Here's an updated version of the patch that eschews the need for the Extent inner class altogether. 

> Performance of HashSet for enumeration VALID_VALUES seems poor
> --------------------------------------------------------------
>
>                 Key: THRIFT-318
>                 URL: https://issues.apache.org/jira/browse/THRIFT-318
>             Project: Thrift
>          Issue Type: Improvement
>          Components: Compiler (Java)
>            Reporter: Bryan Duxbury
>            Assignee: Bryan Duxbury
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: thrift-318-v2.patch, thrift-318.patch
>
>
> It looks like using a HashSet for the VALID_VALUES set we now put in enumerated types was a bad move, performance-wise. There's a fair amount of HashSet/HashMap/Integer overhead generated.
> I think that the VALID_VALUES should still be a Set, but we can make a TIntRangeSet or something internal to Thrift that's more efficient for our usecases and save some CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (THRIFT-318) Performance of HashSet for enumeration VALID_VALUES seems poor

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Duxbury updated THRIFT-318:
---------------------------------

    Patch Info: [Patch Available]

> Performance of HashSet for enumeration VALID_VALUES seems poor
> --------------------------------------------------------------
>
>                 Key: THRIFT-318
>                 URL: https://issues.apache.org/jira/browse/THRIFT-318
>             Project: Thrift
>          Issue Type: Improvement
>          Components: Compiler (Java)
>            Reporter: Bryan Duxbury
>            Assignee: Bryan Duxbury
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: thrift-318.patch
>
>
> It looks like using a HashSet for the VALID_VALUES set we now put in enumerated types was a bad move, performance-wise. There's a fair amount of HashSet/HashMap/Integer overhead generated.
> I think that the VALID_VALUES should still be a Set, but we can make a TIntRangeSet or something internal to Thrift that's more efficient for our usecases and save some CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (THRIFT-318) Performance of HashSet for enumeration VALID_VALUES seems poor

Posted by "David Reiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672142#action_12672142 ] 

David Reiss commented on THRIFT-318:
------------------------------------

What is this syntax?
{code}
+  public IntRangeSet(int... values) {
{code}
is that Java 1.5 compatible?  If not, can it be replaced?

You could binary-search the extents for log(n) runtime if you want, but the common case is probably going to be a single extent, so it doesn't really matter.

> Performance of HashSet for enumeration VALID_VALUES seems poor
> --------------------------------------------------------------
>
>                 Key: THRIFT-318
>                 URL: https://issues.apache.org/jira/browse/THRIFT-318
>             Project: Thrift
>          Issue Type: Improvement
>          Components: Compiler (Java)
>            Reporter: Bryan Duxbury
>            Assignee: Bryan Duxbury
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: thrift-318.patch
>
>
> It looks like using a HashSet for the VALID_VALUES set we now put in enumerated types was a bad move, performance-wise. There's a fair amount of HashSet/HashMap/Integer overhead generated.
> I think that the VALID_VALUES should still be a Set, but we can make a TIntRangeSet or something internal to Thrift that's more efficient for our usecases and save some CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (THRIFT-318) Performance of HashSet for enumeration VALID_VALUES seems poor

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672076#action_12672076 ] 

Bryan Duxbury commented on THRIFT-318:
--------------------------------------

I will probably get rid of the Extent subclass all together. It was used mostly as part of my first attempt that used a list of Extents instead of the extent array. 

I implemented the Set interface not just for the iterable-ness of it, but also for the conceptual guarantee that it contains no duplicates and has a contains() method. The fact that it has a bunch of unimplemented methods makes it no different than a Set wrapped with Collections.unmodifiableSet(), except that it's inherently unmodifiable.

> Performance of HashSet for enumeration VALID_VALUES seems poor
> --------------------------------------------------------------
>
>                 Key: THRIFT-318
>                 URL: https://issues.apache.org/jira/browse/THRIFT-318
>             Project: Thrift
>          Issue Type: Improvement
>          Components: Compiler (Java)
>            Reporter: Bryan Duxbury
>            Assignee: Bryan Duxbury
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: thrift-318.patch
>
>
> It looks like using a HashSet for the VALID_VALUES set we now put in enumerated types was a bad move, performance-wise. There's a fair amount of HashSet/HashMap/Integer overhead generated.
> I think that the VALID_VALUES should still be a Set, but we can make a TIntRangeSet or something internal to Thrift that's more efficient for our usecases and save some CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.