Posted to derby-dev@db.apache.org by "Knut Anders Hatlen (JIRA)" <ji...@apache.org> on 2007/12/16 20:14:43 UTC

[jira] Updated: (DERBY-3280) Poor distribution of hash values from RecordId.hashCode()

     [ https://issues.apache.org/jira/browse/DERBY-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen updated DERBY-3280:
--------------------------------------

    Attachment: TestClient.java
                d3280.diff

The attached patch (d3280.diff) changes the hashCode() methods in RecordId and PageKey. I let NetBeans generate the methods for me, and as far as I can tell, they use the approach described in Item 8 of Effective Java (http://java.sun.com/docs/books/effective/). I made some small changes to the generated methods, such as removing null checks for fields that should never be null. With the new methods, the 11000 records mentioned in the issue description below all get distinct hash codes.
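
Roughly, the generated methods follow this pattern (the class name, field names and constants below are illustrative stand-ins, not the exact code in d3280.diff):

    // Illustrative sketch of an Effective Java Item 8 style hashCode().
    // Field names and constants are stand-ins, not the patch itself.
    final class RecordKey {
        private final long segmentId;
        private final long containerId;
        private final long pageNumber;
        private final int recordNumber;

        RecordKey(long segmentId, long containerId,
                  long pageNumber, int recordNumber) {
            this.segmentId = segmentId;
            this.containerId = containerId;
            this.pageNumber = pageNumber;
            this.recordNumber = recordNumber;
        }

        public int hashCode() {
            int hash = 7; // non-zero seed
            // Fold each field in with a multiply-and-add step, so small
            // values in different fields affect different bits of the
            // result instead of cancelling out as with a plain xor.
            hash = 89 * hash + (int) (segmentId ^ (segmentId >>> 32));
            hash = 89 * hash + (int) (containerId ^ (containerId >>> 32));
            hash = 89 * hash + (int) (pageNumber ^ (pageNumber >>> 32));
            hash = 89 * hash + recordNumber;
            return hash;
        }
    }

The multiply-and-add chain is the point: multiplying by an odd prime spreads the low bits of each field across the whole 32-bit result, so small field values no longer collapse onto a small range.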

To see the potential performance impact of the change, I had to make the following changes to the test client attached to DERBY-1961:
  - use 16 sets of tables to increase the rate of collisions
  - let a repeatable read transaction obtain locks on all rows before the test starts, and keep them until the test stops, so that we know there are records to collide with in the lock table (a sketch of this setup follows below)
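
The lock-holding part of the client looks roughly like this (the connection URL and table name are placeholders, and with 16 sets of tables the scan is of course repeated for each table):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class LockHolder {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
            Connection c = DriverManager.getConnection("jdbc:derby:testdb");
            c.setAutoCommit(false);
            c.setTransactionIsolation(
                    Connection.TRANSACTION_REPEATABLE_READ);
            Statement s = c.createStatement();
            // In repeatable read, scanning a table takes a shared lock
            // on every row it visits, and the locks are held until the
            // transaction ends. We never commit while the test runs, so
            // the lock table stays full of records to collide with.
            ResultSet rs = s.executeQuery("SELECT * FROM TENKTUP");
            while (rs.next()) {
                // each visited row stays locked until commit/rollback
            }
            rs.close();
            // ... run the measured load from other connections here ...
            c.rollback(); // release the locks when the test is done
        }
    }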

With this modified test, the patch increased the throughput for 16 concurrent clients running simple joins from ~32 tx/s to ~70 tx/s. Without the patch, the profiler told me that about 50% of the CPU was spent in RecordId.equals().

Now, these numbers come from a test hand-crafted to show this particular problem, so I wouldn't expect any real-world applications to see similar improvements. Still, the test shows that, with some bad luck, one may notice a significant performance degradation because of the poor distribution of hash codes.
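
As a side note, the clustering described in the issue text quoted below is easy to reproduce stand-alone. Here is a toy version of the old xor scheme (the page and record counts are made up and don't match the actual DERBY-1961 tables):

    import java.util.HashSet;
    import java.util.Set;

    public class XorClustering {
        public static void main(String[] args) {
            Set<Integer> hashes = new HashSet<Integer>();
            int keys = 0;
            long containerId = 1; // ids are typically small
            long segmentId = 0;
            for (long page = 1; page <= 100; page++) {
                for (long record = 1; record <= 110; record++) {
                    // xor of the 32 least significant bits, as in the
                    // old RecordId.hashCode()
                    int h = (int) record ^ (int) page
                            ^ (int) containerId ^ (int) segmentId;
                    hashes.add(h);
                    keys++;
                }
            }
            // Every input here is below 128, and xor can never set a
            // bit that isn't set in one of its inputs, so the 11000
            // keys collapse onto at most 128 distinct codes.
            System.out.println(keys + " keys -> " + hashes.size()
                    + " distinct hash codes");
        }
    }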

> Poor distribution of hash values from RecordId.hashCode()
> ---------------------------------------------------------
>
>                 Key: DERBY-3280
>                 URL: https://issues.apache.org/jira/browse/DERBY-3280
>             Project: Derby
>          Issue Type: Improvement
>          Components: Performance, Services, Store
>    Affects Versions: 10.4.0.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>         Attachments: d3280.diff, TestClient.java
>
>
> The hash values returned by RecordId.hashCode() are constructed by performing a bitwise xor on the 32 least significant bits of the record number, the page number, the container id and the segment id. Since all of these values tend to be relatively small positive integers, the hash values also tend to be clustered in a very small range. This leads to a higher frequency of hash collisions in the lock table, which makes the hash tables work less efficiently and thereby reduces performance.
> As an example, the simple join load in the test client attached to DERBY-1961 uses two tables, TENKTUP and ONEKTUP, with 10000 rows and 1000 rows, respectively. The RecordIds for these 11000 rows map to less than 900 distinct hash codes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Updated: (DERBY-3280) Poor distribution of hash values from RecordId.hashCode()

Posted by Bryan Pendleton <bp...@amberpoint.com>.
> Without the patch, the profiler told me that about 50% of the CPU was spent in RecordId.equals().

Wow!

This sounds like a nice change with the potential for solid
improvement. Good work.

bryan