Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2021/03/09 23:28:56 UTC

[GitHub] [orc] autumnust opened a new pull request #651: [ORC-757] HashTable dictionary

autumnust opened a new pull request #651:
URL: https://github.com/apache/orc/pull/651


   This PR: 
   - Adds a straightforward hash-table-based implementation of the `Dictionary` interface. 
   - Refactors the RB-Tree code for reuse (e.g. `VisitorContextImpl`), moving it into `DictionaryUtils.java` so it can be shared between different implementations of the `Dictionary` interface. 
   - Enables the hash-based dictionary in existing tests that enable dictionary encoding. 
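   For illustration, the chained-hash-table approach described above can be sketched with standard Java types (this is a hypothetical toy, not the PR's `StringHashTableDictionary`; the real class stores UTF-8 bytes in a `DynamicByteArray` with a `DynamicIntArray` of offsets):

   ```java
   import java.nio.charset.StandardCharsets;
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;

   // Toy chained-hash dictionary: assigns a stable integer id to each
   // distinct string; collisions are resolved by walking a per-bucket chain.
   public class ChainedStringDictionary {
     private final List<byte[]> keys = new ArrayList<>();  // id -> UTF-8 bytes
     private final List<Integer>[] buckets;                // chains of ids

     @SuppressWarnings("unchecked")
     public ChainedStringDictionary(int bucketCount) {
       buckets = new List[bucketCount];
       for (int i = 0; i < bucketCount; i++) {
         buckets[i] = new ArrayList<>();
       }
     }

     /** Returns the id of the key, adding it if not yet present. */
     public int add(String key) {
       byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
       int bucket = (key.hashCode() & 0x7fffffff) % buckets.length;
       for (int id : buckets[bucket]) {          // walk the chain
         if (Arrays.equals(keys.get(id), bytes)) {
           return id;                            // already in the dictionary
         }
       }
       int id = keys.size();                     // new distinct key
       keys.add(bytes);
       buckets[bucket].add(id);
       return id;
     }

     public int size() {
       return keys.size();
     }
   }
   ```

   A hit costs one hash plus a chain walk, which is what makes bucket sizing and load factor matter in the benchmarks below.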
   
   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. File a JIRA issue first and use it as a prefix of your PR title, e.g., `ORC-001: Fix ABC`.
     2. Use your PR title to summarize what this PR proposes instead of describing the problem.
     3. Make PR title and description complete because these will be the permanent commit log.
     4. If possible, provide a concise and reproducible example to reproduce the issue for a faster review.
     5. If the PR is unfinished, use GitHub PR Draft feature.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If there is a discussion in the mailing list, please add the link.
   -->
   
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   We found the RB-Tree-based dictionary implementation to be slow in our production workloads. The performance comparison for the new hash-table-based implementation will be done as part of ORC-50. 
   
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   Mostly tested with newly added unit tests for the hash table, and by enabling the hash-table-based dictionary in some of the existing tests.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] autumnust commented on pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust commented on pull request #651:
URL: https://github.com/apache/orc/pull/651#issuecomment-832850321


   > > > Updated the PR. @omalley @pgaref please take another look, thanks.
   > > 
   > > 
   > > Hey @autumnust thanks for the changes, latest PR looks pretty good -- JMH extension also helps a lot!
   > > Left some mostly minor comments, let me know what you think!
   > 
   > PR seems in pretty good shape already, @autumnust anything to polish/benchmark before going in? Whats the plan?
   
   @pgaref thanks for getting back. I am trying to finalize the bucketSize choice and tune things a bit more. I will publish a new benchmark later today. 





[GitHub] [orc] autumnust commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r613585262



##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * Using HashTable to represent a dictionary. The strings are stored as UTF-8 bytes
+ * and an offset for each entry. It is using chaining for collision resolution.
+ *
+ * This implementation is not thread-safe. It also assumes there's no reduction in the size of hash-table
+ * as it shouldn't happen in the use cases for this class.
+ */
+public class StringHashTableDictionary implements Dictionary {
+
+  private final DynamicByteArray byteArray = new DynamicByteArray();
+  // starting offset of key-in-byte in the byte array for the i-th key.
+  // Two things combined stores the key array.
+  private final DynamicIntArray keyOffsets;
+
+  private final Text newKey = new Text();
+
+  private DynamicIntArray[] hashArray;
+
+  private int capacity;
+
+  private int threshold;
+
+  private float loadFactor;
+
+  private static float DEFAULT_LOAD_FACTOR = 0.75f;
+
+  private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
+
+  public StringHashTableDictionary(int initialCapacity) {
+    this(initialCapacity, DEFAULT_LOAD_FACTOR);
+  }
+
+  public StringHashTableDictionary(int initialCapacity, float loadFactor) {
+    this.capacity = initialCapacity;
+    this.loadFactor = loadFactor;
+    this.keyOffsets = new DynamicIntArray(initialCapacity);
+    this.hashArray = initHashArray(initialCapacity);
+    this.threshold = (int)Math.min(initialCapacity * loadFactor, MAX_ARRAY_SIZE + 1);
+  }
+
+  private DynamicIntArray[] initHashArray(int capacity) {
+    DynamicIntArray[] bucket = new DynamicIntArray[capacity];
+    for (int i = 0; i < capacity; i++) {
+      bucket[i] = new DynamicIntArray();
+    }
+    return bucket;
+  }
+
+  @Override
+  public void visit(Visitor visitor)
+      throws IOException {
+    traverse(visitor, new DictionaryUtils.VisitorContextImpl(this.byteArray, this.keyOffsets));
+  }
+
+  private void traverse(Visitor visitor, DictionaryUtils.VisitorContextImpl context) throws IOException {
+    for (DynamicIntArray intArray : hashArray) {
+      for (int i = 0; i < intArray.size() ; i ++) {
+        context.setPosition(intArray.get(i));
+        visitor.visit(context);
+      }
+    }
+  }
+
+  @Override
+  public void clear() {
+    byteArray.clear();
+    keyOffsets.clear();
+    Arrays.fill(hashArray, null);
+  }
+
+  @Override
+  public void getText(Text result, int position) {
+    DictionaryUtils.getTextInternal(result, position, this.keyOffsets, this.byteArray);
+  }
+
+  @Override
+  public int add(byte[] bytes, int offset, int length) {
+    resizeIfNeeded();

Review comment:
       Thanks for catching, removed. 







[GitHub] [orc] pgaref commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
pgaref commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r620451284



##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * Using HashTable to represent a dictionary. The strings are stored as UTF-8 bytes
+ * and an offset for each entry. It is using chaining for collision resolution.
+ *
+ * This implementation is not thread-safe. It also assumes there's no reduction in the size of hash-table
+ * as it shouldn't happen in the use cases for this class.
+ */
+public class StringHashTableDictionary implements Dictionary {
+
+  // containing all keys every seen in bytes.
+  private final DynamicByteArray byteArray = new DynamicByteArray();
+  // containing starting offset of the key (in byte) in the byte array.
+  private final DynamicIntArray keyOffsets;
+
+  private final Text newKey = new Text();
+
+  private DynamicIntArray[] hashBuckets;
+
+  private int capacity;
+
+  private int threshold;
+
+  private float loadFactor;
+
+  private static float DEFAULT_LOAD_FACTOR = 0.75f;
+
+  private static final int BUCKET_SIZE = 20;

Review comment:
       How did we come up with 20 here?







[GitHub] [orc] autumnust commented on pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust commented on pull request #651:
URL: https://github.com/apache/orc/pull/651#issuecomment-833094987


   Also tried putting the init dict size as a parameter in the benchmark: 
   ```
   Benchmark                                    (dictImpl)  (initSize)  (upperBound)  Mode  Cnt       Score       Error  Units
   ORCWriterBenchMark.dictBench                     RBTREE        4096         10000  avgt    5   40363.137 ± 28516.301  us/op
   ORCWriterBenchMark.dictBench                     RBTREE        4096          2500  avgt    5   24392.270 ±  8096.477  us/op
   ORCWriterBenchMark.dictBench                     RBTREE        4096           500  avgt    5   20316.419 ±  2917.176  us/op
   ORCWriterBenchMark.dictBench                     RBTREE        8192         10000  avgt    5   33112.917 ±  4040.524  us/op
   ORCWriterBenchMark.dictBench                     RBTREE        8192          2500  avgt    5   27269.018 ±  1593.083  us/op
   ORCWriterBenchMark.dictBench                     RBTREE        8192           500  avgt    5   21326.469 ±  1373.140  us/op
   ORCWriterBenchMark.dictBench                     RBTREE       10240         10000  avgt    5   33538.635 ±  5781.321  us/op
   ORCWriterBenchMark.dictBench                     RBTREE       10240          2500  avgt    5   27548.238 ±  3307.047  us/op
   ORCWriterBenchMark.dictBench                     RBTREE       10240           500  avgt    5   21053.961 ±  3118.891  us/op
   **ORCWriterBenchMark.dictBench                       HASH        4096         10000  avgt    5   19614.326 ±  1304.425  us/op**
   ORCWriterBenchMark.dictBench                       HASH        4096          2500  avgt    5   11529.653 ±  1851.182  us/op
   ORCWriterBenchMark.dictBench                       HASH        4096           500  avgt    5    9108.816 ±   837.123  us/op
   **ORCWriterBenchMark.dictBench                       HASH        8192         10000  avgt    5   15611.967 ±   583.613  us/op**
   ORCWriterBenchMark.dictBench                       HASH        8192          2500  avgt    5   13396.318 ±  3114.460  us/op
   ORCWriterBenchMark.dictBench                       HASH        8192           500  avgt    5   10742.425 ±  1031.070  us/op
   **ORCWriterBenchMark.dictBench                       HASH       10240         10000  avgt    5   17044.101 ±  1671.182  us/op**
   ORCWriterBenchMark.dictBench                       HASH       10240          2500  avgt    5   13767.572 ±   196.728  us/op
   ORCWriterBenchMark.dictBench                       HASH       10240           500  avgt    5   11120.604 ±   305.075  us/op
   ORCWriterBenchMark.dictBench                       NONE        4096         10000  avgt    5    4327.766 ±  1747.765  us/op
   ORCWriterBenchMark.dictBench                       NONE        4096          2500  avgt    5    4390.480 ±  2545.236  us/op
   ORCWriterBenchMark.dictBench                       NONE        4096           500  avgt    5   12315.912 ±  8071.684  us/op
   ORCWriterBenchMark.dictBench                       NONE        8192         10000  avgt    5    5529.683 ±  4187.802  us/op
   ORCWriterBenchMark.dictBench                       NONE        8192          2500  avgt    5    5461.490 ±   914.698  us/op
   ORCWriterBenchMark.dictBench                       NONE        8192           500  avgt    5    4745.401 ±  1097.454  us/op
   ORCWriterBenchMark.dictBench                       NONE       10240         10000  avgt    5    4734.983 ±   257.299  us/op
   ORCWriterBenchMark.dictBench                       NONE       10240          2500  avgt    5    4776.043 ±   690.286  us/op
   ORCWriterBenchMark.dictBench                       NONE       10240           500  avgt    5    4750.625 ±   440.191  us/op
   ```
   
   Obviously it is not the case that the larger the init size, the better the performance (considering the cost of resizing and the metadata-size overhead it brings). Again, the right size depends on the typical record size seen by the writer (which, together with the stripe size, determines the number of entries within a stripe). From the benchmark above, enlarging the size is only beneficial when collisions are common (e.g. in the `HASH` case when the upper bound is larger, 10k), but the benefit isn't consistent across different values of `upperBound`. 
   
   I am therefore exposing `initSize` as a config in `OrcConf` and keeping the default as the original. 
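   As a rough illustration of the collision point above (assuming uniform hashing; these numbers are arithmetic, not measurements from the benchmark): with chaining, the expected probes per lookup scale with distinct keys divided by bucket count, so a larger initial size only helps when the distinct-key count approaches the bucket count:

   ```java
   // Toy load-factor arithmetic: the expected chain length under chaining
   // is distinctKeys / bucketCount (assuming the hash spreads keys evenly).
   public class ChainLength {
     public static double expectedChainLength(int distinctKeys, int bucketCount) {
       return (double) distinctKeys / bucketCount;
     }

     public static void main(String[] args) {
       // 10k distinct keys over 4096 buckets: chains average ~2.4 entries,
       // so collisions are common and a larger table can help.
       System.out.println(expectedChainLength(10_000, 4096));
       // 500 distinct keys over 4096 buckets: the table is mostly empty,
       // so enlarging it buys nothing.
       System.out.println(expectedChainLength(500, 4096));
     }
   }
   ```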





[GitHub] [orc] autumnust commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r613637335



##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * Using HashTable to represent a dictionary. The strings are stored as UTF-8 bytes
+ * and an offset for each entry. It is using chaining for collision resolution.
+ *
+ * This implementation is not thread-safe. It also assumes there's no reduction in the size of hash-table
+ * as it shouldn't happen in the use cases for this class.
+ */
+public class StringHashTableDictionary implements Dictionary {
+
+  private final DynamicByteArray byteArray = new DynamicByteArray();
+  // starting offset of key-in-byte in the byte array for the i-th key.
+  // Two things combined stores the key array.
+  private final DynamicIntArray keyOffsets;
+
+  private final Text newKey = new Text();
+
+  private DynamicIntArray[] hashArray;

Review comment:
    yeah, given Owen's recommendation on using linear probing, I will see if I can abstract this class a bit so that adding another collision-resolution impl. will be simpler. 







[GitHub] [orc] autumnust edited a comment on pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust edited a comment on pull request #651:
URL: https://github.com/apache/orc/pull/651#issuecomment-832850321


   > > > Updated the PR. @omalley @pgaref please take another look, thanks.
   > > 
   > > 
   > > Hey @autumnust thanks for the changes, latest PR looks pretty good -- JMH extension also helps a lot!
   > > Left some mostly minor comments, let me know what you think!
   > 
   > PR seems in pretty good shape already, @autumnust anything to polish/benchmark before going in? Whats the plan?
   
   @pgaref thanks for getting back. I am trying to finalize the bucketSize choice and tune things a bit more. I will publish a new benchmark later today. Beyond this PR, I plan to try linear probing as the collision-resolution method and benchmark it against the current implementation to see if there is any room for improvement. 
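   For reference, the linear-probing alternative mentioned here could look roughly like the following toy sketch (hypothetical class and names, not the PR's code; a real implementation would keep the byte-array/offsets layout rather than `String` slots):

   ```java
   // Toy open-addressing dictionary: one entry per slot, collisions resolved
   // by scanning forward to the next free slot (linear probing), which tends
   // to be more cache-friendly than walking per-bucket chains.
   public class LinearProbingDictionary {
     private String[] slots;   // null = empty slot
     private int[] ids;        // id stored alongside each key
     private int size;

     public LinearProbingDictionary(int capacity) {
       slots = new String[capacity];
       ids = new int[capacity];
     }

     /** Returns the id of the key, adding it if absent. */
     public int add(String key) {
       if (size * 2 >= slots.length) {     // keep load factor under 0.5
         rehash(slots.length * 2);
       }
       int i = (key.hashCode() & 0x7fffffff) % slots.length;
       while (slots[i] != null) {
         if (slots[i].equals(key)) {
           return ids[i];                  // found: reuse the existing id
         }
         i = (i + 1) % slots.length;       // probe the next slot
       }
       slots[i] = key;
       ids[i] = size;
       return size++;
     }

     private void rehash(int newCapacity) {
       String[] oldSlots = slots;
       int[] oldIds = ids;
       slots = new String[newCapacity];
       ids = new int[newCapacity];
       for (int j = 0; j < oldSlots.length; j++) {
         if (oldSlots[j] != null) {        // re-insert at its new position
           int i = (oldSlots[j].hashCode() & 0x7fffffff) % slots.length;
           while (slots[i] != null) {
             i = (i + 1) % slots.length;
           }
           slots[i] = oldSlots[j];
           ids[i] = oldIds[j];
         }
       }
     }

     public int size() {
       return size;
     }
   }
   ```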





[GitHub] [orc] pgaref commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
pgaref commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r620449979



##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * Using HashTable to represent a dictionary. The strings are stored as UTF-8 bytes
+ * and an offset for each entry. It is using chaining for collision resolution.
+ *
+ * This implementation is not thread-safe. It also assumes there's no reduction in the size of hash-table

Review comment:
       Not sure I understand the size reduction comment







[GitHub] [orc] autumnust commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r613598496



##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * Using HashTable to represent a dictionary. The strings are stored as UTF-8 bytes
+ * and an offset for each entry. It is using chaining for collision resolution.
+ *
+ * This implementation is not thread-safe. It also assumes there's no reduction in the size of hash-table
+ * as it shouldn't happen in the use cases for this class.
+ */
+public class StringHashTableDictionary implements Dictionary {
+
+  private final DynamicByteArray byteArray = new DynamicByteArray();
+  // starting offset of key-in-byte in the byte array for the i-th key.
+  // Two things combined stores the key array.
+  private final DynamicIntArray keyOffsets;
+
+  private final Text newKey = new Text();
+
+  private DynamicIntArray[] hashArray;

Review comment:
    Let me wrap up the benchmark for this version, and we can then compare it against the linear-probing approach. But yeah, I agree that linear probing seems more promising. 







[GitHub] [orc] autumnust commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r613636689



##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * Using HashTable to represent a dictionary. The strings are stored as UTF-8 bytes
+ * and an offset for each entry. It is using chaining for collision resolution.
+ *
+ * This implementation is not thread-safe. It also assumes there's no reduction in the size of hash-table
+ * as it shouldn't happen in the use cases for this class.
+ */
+public class StringHashTableDictionary implements Dictionary {
+
+  private final DynamicByteArray byteArray = new DynamicByteArray();
+  // starting offset of key-in-byte in the byte array for the i-th key.
+  // Two things combined stores the key array.
+  private final DynamicIntArray keyOffsets;
+
+  private final Text newKey = new Text();
+
+  private DynamicIntArray[] hashArray;
+
+  private int capacity;
+
+  private int threshold;
+
+  private float loadFactor;
+
+  private static float DEFAULT_LOAD_FACTOR = 0.75f;
+
+  private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
+
+  public StringHashTableDictionary(int initialCapacity) {
+    this(initialCapacity, DEFAULT_LOAD_FACTOR);
+  }
+
+  public StringHashTableDictionary(int initialCapacity, float loadFactor) {
+    this.capacity = initialCapacity;
+    this.loadFactor = loadFactor;
+    this.keyOffsets = new DynamicIntArray(initialCapacity);
+    this.hashArray = initHashArray(initialCapacity);
+    this.threshold = (int)Math.min(initialCapacity * loadFactor, MAX_ARRAY_SIZE + 1);
+  }
+
+  private DynamicIntArray[] initHashArray(int capacity) {
+    DynamicIntArray[] bucket = new DynamicIntArray[capacity];
+    for (int i = 0; i < capacity; i++) {
+      bucket[i] = new DynamicIntArray();

Review comment:
       Shrink it to a smaller bucket in the current chaining implementation. 







[GitHub] [orc] pgaref edited a comment on pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
pgaref edited a comment on pull request #651:
URL: https://github.com/apache/orc/pull/651#issuecomment-842585252


   > Gentle ping ... @pgaref
   
   Hey @autumnust thanks for pinging me, got distracted by various tasks last week!
   Also thanks for updating the benchmark, this helps a lot. Some comments:
   
   - I would expect NONE dictImpl  bench results to be identical across runs, why is 4096  X 500  run 3x the others?
   - Seems like an init size of 4k could be a good default conf in most cases -- would it make sense to remove the extra conf completely if we can avoid it? 
   - Minor: I would rename upperBound to distinctCount
   
   Let me know what you think





[GitHub] [orc] autumnust commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r626174029



##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * Using HashTable to represent a dictionary. The strings are stored as UTF-8 bytes
+ * and an offset for each entry. It is using chaining for collision resolution.
+ *
+ * This implementation is not thread-safe. It also assumes there's no reduction in the size of hash-table
+ * as it shouldn't happen in the use cases for this class.
+ */
+public class StringHashTableDictionary implements Dictionary {
+
+  // containing all keys every seen in bytes.
+  private final DynamicByteArray byteArray = new DynamicByteArray();
+  // containing starting offset of the key (in byte) in the byte array.
+  private final DynamicIntArray keyOffsets;
+
+  private final Text newKey = new Text();
+
+  private DynamicIntArray[] hashBuckets;
+
+  private int capacity;
+
+  private int threshold;
+
+  private float loadFactor;
+
+  private static float DEFAULT_LOAD_FACTOR = 0.75f;
+
+  private static final int BUCKET_SIZE = 20;

Review comment:
    Can you elaborate a bit more on how 5/6 is deduced? The particular point I am confused about is its relation to the batch size, since the dictionary size is more closely associated with the stripe size, right? 







[GitHub] [orc] autumnust commented on pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust commented on pull request #651:
URL: https://github.com/apache/orc/pull/651#issuecomment-824554472


   Updated the PR. @omalley  @pgaref  please take another look, thanks. 





[GitHub] [orc] autumnust commented on pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust commented on pull request #651:
URL: https://github.com/apache/orc/pull/651#issuecomment-820145038


   Thanks @pgaref and @omalley both for your comments! Please kindly take a second look at some of the responses. For the conversations already resolved, I have addressed them locally and will post the changes together once all the other comments are settled.
   
   I am aiming to add the benchmark after that as part of this PR, and to add a version with linear probing as a comparison (probably in another PR).





[GitHub] [orc] autumnust commented on pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust commented on pull request #651:
URL: https://github.com/apache/orc/pull/651#issuecomment-848350035


   > > Gentle ping ... @pgaref
   > 
   > Hey @autumnust thanks for pinging me, got distracted by various tasks last week!
   > Also thanks for updating the benchmark, this helps a lot. Some comments:
   > 
   > * I would expect NONE dictImpl  bench results to be identical across runs, why is 4096  X 500  run 3x the others?
   > * Seems like an init size of 4k could be a good default conf in most of the cases -- would it make sense remove the extra conf completely if we can avoid it?
   > * Minor: I would rename upperBound to distinctCount
   > 
   > Let me know what you think
   
   Hi @pgaref, thanks for getting back to me. Yeah, the 3x number in the 4096 X 500 case is weird, as I am no longer able to reproduce it. It may just be noise, since I am running this benchmark locally.
   I updated the PR for the rest of the comments; they all make sense to me, especially the point about keeping the configuration minimal (one of the other projects I am working on has this problem, and it results in a lot of confusion for beginner users).





[GitHub] [orc] dongjoon-hyun commented on pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #651:
URL: https://github.com/apache/orc/pull/651#issuecomment-854805204


   Hi, @pgaref. Is this merged? Could you check the GitHub Actions status?
   - https://github.com/apache/orc/actions/workflows/build_and_test.yml?query=branch%3Amain





[GitHub] [orc] pgaref commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
pgaref commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r598714046



##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * Using HashTable to represent a dictionary. The strings are stored as UTF-8 bytes
+ * and an offset for each entry. It is using chaining for collision resolution.
+ *
+ * This implementation is not thread-safe. It also assumes there's no reduction in the size of hash-table
+ * as it shouldn't happen in the use cases for this class.
+ */
+public class StringHashTableDictionary implements Dictionary {
+
+  private final DynamicByteArray byteArray = new DynamicByteArray();
+  // starting offset of key-in-byte in the byte array for the i-th key.
+  // Two things combined stores the key array.
+  private final DynamicIntArray keyOffsets;
+
+  private final Text newKey = new Text();
+
+  private DynamicIntArray[] hashArray;

Review comment:
       As a broader note, would it make sense to move the base functionality to an abstract HT class as we do for RedBlackTree? 
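   For readers following along, the chaining scheme under discussion can be sketched as below. The class and method names are illustrative stand-ins, not the actual PR code: a `List<String>` replaces the `DynamicByteArray` plus offset array, and each bucket holds the insertion-order positions of the keys hashing to it.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of a chaining hash dictionary: keys live in one
// append-only list (position = insertion order), and each bucket stores
// the positions of the keys whose hash maps to that bucket.
class ChainedDictionarySketch {
  private final List<String> keys = new ArrayList<>();        // position -> key
  private final List<List<Integer>> buckets = new ArrayList<>();

  ChainedDictionarySketch(int bucketCount) {
    for (int i = 0; i < bucketCount; i++) {
      buckets.add(new ArrayList<>());
    }
  }

  /** Returns the position of the key, adding it if it is new. */
  int add(String key) {
    int bucket = Math.floorMod(key.hashCode(), buckets.size());
    for (int position : buckets.get(bucket)) {
      if (keys.get(position).equals(key)) {
        return position;          // already present: reuse its position
      }
    }
    int position = keys.size();   // new key: next insertion-order position
    keys.add(key);
    buckets.get(bucket).add(position);
    return position;
  }

  String get(int position) {
    return keys.get(position);
  }
}
```

   Duplicate adds return the original position, which is the property a dictionary encoder relies on; the base-class question above is essentially about where this bucket bookkeeping should live.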







[GitHub] [orc] omalley commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
omalley commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r600844787



##########
File path: java/core/src/java/org/apache/orc/impl/DictionaryUtils.java
##########
@@ -0,0 +1,86 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.io.OutputStream;
+
+import org.apache.hadoop.io.Text;
+
+
+public class DictionaryUtils {
+  private DictionaryUtils() {
+    // Utility class does nothing in constructor
+  }
+
+  public static void getTextInternal(Text result, int position, DynamicIntArray keyOffsets, DynamicByteArray byteArray) {
+    int offset = keyOffsets.get(position);
+    int length;
+    if (position + 1 == keyOffsets.size()) {
+      length = byteArray.size() - offset;
+    } else {
+      length = keyOffsets.get(position + 1) - offset;
+    }
+    byteArray.setText(result, offset, length);
+  }
+
+  static class VisitorContextImpl implements Dictionary.VisitorContext {

Review comment:
       Sorry, I left off the context of why. We could pull the common code into a single class between the red-black tree and hash table.
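   The common lookup logic referred to here can be illustrated with a self-contained sketch; plain arrays stand in for `DynamicByteArray` and `DynamicIntArray`, and the class and method names are assumptions of this illustration, not the PR's API.

```java
import java.nio.charset.StandardCharsets;

// Sketch of the offset-based lookup that getTextInternal performs:
// all keys are concatenated in one byte array, and keyOffsets[i] is
// where the i-th key starts. The i-th key's length is the distance to
// the next offset (or to the end of the byte array for the last key).
class OffsetLookupSketch {
  static String keyAt(byte[] bytes, int[] keyOffsets, int position) {
    int offset = keyOffsets[position];
    int length = (position + 1 == keyOffsets.length)
        ? bytes.length - offset
        : keyOffsets[position + 1] - offset;
    return new String(bytes, offset, length, StandardCharsets.UTF_8);
  }
}
```

   Since the lookup depends only on the two arrays and not on how positions were assigned, it is the same for the red-black tree and the hash table, which is what makes pulling it into a shared class attractive.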







[GitHub] [orc] dongjoon-hyun closed pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #651:
URL: https://github.com/apache/orc/pull/651


   





[GitHub] [orc] autumnust commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r613780163



##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * Using HashTable to represent a dictionary. The strings are stored as UTF-8 bytes
+ * and an offset for each entry. It is using chaining for collision resolution.
+ *
+ * This implementation is not thread-safe. It also assumes there's no reduction in the size of hash-table
+ * as it shouldn't happen in the use cases for this class.
+ */
+public class StringHashTableDictionary implements Dictionary {
+
+  private final DynamicByteArray byteArray = new DynamicByteArray();
+  // starting offset of key-in-byte in the byte array for the i-th key.
+  // Two things combined stores the key array.
+  private final DynamicIntArray keyOffsets;
+
+  private final Text newKey = new Text();
+
+  private DynamicIntArray[] hashArray;
+
+  private int capacity;
+
+  private int threshold;
+
+  private float loadFactor;
+
+  private static float DEFAULT_LOAD_FACTOR = 0.75f;
+
+  private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
+
+  public StringHashTableDictionary(int initialCapacity) {
+    this(initialCapacity, DEFAULT_LOAD_FACTOR);
+  }
+
+  public StringHashTableDictionary(int initialCapacity, float loadFactor) {
+    this.capacity = initialCapacity;
+    this.loadFactor = loadFactor;
+    this.keyOffsets = new DynamicIntArray(initialCapacity);
+    this.hashArray = initHashArray(initialCapacity);
+    this.threshold = (int)Math.min(initialCapacity * loadFactor, MAX_ARRAY_SIZE + 1);
+  }
+
+  private DynamicIntArray[] initHashArray(int capacity) {
+    DynamicIntArray[] bucket = new DynamicIntArray[capacity];
+    for (int i = 0; i < capacity; i++) {
+      bucket[i] = new DynamicIntArray();
+    }
+    return bucket;
+  }
+
+  @Override
+  public void visit(Visitor visitor)
+      throws IOException {
+    traverse(visitor, new DictionaryUtils.VisitorContextImpl(this.byteArray, this.keyOffsets));
+  }
+
+  private void traverse(Visitor visitor, DictionaryUtils.VisitorContextImpl context) throws IOException {
+    for (DynamicIntArray intArray : hashArray) {
+      for (int i = 0; i < intArray.size() ; i ++) {
+        context.setPosition(intArray.get(i));
+        visitor.visit(context);
+      }
+    }
+  }
+
+  @Override
+  public void clear() {
+    byteArray.clear();
+    keyOffsets.clear();
+    Arrays.fill(hashArray, null);
+  }
+
+  @Override
+  public void getText(Text result, int position) {
+    DictionaryUtils.getTextInternal(result, position, this.keyOffsets, this.byteArray);
+  }
+
+  @Override
+  public int add(byte[] bytes, int offset, int length) {
+    resizeIfNeeded();
+    newKey.set(bytes, offset, length);
+    return add(newKey);
+  }
+
+  public int add(Text text) {
+    resizeIfNeeded();
+
+    int index = getIndex(text);
+    DynamicIntArray candidateArray = hashArray[index];
+
+    newKey.set(text);

Review comment:
       Just to ensure, are you asking the line `newKey.set(bytes, offset, length);` in `add(byte[] bytes, int offset, int length)` can be removed since `newKey` will anyway be set here? 







[GitHub] [orc] autumnust commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r621720755



##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * Using HashTable to represent a dictionary. The strings are stored as UTF-8 bytes
+ * and an offset for each entry. It is using chaining for collision resolution.
+ *
+ * This implementation is not thread-safe. It also assumes there's no reduction in the size of hash-table

Review comment:
       Removed this comment as it was meant for something that is no longer valid.







[GitHub] [orc] pgaref commented on pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
pgaref commented on pull request #651:
URL: https://github.com/apache/orc/pull/651#issuecomment-832534534


   > > Updated the PR. @omalley @pgaref please take another look, thanks.
   > 
   > Hey @autumnust thanks for the changes, latest PR looks pretty good -- JMH extension also helps a lot!
   > Left some mostly minor comments, let me know what you think!
   
   The PR seems in pretty good shape already, @autumnust. Anything to polish/benchmark before going in? What's the plan?





[GitHub] [orc] dongjoon-hyun commented on pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #651:
URL: https://github.com/apache/orc/pull/651#issuecomment-854906883


   Thanks, @pgaref !





[GitHub] [orc] pgaref commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
pgaref commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r620453553



##########
File path: java/core/src/test/org/apache/orc/impl/TestStringHashTableDictionary.java
##########
@@ -0,0 +1,95 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import org.apache.hadoop.io.Text;
+import org.junit.Assert;
+import org.junit.Test;
+
+
+public class TestStringHashTableDictionary {
+
+  /**
+   * An extension of {@link StringHashTableDictionary} for testing purposes that overrides the hash function.
+   *
+   */
+  private static class SimpleHashDictionary extends StringHashTableDictionary {
+    public SimpleHashDictionary(int initialCapacity) {
+      super(initialCapacity);
+    }
+
+    /**
+     * Obtain the prefix of each string as its hash value.
+     * Every string used in this test suite contains its hash value as a prefix,
+     * so the traversal order of the traverse() method is known.
+     */
+    @Override
+    int getIndex(Text text) {
+      String s = text.toString();
+      int underscore = s.indexOf("_");
+      return Integer.parseInt(text.toString().substring(0, underscore));
+    }
+  }
+
+  @Test
+  public void test1()
+      throws Exception {
+    SimpleHashDictionary hashTableDictionary = new SimpleHashDictionary(5);
+    // Non-resize trivial cases
+    Assert.assertEquals(0, hashTableDictionary.getSizeInBytes());
+    Assert.assertEquals(0, hashTableDictionary.add(new Text("2_Alice")));
+    Assert.assertEquals(1, hashTableDictionary.add(new Text("3_Bob")));
+    Assert.assertEquals(0, hashTableDictionary.add(new Text("2_Alice")));
+    Assert.assertEquals(1, hashTableDictionary.add(new Text("3_Bob")));
+    Assert.assertEquals(2, hashTableDictionary.add(new Text("1_Cindy")));
+
+    Text text = new Text();
+    hashTableDictionary.getText(text, 0);
+    Assert.assertEquals("2_Alice", text.toString());
+    hashTableDictionary.getText(text, 1);
+    Assert.assertEquals("3_Bob", text.toString());
+    hashTableDictionary.getText(text, 2);
+    Assert.assertEquals("1_Cindy", text.toString());
+
+    // entering the fourth and fifth element which triggers rehash
+    Assert.assertEquals(3, hashTableDictionary.add(new Text("0_David")));
+    hashTableDictionary.getText(text, 3);
+    Assert.assertEquals("0_David", text.toString());
+    Assert.assertEquals(4, hashTableDictionary.add(new Text("4_Eason")));
+    hashTableDictionary.getText(text, 4);
+    Assert.assertEquals("4_Eason", text.toString());
+
+    // Re-check that all previously inserted strings still map to the correct encoded values
+    hashTableDictionary.getText(text, 0);
+    Assert.assertEquals("2_Alice", text.toString());
+    hashTableDictionary.getText(text, 1);
+    Assert.assertEquals("3_Bob", text.toString());
+    hashTableDictionary.getText(text, 2);
+    Assert.assertEquals("1_Cindy", text.toString());
+
+
+    // The order of the words is based on each string's prefix, since their index in the hashArray is derived from it.
+    TestStringRedBlackTree
+        .checkContents(hashTableDictionary, new int[]{3, 2, 0, 1, 4}, "0_David", "1_Cindy", "2_Alice", "3_Bob",
+            "4_Eason");
+

Review comment:
       Just for correctness, making sure that both dictionaries write the same strings -- you are doing the same thing above, so it is not that important.







[GitHub] [orc] pgaref commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
pgaref commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r620443005



##########
File path: java/bench/core/src/java/org/apache/orc/bench/core/IOCounters.java
##########
@@ -48,23 +50,26 @@ public void print() {
     if (recordCounters != null) {
       recordCounters.print();
     }
-    System.out.println("Reads: " + reads);
-    System.out.println("Bytes: " + bytesRead);
+    System.out.println("io: " + io);
+    System.out.println("Bytes: " + bytesIO);
   }
 
   public double bytesPerRecord() {
     return recordCounters == null || recordCounters.records == 0 ?
-        0 : ((double) bytesRead) / recordCounters.records;
+        0 : ((double) bytesIO) / recordCounters.records;
   }
 
   public long records() {
     return recordCounters == null || recordCounters.invocations == 0 ?
         0 : recordCounters.records / recordCounters.invocations;
   }
 
-  public long reads() {
+  /**
+   * Capture the number of I/O on average in each invocation.
+   */
+  public long iOs() {

Review comment:
       IOps?







[GitHub] [orc] autumnust commented on pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust commented on pull request #651:
URL: https://github.com/apache/orc/pull/651#issuecomment-842536951


   Gentle ping ... @pgaref  





[GitHub] [orc] autumnust commented on pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust commented on pull request #651:
URL: https://github.com/apache/orc/pull/651#issuecomment-837051398


   @pgaref @omalley Gentle ping for the review. Please let me know if there are more comments, thanks!





[GitHub] [orc] omalley commented on pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
omalley commented on pull request #651:
URL: https://github.com/apache/orc/pull/651#issuecomment-806242027


   Of course, it would be really good to have a benchmark in the benchmark module that lets us test the two implementations.





[GitHub] [orc] pgaref commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
pgaref commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r622051809



##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * Using HashTable to represent a dictionary. The strings are stored as UTF-8 bytes
+ * and an offset for each entry. It is using chaining for collision resolution.
+ *
+ * This implementation is not thread-safe. It also assumes there's no reduction in the size of hash-table
+ * as it shouldn't happen in the use cases for this class.
+ */
+public class StringHashTableDictionary implements Dictionary {
+
+  private final DynamicByteArray byteArray = new DynamicByteArray();
+  // starting offset of key-in-byte in the byte array for the i-th key.
+  // Two things combined stores the key array.
+  private final DynamicIntArray keyOffsets;
+
+  private final Text newKey = new Text();
+
+  private DynamicIntArray[] hashArray;
+
+  private int capacity;
+
+  private int threshold;
+
+  private float loadFactor;
+
+  private static float DEFAULT_LOAD_FACTOR = 0.75f;
+
+  private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

Review comment:
       Let's add another comment here, please.
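   For illustration, a comment of the kind requested might look like the sketch below. The stated rationale (mirroring the JDK collections convention that some VMs reserve a few header words in arrays) is an assumption of this sketch, not text from the PR.

```java
// Sketch of the capped-threshold computation with an explanatory comment.
class ThresholdSketch {
  // Maximum array size to request. Some JVMs reserve a few header words
  // in arrays, so sizing requests above Integer.MAX_VALUE - 8 elements
  // may fail with OutOfMemoryError (same convention as JDK collections).
  static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

  static int threshold(int capacity, float loadFactor) {
    // Resize once the number of keys crosses capacity * loadFactor,
    // but never plan for more entries than an array can hold.
    return (int) Math.min(capacity * loadFactor, MAX_ARRAY_SIZE + 1);
  }
}
```

   With the defaults in the PR (load factor 0.75), an initial capacity of 4096 would give a resize threshold of 3072 under this formula.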







[GitHub] [orc] autumnust commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r613611182



##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * Using HashTable to represent a dictionary. The strings are stored as UTF-8 bytes
+ * and an offset for each entry. It is using chaining for collision resolution.
+ *
+ * This implementation is not thread-safe. It also assumes there's no reduction in the size of hash-table
+ * as it shouldn't happen in the use cases for this class.
+ */
+public class StringHashTableDictionary implements Dictionary {
+
+  private final DynamicByteArray byteArray = new DynamicByteArray();
+  // starting offset of key-in-byte in the byte array for the i-th key.
+  // Two things combined stores the key array.
+  private final DynamicIntArray keyOffsets;
+
+  private final Text newKey = new Text();
+
+  private DynamicIntArray[] hashArray;
+
+  private int capacity;
+
+  private int threshold;
+
+  private float loadFactor;
+
+  private static float DEFAULT_LOAD_FACTOR = 0.75f;
+
+  private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
+
+  public StringHashTableDictionary(int initialCapacity) {
+    this(initialCapacity, DEFAULT_LOAD_FACTOR);
+  }
+
+  public StringHashTableDictionary(int initialCapacity, float loadFactor) {
+    this.capacity = initialCapacity;
+    this.loadFactor = loadFactor;
+    this.keyOffsets = new DynamicIntArray(initialCapacity);
+    this.hashArray = initHashArray(initialCapacity);
+    this.threshold = (int)Math.min(initialCapacity * loadFactor, MAX_ARRAY_SIZE + 1);
+  }
+
+  private DynamicIntArray[] initHashArray(int capacity) {
+    DynamicIntArray[] bucket = new DynamicIntArray[capacity];
+    for (int i = 0; i < capacity; i++) {
+      bucket[i] = new DynamicIntArray();
+    }
+    return bucket;
+  }
+
+  @Override
+  public void visit(Visitor visitor)
+      throws IOException {
+    traverse(visitor, new DictionaryUtils.VisitorContextImpl(this.byteArray, this.keyOffsets));
+  }
+
+  private void traverse(Visitor visitor, DictionaryUtils.VisitorContextImpl context) throws IOException {
+    for (DynamicIntArray intArray : hashArray) {
+      for (int i = 0; i < intArray.size() ; i ++) {
+        context.setPosition(intArray.get(i));
+        visitor.visit(context);
+      }
+    }
+  }
+
+  @Override
+  public void clear() {
+    byteArray.clear();
+    keyOffsets.clear();
+    Arrays.fill(hashArray, null);
+  }
+
+  @Override
+  public void getText(Text result, int position) {
+    DictionaryUtils.getTextInternal(result, position, this.keyOffsets, this.byteArray);
+  }
+
+  @Override
+  public int add(byte[] bytes, int offset, int length) {
+    resizeIfNeeded();
+    newKey.set(bytes, offset, length);
+    return add(newKey);
+  }
+
+  public int add(Text text) {
+    resizeIfNeeded();
+
+    int index = getIndex(text);
+    DynamicIntArray candidateArray = hashArray[index];
+
+    newKey.set(text);
+
+    Text tmpText = new Text();
+    for (int i = 0; i < candidateArray.size(); i++) {
+      getText(tmpText, candidateArray.get(i));
+      if (tmpText.equals(newKey)) {
+        return candidateArray.get(i);
+      }
+    }
+
+    // if making it here, it means no match.
+    int len = newKey.getLength();
+    int currIdx = keyOffsets.size();
+    keyOffsets.add(byteArray.add(newKey.getBytes(), 0, len));
+    candidateArray.add(currIdx);
+    return currIdx;
+  }
+
+  private void resizeIfNeeded() {
+    if (keyOffsets.size() >= threshold) {
+      int oldCapacity = keyOffsets.size();
+      int newCapacity = (oldCapacity << 1) + 1;
+      doResize(newCapacity);
+      this.threshold = (int)Math.min(newCapacity * loadFactor, MAX_ARRAY_SIZE + 1);
+    }
+  }
+
+  @Override
+  public int size() {
+    return keyOffsets.size();
+  }
+
+  /**
+   * Compute the hash value and find the corresponding index.
+   *
+   */
+  int getIndex(Text text) {
+    return (text.hashCode() & 0x7FFFFFFF) % capacity;

Review comment:
       @pgaref  it was meant to mask off the sign bit so the dividend is always non-negative. But yeah, I will take Owen's suggestion to use the Math library instead for better readability. 
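For readers following this thread, the two indexing approaches under discussion can be compared in a small self-contained sketch (plain JDK, no ORC classes; the `maskIndex`/`floorModIndex` names are just for illustration, not names in the patch):

```java
public class IndexDemo {
    // Mask off the sign bit, as in the current patch: the masked hash is
    // always non-negative, so the remainder is a valid bucket index.
    static int maskIndex(int hash, int capacity) {
        return (hash & 0x7FFFFFFF) % capacity;
    }

    // Owen's suggestion: Math.floorMod is always non-negative when the
    // divisor is positive, regardless of the sign of the hash.
    static int floorModIndex(int hash, int capacity) {
        return Math.floorMod(hash, capacity);
    }

    public static void main(String[] args) {
        int capacity = 5;
        // Both stay within [0, capacity) even for negative hash codes,
        // but they are not bit-identical for every input:
        // Integer.MIN_VALUE & 0x7FFFFFFF == 0, while floorMod keeps the
        // mathematical remainder.
        System.out.println(maskIndex(Integer.MIN_VALUE, capacity));     // 0
        System.out.println(floorModIndex(Integer.MIN_VALUE, capacity)); // 2
    }
}
```

Either variant yields a valid bucket index; floorMod just makes the non-negativity explicit without bit tricks.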







[GitHub] [orc] autumnust commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r626906494



##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * Using HashTable to represent a dictionary. The strings are stored as UTF-8 bytes
+ * and an offset for each entry. It is using chaining for collision resolution.
+ *
+ * This implementation is not thread-safe. It also assumes there's no reduction in the size of hash-table
+ * as it shouldn't happen in the use cases for this class.
+ */
+public class StringHashTableDictionary implements Dictionary {
+
+  private final DynamicByteArray byteArray = new DynamicByteArray();
+  // starting offset of key-in-byte in the byte array for the i-th key.
+  // Two things combined stores the key array.
+  private final DynamicIntArray keyOffsets;
+
+  private final Text newKey = new Text();
+
+  private DynamicIntArray[] hashArray;
+
+  private int capacity;
+
+  private int threshold;
+
+  private float loadFactor;
+
+  private static float DEFAULT_LOAD_FACTOR = 0.75f;
+
+  private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

Review comment:
       Will address. 







[GitHub] [orc] autumnust edited a comment on pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust edited a comment on pull request #651:
URL: https://github.com/apache/orc/pull/651#issuecomment-833094987


   Also tried putting the init dict size as a parameter in the benchmark: 
   <pre>Benchmark                                    (dictImpl)  (initSize)  (upperBound)  Mode  Cnt       Score       Error  Units
   ORCWriterBenchMark.dictBench                     RBTREE        4096         10000  avgt    5   40363.137 ± 28516.301  us/op
   ORCWriterBenchMark.dictBench                     RBTREE        4096          2500  avgt    5   24392.270 ±  8096.477  us/op
   ORCWriterBenchMark.dictBench                     RBTREE        4096           500  avgt    5   20316.419 ±  2917.176  us/op
   ORCWriterBenchMark.dictBench                     RBTREE        8192         10000  avgt    5   33112.917 ±  4040.524  us/op
   ORCWriterBenchMark.dictBench                     RBTREE        8192          2500  avgt    5   27269.018 ±  1593.083  us/op
   ORCWriterBenchMark.dictBench                     RBTREE        8192           500  avgt    5   21326.469 ±  1373.140  us/op
   ORCWriterBenchMark.dictBench                     RBTREE       10240         10000  avgt    5   33538.635 ±  5781.321  us/op
   ORCWriterBenchMark.dictBench                     RBTREE       10240          2500  avgt    5   27548.238 ±  3307.047  us/op
   ORCWriterBenchMark.dictBench                     RBTREE       10240           500  avgt    5   21053.961 ±  3118.891  us/op
   <b>ORCWriterBenchMark.dictBench                       HASH        4096         10000  avgt    5   19614.326 ±  1304.425  us/op</b>
   ORCWriterBenchMark.dictBench                       HASH        4096          2500  avgt    5   11529.653 ±  1851.182  us/op
   ORCWriterBenchMark.dictBench                       HASH        4096           500  avgt    5    9108.816 ±   837.123  us/op
   <b>ORCWriterBenchMark.dictBench                       HASH        8192         10000  avgt    5   15611.967 ±   583.613  us/op</b>
   ORCWriterBenchMark.dictBench                       HASH        8192          2500  avgt    5   13396.318 ±  3114.460  us/op
   ORCWriterBenchMark.dictBench                       HASH        8192           500  avgt    5   10742.425 ±  1031.070  us/op
   <b>ORCWriterBenchMark.dictBench                       HASH       10240         10000  avgt    5   17044.101 ±  1671.182  us/op</b>
   ORCWriterBenchMark.dictBench                       HASH       10240          2500  avgt    5   13767.572 ±   196.728  us/op
   ORCWriterBenchMark.dictBench                       HASH       10240           500  avgt    5   11120.604 ±   305.075  us/op
   ORCWriterBenchMark.dictBench                       NONE        4096         10000  avgt    5    4327.766 ±  1747.765  us/op
   ORCWriterBenchMark.dictBench                       NONE        4096          2500  avgt    5    4390.480 ±  2545.236  us/op
   ORCWriterBenchMark.dictBench                       NONE        4096           500  avgt    5   12315.912 ±  8071.684  us/op
   ORCWriterBenchMark.dictBench                       NONE        8192         10000  avgt    5    5529.683 ±  4187.802  us/op
   ORCWriterBenchMark.dictBench                       NONE        8192          2500  avgt    5    5461.490 ±   914.698  us/op
   ORCWriterBenchMark.dictBench                       NONE        8192           500  avgt    5    4745.401 ±  1097.454  us/op
   ORCWriterBenchMark.dictBench                       NONE       10240         10000  avgt    5    4734.983 ±   257.299  us/op
   ORCWriterBenchMark.dictBench                       NONE       10240          2500  avgt    5    4776.043 ±   690.286  us/op
   ORCWriterBenchMark.dictBench                       NONE       10240           500  avgt    5    4750.625 ±   440.191  us/op</pre>
   
   Obviously it is not the case that a larger init size always gives better performance (considering the cost of resizing and the metadata-size overhead it brings). The right size depends on the typical record size seen by the writer (which, together with the stripe size, determines the number of entries within a stripe). From the benchmark above, enlarging the size only helps when collisions are common (e.g. in the `HASH` case with the larger upper bound of 10k), and even then the benefit isn't consistent across values of `upperBound`. 
   
   I am therefore putting `initSize` as a config in `OrcConf` and keeping the default at the original value. 





[GitHub] [orc] omalley commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
omalley commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r600838359



##########
File path: java/core/src/java/org/apache/orc/impl/DictionaryUtils.java
##########
@@ -0,0 +1,86 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.io.OutputStream;
+
+import org.apache.hadoop.io.Text;
+
+
+public class DictionaryUtils {
+  private DictionaryUtils() {
+    // Utility class does nothing in constructor
+  }
+
+  public static void getTextInternal(Text result, int position, DynamicIntArray keyOffsets, DynamicByteArray byteArray) {
+    int offset = keyOffsets.get(position);
+    int length;
+    if (position + 1 == keyOffsets.size()) {
+      length = byteArray.size() - offset;
+    } else {
+      length = keyOffsets.get(position + 1) - offset;
+    }
+    byteArray.setText(result, offset, length);
+  }
+
+  static class VisitorContextImpl implements Dictionary.VisitorContext {

Review comment:
       I'd think that a common class (named DictionaryStorage?) that implemented VisitorContext and had the byteArray and keyOffsets would make sense. Each of the dictionaries could have a field that held the reference.
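A rough sketch of that shared-storage idea, using plain JDK types (`ByteArrayOutputStream` and `List<Integer>`) as stand-ins for ORC's `DynamicByteArray` and `DynamicIntArray`; the class name follows Owen's proposed `DictionaryStorage` but is not committed API:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// One class owning the UTF-8 byte buffer plus per-key start offsets, so
// each dictionary implementation holds a reference instead of duplicating
// the getTextInternal offset arithmetic.
class DictionaryStorageSketch {
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private final List<Integer> keyOffsets = new ArrayList<>();

    // Append a key's UTF-8 bytes; return its dictionary id.
    int add(byte[] utf8) {
        keyOffsets.add(bytes.size());
        bytes.write(utf8, 0, utf8.length);
        return keyOffsets.size() - 1;
    }

    // Same offset/length arithmetic as DictionaryUtils.getTextInternal:
    // a key's length is the gap to the next offset (or to the buffer end).
    String get(int position) {
        int offset = keyOffsets.get(position);
        int end = position + 1 == keyOffsets.size()
            ? bytes.size() : keyOffsets.get(position + 1);
        return new String(bytes.toByteArray(), offset, end - offset,
                          StandardCharsets.UTF_8);
    }

    int size() { return keyOffsets.size(); }
}
```

With this shape, both the red-black-tree and hash-table dictionaries would delegate `getText` and `size` to the shared storage and keep only their lookup structure private.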

##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * Using HashTable to represent a dictionary. The strings are stored as UTF-8 bytes
+ * and an offset for each entry. It is using chaining for collision resolution.
+ *
+ * This implementation is not thread-safe. It also assumes there's no reduction in the size of hash-table
+ * as it shouldn't happen in the use cases for this class.
+ */
+public class StringHashTableDictionary implements Dictionary {
+
+  private final DynamicByteArray byteArray = new DynamicByteArray();
+  // starting offset of key-in-byte in the byte array for the i-th key.
+  // Two things combined stores the key array.
+  private final DynamicIntArray keyOffsets;
+
+  private final Text newKey = new Text();
+
+  private DynamicIntArray[] hashArray;

Review comment:
       I suspect that we'd get better performance from using linear probing into a single DynamicIntArray, although we might want to drop the max load factor.
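A minimal sketch of the linear-probing alternative being suggested here, with `String` keys standing in for the UTF-8 byte storage; resizing and load-factor handling are omitted, so it assumes fewer entries than capacity:

```java
import java.util.Arrays;

// Open addressing with linear probing: a single flat slot array instead
// of one DynamicIntArray chain per bucket. Each slot holds a key id, or
// -1 when empty.
class LinearProbingSketch {
    private final String[] keys;  // key id -> key (stand-in for byteArray+offsets)
    private final int[] slots;    // probe table: key id or -1
    private int size;

    LinearProbingSketch(int capacity) {
        keys = new String[capacity];
        slots = new int[capacity];
        Arrays.fill(slots, -1);
    }

    // Returns the existing id on a match, or assigns the next id.
    int add(String key) {
        int i = Math.floorMod(key.hashCode(), slots.length);
        while (slots[i] != -1) {            // probe until an empty slot or a match
            if (keys[slots[i]].equals(key)) {
                return slots[i];
            }
            i = (i + 1) % slots.length;
        }
        keys[size] = key;
        slots[i] = size;
        return size++;
    }
}
```

The appeal is cache locality: probes walk one contiguous int array rather than chasing a per-bucket object, at the cost of needing a lower maximum load factor, as noted above.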

##########
File path: java/core/src/java/org/apache/orc/impl/DictionaryUtils.java
##########
@@ -0,0 +1,86 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.io.OutputStream;
+
+import org.apache.hadoop.io.Text;
+
+
+public class DictionaryUtils {
+  private DictionaryUtils() {

Review comment:
       Actually, I don't mind this style for utility classes, since it prevents anyone from accidentally creating a useless instance. That said, I suspect that this should probably be refactored.

##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * Using HashTable to represent a dictionary. The strings are stored as UTF-8 bytes
+ * and an offset for each entry. It is using chaining for collision resolution.
+ *
+ * This implementation is not thread-safe. It also assumes there's no reduction in the size of hash-table
+ * as it shouldn't happen in the use cases for this class.
+ */
+public class StringHashTableDictionary implements Dictionary {
+
+  private final DynamicByteArray byteArray = new DynamicByteArray();
+  // starting offset of key-in-byte in the byte array for the i-th key.
+  // Two things combined stores the key array.
+  private final DynamicIntArray keyOffsets;
+
+  private final Text newKey = new Text();
+
+  private DynamicIntArray[] hashArray;
+
+  private int capacity;
+
+  private int threshold;
+
+  private float loadFactor;
+
+  private static float DEFAULT_LOAD_FACTOR = 0.75f;
+
+  private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
+
+  public StringHashTableDictionary(int initialCapacity) {
+    this(initialCapacity, DEFAULT_LOAD_FACTOR);
+  }
+
+  public StringHashTableDictionary(int initialCapacity, float loadFactor) {
+    this.capacity = initialCapacity;
+    this.loadFactor = loadFactor;
+    this.keyOffsets = new DynamicIntArray(initialCapacity);
+    this.hashArray = initHashArray(initialCapacity);
+    this.threshold = (int)Math.min(initialCapacity * loadFactor, MAX_ARRAY_SIZE + 1);
+  }
+
+  private DynamicIntArray[] initHashArray(int capacity) {
+    DynamicIntArray[] bucket = new DynamicIntArray[capacity];
+    for (int i = 0; i < capacity; i++) {
+      bucket[i] = new DynamicIntArray();

Review comment:
       Yeah, that is too large. If we have more than a handful of collisions the table is too small or the function isn't good.

##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * Using HashTable to represent a dictionary. The strings are stored as UTF-8 bytes
+ * and an offset for each entry. It is using chaining for collision resolution.
+ *
+ * This implementation is not thread-safe. It also assumes there's no reduction in the size of hash-table
+ * as it shouldn't happen in the use cases for this class.
+ */
+public class StringHashTableDictionary implements Dictionary {
+
+  private final DynamicByteArray byteArray = new DynamicByteArray();
+  // starting offset of key-in-byte in the byte array for the i-th key.
+  // Two things combined stores the key array.
+  private final DynamicIntArray keyOffsets;
+
+  private final Text newKey = new Text();
+
+  private DynamicIntArray[] hashArray;
+
+  private int capacity;
+
+  private int threshold;
+
+  private float loadFactor;
+
+  private static float DEFAULT_LOAD_FACTOR = 0.75f;
+
+  private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
+
+  public StringHashTableDictionary(int initialCapacity) {
+    this(initialCapacity, DEFAULT_LOAD_FACTOR);
+  }
+
+  public StringHashTableDictionary(int initialCapacity, float loadFactor) {
+    this.capacity = initialCapacity;
+    this.loadFactor = loadFactor;
+    this.keyOffsets = new DynamicIntArray(initialCapacity);
+    this.hashArray = initHashArray(initialCapacity);
+    this.threshold = (int)Math.min(initialCapacity * loadFactor, MAX_ARRAY_SIZE + 1);
+  }
+
+  private DynamicIntArray[] initHashArray(int capacity) {
+    DynamicIntArray[] bucket = new DynamicIntArray[capacity];
+    for (int i = 0; i < capacity; i++) {
+      bucket[i] = new DynamicIntArray();
+    }
+    return bucket;
+  }
+
+  @Override
+  public void visit(Visitor visitor)
+      throws IOException {
+    traverse(visitor, new DictionaryUtils.VisitorContextImpl(this.byteArray, this.keyOffsets));
+  }
+
+  private void traverse(Visitor visitor, DictionaryUtils.VisitorContextImpl context) throws IOException {
+    for (DynamicIntArray intArray : hashArray) {
+      for (int i = 0; i < intArray.size() ; i ++) {
+        context.setPosition(intArray.get(i));
+        visitor.visit(context);
+      }
+    }
+  }
+
+  @Override
+  public void clear() {
+    byteArray.clear();
+    keyOffsets.clear();
+    Arrays.fill(hashArray, null);
+  }
+
+  @Override
+  public void getText(Text result, int position) {
+    DictionaryUtils.getTextInternal(result, position, this.keyOffsets, this.byteArray);
+  }
+
+  @Override
+  public int add(byte[] bytes, int offset, int length) {
+    resizeIfNeeded();
+    newKey.set(bytes, offset, length);
+    return add(newKey);
+  }
+
+  public int add(Text text) {
+    resizeIfNeeded();
+
+    int index = getIndex(text);
+    DynamicIntArray candidateArray = hashArray[index];
+
+    newKey.set(text);
+
+    Text tmpText = new Text();
+    for (int i = 0; i < candidateArray.size(); i++) {
+      getText(tmpText, candidateArray.get(i));
+      if (tmpText.equals(newKey)) {
+        return candidateArray.get(i);
+      }
+    }
+
+    // if making it here, it means no match.
+    int len = newKey.getLength();
+    int currIdx = keyOffsets.size();
+    keyOffsets.add(byteArray.add(newKey.getBytes(), 0, len));
+    candidateArray.add(currIdx);
+    return currIdx;
+  }
+
+  private void resizeIfNeeded() {
+    if (keyOffsets.size() >= threshold) {
+      int oldCapacity = keyOffsets.size();
+      int newCapacity = (oldCapacity << 1) + 1;
+      doResize(newCapacity);
+      this.threshold = (int)Math.min(newCapacity * loadFactor, MAX_ARRAY_SIZE + 1);
+    }
+  }
+
+  @Override
+  public int size() {
+    return keyOffsets.size();
+  }
+
+  /**
+   * Compute the hash value and find the corresponding index.
+   *
+   */
+  int getIndex(Text text) {
+    return (text.hashCode() & 0x7FFFFFFF) % capacity;

Review comment:
       I'd be tempted to use Math.floorMod here, which would always be positive.







[GitHub] [orc] autumnust commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r613763763



##########
File path: java/core/src/test/org/apache/orc/TestStringDictionary.java
##########
@@ -114,10 +117,8 @@ public void testTooManyDistinct() throws Exception {
     }
   }
 
-  @Test
-  public void testHalfDistinct() throws Exception {
+  public void testHalfDistinctHelper(Configuration conf) throws Exception {

Review comment:
       Yes, that makes sense, will address.







[GitHub] [orc] autumnust commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r621720363



##########
File path: java/bench/hive/src/java/org/apache/orc/bench/hive/ORCWriterBenchMark.java
##########
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.bench.hive;
+
+import java.net.URI;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Random;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.TrackingLocalFileSystem;
+import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.orc.CompressionKind;
+import org.apache.orc.OrcConf;
+import org.apache.orc.OrcFile;
+import org.apache.orc.TypeDescription;
+import org.apache.orc.Writer;
+import org.apache.orc.bench.core.IOCounters;
+import org.apache.orc.bench.core.OrcBenchmark;
+import org.apache.orc.bench.core.Utilities;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Level;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.TearDown;
+import org.openjdk.jmh.runner.Runner;
+
+import com.google.auto.service.AutoService;
+
+
+@BenchmarkMode(Mode.AverageTime)
+@OutputTimeUnit(TimeUnit.MICROSECONDS)
+@State(Scope.Thread)
+@AutoService(OrcBenchmark.class)
+public class ORCWriterBenchMark implements OrcBenchmark {
+  private static final Path root = Utilities.getBenchmarkRoot();
+  private Path dumpDir = new Path(root, "dumpDir");
+  private List<VectorizedRowBatch> batches = new ArrayList<>();
+
+  @Override
+  public String getName() {
+    return "write";
+  }
+
+  @Override
+  public String getDescription() {
+    return "Benchmark ORC Writer with different options";
+  }
+
+  @Setup(Level.Trial)
+  public void prepareData() {
+    TypeDescription schema = TypeDescription.fromString("struct<str:string>");
+    VectorizedRowBatch batch = schema.createRowBatch();
+    BytesColumnVector strVector = (BytesColumnVector) batch.cols[0];
+
+    Random rand = new Random();
+    for (int i = 0; i < 32 * 1024; i++) {
+      if (batch.size == batch.getMaxSize()) {
+        batches.add(batch);
+        batch = schema.createRowBatch();
+        strVector = (BytesColumnVector) batch.cols[0];
+      }
+      int randomNum = rand.nextInt(10000 - 10 + 1) + 10;

Review comment:
       Addressed.







[GitHub] [orc] autumnust commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r613781297



##########
File path: java/core/src/test/org/apache/orc/impl/TestStringHashTableDictionary.java
##########
@@ -0,0 +1,95 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import org.apache.hadoop.io.Text;
+import org.junit.Assert;
+import org.junit.Test;
+
+
+public class TestStringHashTableDictionary {
+
+  /**
+   * An extension of {@link StringHashTableDictionary} for testing purposes that overrides the hash function.
+   *
+   */
+  private static class SimpleHashDictionary extends StringHashTableDictionary {
+    public SimpleHashDictionary(int initialCapacity) {
+      super(initialCapacity);
+    }
+
+    /**
+     * Obtain the prefix for each string as the hash value.
+     * Every string used in this test suite contains its hash value as a prefix of the string content;
+     * this way we know the order of the traverse() method.
+     */
+    @Override
+    int getIndex(Text text) {
+      String s = text.toString();
+      int underscore = s.indexOf("_");
+      return Integer.parseInt(text.toString().substring(0, underscore));
+    }
+  }
+
+  @Test
+  public void test1()
+      throws Exception {
+    SimpleHashDictionary hashTableDictionary = new SimpleHashDictionary(5);
+    // Non-resize trivial cases
+    Assert.assertEquals(0, hashTableDictionary.getSizeInBytes());
+    Assert.assertEquals(0, hashTableDictionary.add(new Text("2_Alice")));
+    Assert.assertEquals(1, hashTableDictionary.add(new Text("3_Bob")));
+    Assert.assertEquals(0, hashTableDictionary.add(new Text("2_Alice")));
+    Assert.assertEquals(1, hashTableDictionary.add(new Text("3_Bob")));
+    Assert.assertEquals(2, hashTableDictionary.add(new Text("1_Cindy")));
+

Review comment:
       Hmm, maybe I am missing something, but why is a byte check needed, since it is the content of `Text` we care about? Isn't add-then-read enough for correctness purposes?







[GitHub] [orc] pgaref commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
pgaref commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r620451724



##########
File path: java/core/src/java/org/apache/orc/impl/VisitorContextImpl.java
##########
@@ -0,0 +1,74 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.io.OutputStream;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ *

Review comment:
       Missing doc?







[GitHub] [orc] pgaref commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
pgaref commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r622051672



##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * Using HashTable to represent a dictionary. The strings are stored as UTF-8 bytes
+ * and an offset for each entry. It is using chaining for collision resolution.
+ *
+ * This implementation is not thread-safe. It also assumes there's no reduction in the size of the
+ * hash table, as that shouldn't happen in the use cases for this class.
+ */
+public class StringHashTableDictionary implements Dictionary {
+
+  // contains all keys ever seen, in bytes.
+  private final DynamicByteArray byteArray = new DynamicByteArray();
+  // containing starting offset of the key (in byte) in the byte array.
+  private final DynamicIntArray keyOffsets;
+
+  private final Text newKey = new Text();
+
+  private DynamicIntArray[] hashBuckets;
+
+  private int capacity;
+
+  private int threshold;
+
+  private float loadFactor;
+
+  private static float DEFAULT_LOAD_FACTOR = 0.75f;
+
+  private static final int BUCKET_SIZE = 20;

Review comment:
       Was just thinking: for the default batch size of 1024, we use Dict encoding if fewer than 20% of the values are distinct, so we would need around 0.6 - 0.7 of the batch size?
   In that sense a bucketSize of 5 or 6 could be enough, right?
   
   Might be missing something, but let's add a comment here explaining the logic







[GitHub] [orc] autumnust commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r626177117



##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * Using HashTable to represent a dictionary. The strings are stored as UTF-8 bytes
+ * and an offset for each entry. It is using chaining for collision resolution.
+ *
+ * This implementation is not thread-safe. It also assumes there's no reduction in the size of the
+ * hash table, as that shouldn't happen in the use cases for this class.
+ */
+public class StringHashTableDictionary implements Dictionary {
+
+  // contains all keys ever seen, in bytes.
+  private final DynamicByteArray byteArray = new DynamicByteArray();
+  // containing starting offset of the key (in byte) in the byte array.
+  private final DynamicIntArray keyOffsets;
+
+  private final Text newKey = new Text();
+
+  private DynamicIntArray[] hashBuckets;
+
+  private int capacity;
+
+  private int threshold;
+
+  private float loadFactor;
+
+  private static float DEFAULT_LOAD_FACTOR = 0.75f;
+
+  private static final int BUCKET_SIZE = 20;

Review comment:
       My rough estimate: 
   Given the default stripe size of 64MB, and assuming each row is around 500B, 
   
   `4096 * 0.75 (capacity without resize) * avgBucketSize * 5 (20% distinct) = 64 * 1024 * 1024 / 500`
   
   avgBucketSize ~8 in this case. Let me know if it makes sense / converges with your thoughts 
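The back-of-envelope estimate above can be checked directly. The constants below (64MB stripe, ~500B rows, initial capacity 4096, load factor 0.75, 20% distinct keys) are the assumptions made in this comment thread, not values read from the ORC code:

```java
// Sanity-check of the avgBucketSize estimate in the comment above.
// All constants are the comment's assumptions, not ORC defaults.
public class BucketSizeEstimate {
    static double avgBucketSize() {
        double keysPerStripe = 64.0 * 1024 * 1024 / 500; // ~134k rows in a 64MB stripe at ~500B/row
        double buckets = 4096 * 0.75;                    // usable capacity before a resize is triggered
        double distinctFactor = 5;                       // 20% distinct => 1 / 0.2
        return keysPerStripe / (buckets * distinctFactor);
    }

    public static void main(String[] args) {
        System.out.println(avgBucketSize()); // ~8.7, consistent with the "~8" above
    }
}
```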







[GitHub] [orc] pgaref commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
pgaref commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r620448557



##########
File path: java/bench/hive/src/java/org/apache/orc/bench/hive/ORCWriterBenchMark.java
##########
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.bench.hive;
+
+import java.net.URI;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Random;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.TrackingLocalFileSystem;
+import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.orc.CompressionKind;
+import org.apache.orc.OrcConf;
+import org.apache.orc.OrcFile;
+import org.apache.orc.TypeDescription;
+import org.apache.orc.Writer;
+import org.apache.orc.bench.core.IOCounters;
+import org.apache.orc.bench.core.OrcBenchmark;
+import org.apache.orc.bench.core.Utilities;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Level;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.TearDown;
+import org.openjdk.jmh.runner.Runner;
+
+import com.google.auto.service.AutoService;
+
+
+@BenchmarkMode(Mode.AverageTime)
+@OutputTimeUnit(TimeUnit.MICROSECONDS)
+@State(Scope.Thread)
+@AutoService(OrcBenchmark.class)
+public class ORCWriterBenchMark implements OrcBenchmark {
+  private static final Path root = Utilities.getBenchmarkRoot();
+  private Path dumpDir = new Path(root, "dumpDir");
+  private List<VectorizedRowBatch> batches = new ArrayList<>();
+
+  @Override
+  public String getName() {
+    return "write";
+  }
+
+  @Override
+  public String getDescription() {
+    return "Benchmark ORC Writer with different options";
+  }
+
+  @Setup(Level.Trial)
+  public void prepareData() {
+    TypeDescription schema = TypeDescription.fromString("struct<str:string>");
+    VectorizedRowBatch batch = schema.createRowBatch();
+    BytesColumnVector strVector = (BytesColumnVector) batch.cols[0];
+
+    Random rand = new Random();
+    for (int i = 0; i < 32 * 1024; i++) {
+      if (batch.size == batch.getMaxSize()) {
+        batches.add(batch);
+        batch = schema.createRowBatch();
+        strVector = (BytesColumnVector) batch.cols[0];
+      }
+      int randomNum = rand.nextInt(10000 - 10 + 1) + 10;

Review comment:
       Ideally I would like to see how the number of distinct dict keys affects performance per data structure.
   Can we make the above configurable during data generation?
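One hedged sketch of the suggestion above: parameterize the number of distinct keys during data generation. In the JMH benchmark this knob would naturally be a `@Param` field; the plain method below only illustrates the idea, and the names (`DataGen`, `generate`) are made up for the sketch:

```java
import java.util.Random;

// Illustrative sketch: data generation with a configurable distinct-key count.
// In ORCWriterBenchMark this would be a JMH @Param so each run measures one
// cardinality; here it is just a method argument.
public class DataGen {
    static String[] generate(int rows, int distinctKeys, long seed) {
        Random rand = new Random(seed);
        String[] out = new String[rows];
        for (int i = 0; i < rows; i++) {
            // Drawing from a bounded key space caps the dictionary cardinality.
            out[i] = "key-" + rand.nextInt(distinctKeys);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] data = generate(32 * 1024, 100, 42L);
        System.out.println(data.length); // 32768 rows, at most 100 distinct values
    }
}
```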







[GitHub] [orc] pgaref commented on pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
pgaref commented on pull request #651:
URL: https://github.com/apache/orc/pull/651#issuecomment-854855642


   > Hi, @pgaref . Is this merged? Could you check GitHub Action status?
   > 
   > * https://github.com/apache/orc/actions/workflows/build_and_test.yml?query=branch%3Amain
   > 
   > <img alt="Screen Shot 2021-06-04 at 8 44 43 AM" width="490" src="https://user-images.githubusercontent.com/9700541/120828249-27eea800-c511-11eb-9bd8-eecde81a61a8.png">
   
   Hey @dongjoon-hyun  yes, this is merged, but it broke some tests when I manually set HASH as the default dictionary.
   This should be resolved now with ORC-810





[GitHub] [orc] autumnust commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r613779065



##########
File path: java/core/src/test/org/apache/orc/impl/TestStringHashTableDictionary.java
##########
@@ -0,0 +1,95 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import org.apache.hadoop.io.Text;
+import org.junit.Assert;
+import org.junit.Test;
+
+
+public class TestStringHashTableDictionary {
+
+  /**
+   * An extension of {@link StringHashTableDictionary} for testing purposes that overrides the hash function.
+   *
+   */
+  private static class SimpleHashDictionary extends StringHashTableDictionary {
+    public SimpleHashDictionary(int initialCapacity) {
+      super(initialCapacity);
+    }
+
+    /**
+     * Obtain the prefix for each string as the hash value.
+     * Every string used in this test suite contains its hash value as a prefix of the string content;
+     * this way we know the order of the traverse() method.
+     */
+    @Override
+    int getIndex(Text text) {
+      String s = text.toString();
+      int underscore = s.indexOf("_");
+      return Integer.parseInt(text.toString().substring(0, underscore));
+    }
+  }
+
+  @Test
+  public void test1()
+      throws Exception {
+    SimpleHashDictionary hashTableDictionary = new SimpleHashDictionary(5);
+    // Non-resize trivial cases
+    Assert.assertEquals(0, hashTableDictionary.getSizeInBytes());
+    Assert.assertEquals(0, hashTableDictionary.add(new Text("2_Alice")));
+    Assert.assertEquals(1, hashTableDictionary.add(new Text("3_Bob")));
+    Assert.assertEquals(0, hashTableDictionary.add(new Text("2_Alice")));
+    Assert.assertEquals(1, hashTableDictionary.add(new Text("3_Bob")));
+    Assert.assertEquals(2, hashTableDictionary.add(new Text("1_Cindy")));
+
+    Text text = new Text();
+    hashTableDictionary.getText(text, 0);
+    Assert.assertEquals("2_Alice", text.toString());
+    hashTableDictionary.getText(text, 1);
+    Assert.assertEquals("3_Bob", text.toString());
+    hashTableDictionary.getText(text, 2);
+    Assert.assertEquals("1_Cindy", text.toString());
+
+    // entering the fourth and fifth element which triggers rehash
+    Assert.assertEquals(3, hashTableDictionary.add(new Text("0_David")));
+    hashTableDictionary.getText(text, 3);
+    Assert.assertEquals("0_David", text.toString());
+    Assert.assertEquals(4, hashTableDictionary.add(new Text("4_Eason")));
+    hashTableDictionary.getText(text, 4);
+    Assert.assertEquals("4_Eason", text.toString());
+
+    // Re-ensure all previously existing strings still have correct encoded values
+    hashTableDictionary.getText(text, 0);
+    Assert.assertEquals("2_Alice", text.toString());
+    hashTableDictionary.getText(text, 1);
+    Assert.assertEquals("3_Bob", text.toString());
+    hashTableDictionary.getText(text, 2);
+    Assert.assertEquals("1_Cindy", text.toString());
+
+
+    // The order of words is based on each string's prefix, since their index in the hashArray is derived from it.
+    TestStringRedBlackTree
+        .checkContents(hashTableDictionary, new int[]{3, 2, 0, 1, 4}, "0_David", "1_Cindy", "2_Alice", "3_Bob",
+            "4_Eason");
+

Review comment:
       Not sure if I understand it. Why do we need to compare different implementations here?  







[GitHub] [orc] autumnust commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r613780510



##########
File path: java/core/src/test/org/apache/orc/impl/TestStringHashTableDictionary.java
##########
@@ -0,0 +1,95 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import org.apache.hadoop.io.Text;
+import org.junit.Assert;
+import org.junit.Test;
+
+
+public class TestStringHashTableDictionary {
+
+  /**
+   * An extension of {@link StringHashTableDictionary} for testing purposes that overrides the hash function.
+   *
+   */
+  private static class SimpleHashDictionary extends StringHashTableDictionary {
+    public SimpleHashDictionary(int initialCapacity) {
+      super(initialCapacity);
+    }
+
+    /**
+     * Obtain the prefix for each string as the hash value.
+     * Every string used in this test suite contains its hash value as a prefix of the string content;
+     * this way we know the order of the traverse() method.
+     */
+    @Override
+    int getIndex(Text text) {

Review comment:
       Will address. 







[GitHub] [orc] autumnust commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r613430970



##########
File path: java/core/src/java/org/apache/orc/impl/DictionaryUtils.java
##########
@@ -0,0 +1,86 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.io.OutputStream;
+
+import org.apache.hadoop.io.Text;
+
+
+public class DictionaryUtils {
+  private DictionaryUtils() {

Review comment:
       +1. I got feedback to add a private constructor for utility classes at some point and inherited this "convention" afterwards. Here is the SO page on this topic: https://stackoverflow.com/questions/14398747/hide-utility-class-constructor-utility-classes-should-not-have-a-public-or-def 







[GitHub] [orc] autumnust edited a comment on pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust edited a comment on pull request #651:
URL: https://github.com/apache/orc/pull/651#issuecomment-832850321


   > > > Updated the PR. @omalley @pgaref please take another look, thanks.
   > > 
   > > 
   > > Hey @autumnust thanks for the changes, latest PR looks pretty good -- JMH extension also helps a lot!
   > > Left some mostly minor comments, let me know what you think!
   > 
   > PR seems in pretty good shape already, @autumnust anything to polish/benchmark before going in? Whats the plan?
   
   @pgaref  thanks for getting back. I was trying to finalize the bucketSize choice and tune things a bit more (for the dict size). Will publish a new benchmark later today. Beyond this PR, I am planning to try linear probing as the collision-resolution method and benchmark it against the current impl. to see if there's any room for improvement. 
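Since linear probing comes up here as a follow-up experiment, a minimal sketch of that alternative for a string-to-id dictionary is shown below. This is not the PR's implementation — it skips resizing and the DynamicByteArray storage entirely — it only illustrates the probing loop that would replace the per-bucket chains:

```java
// Illustrative open-addressing dictionary: maps each String to a dense id the
// way Dictionary#add does, but resolves collisions by probing the next slot
// instead of chaining. Resizing is omitted for brevity; a real implementation
// must grow before the table fills, or the probe loop never terminates.
public class LinearProbingSketch {
    private final String[] keys;
    private final int[] ids;
    private int size;

    public LinearProbingSketch(int capacity) {
        keys = new String[capacity];
        ids = new int[capacity];
    }

    /** Returns the existing id for key, or assigns the next id. */
    public int add(String key) {
        int i = (key.hashCode() & 0x7FFFFFFF) % keys.length;
        while (keys[i] != null) {
            if (keys[i].equals(key)) {
                return ids[i];           // already present
            }
            i = (i + 1) % keys.length;   // linear probe: try the next slot
        }
        keys[i] = key;
        ids[i] = size;
        return size++;
    }
}
```

The trade-off to benchmark is cache locality of the flat arrays versus the clustering that linear probing suffers at high load factors.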





[GitHub] [orc] autumnust commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r613635755



##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * Using HashTable to represent a dictionary. The strings are stored as UTF-8 bytes
+ * and an offset for each entry. It is using chaining for collision resolution.
+ *
+ * This implementation is not thread-safe. It also assumes there's no reduction in the size of the
+ * hash table, as that shouldn't happen in the use cases for this class.
+ */
+public class StringHashTableDictionary implements Dictionary {
+
+  private final DynamicByteArray byteArray = new DynamicByteArray();
+  // starting offset of the i-th key (in bytes) in the byte array.
+  // The two combined store the key array.
+  private final DynamicIntArray keyOffsets;
+
+  private final Text newKey = new Text();
+
+  private DynamicIntArray[] hashArray;
+
+  private int capacity;
+
+  private int threshold;
+
+  private float loadFactor;
+
+  private static float DEFAULT_LOAD_FACTOR = 0.75f;
+
+  private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
+
+  public StringHashTableDictionary(int initialCapacity) {
+    this(initialCapacity, DEFAULT_LOAD_FACTOR);
+  }
+
+  public StringHashTableDictionary(int initialCapacity, float loadFactor) {
+    this.capacity = initialCapacity;
+    this.loadFactor = loadFactor;
+    this.keyOffsets = new DynamicIntArray(initialCapacity);
+    this.hashArray = initHashArray(initialCapacity);
+    this.threshold = (int)Math.min(initialCapacity * loadFactor, MAX_ARRAY_SIZE + 1);
+  }
+
+  private DynamicIntArray[] initHashArray(int capacity) {
+    DynamicIntArray[] bucket = new DynamicIntArray[capacity];
+    for (int i = 0; i < capacity; i++) {
+      bucket[i] = new DynamicIntArray();
+    }
+    return bucket;
+  }
+
+  @Override
+  public void visit(Visitor visitor)
+      throws IOException {
+    traverse(visitor, new DictionaryUtils.VisitorContextImpl(this.byteArray, this.keyOffsets));
+  }
+
+  private void traverse(Visitor visitor, DictionaryUtils.VisitorContextImpl context) throws IOException {
+    for (DynamicIntArray intArray : hashArray) {
+      for (int i = 0; i < intArray.size() ; i ++) {
+        context.setPosition(intArray.get(i));
+        visitor.visit(context);
+      }
+    }
+  }
+
+  @Override
+  public void clear() {
+    byteArray.clear();
+    keyOffsets.clear();
+    Arrays.fill(hashArray, null);
+  }
+
+  @Override
+  public void getText(Text result, int position) {
+    DictionaryUtils.getTextInternal(result, position, this.keyOffsets, this.byteArray);
+  }
+
+  @Override
+  public int add(byte[] bytes, int offset, int length) {
+    resizeIfNeeded();
+    newKey.set(bytes, offset, length);
+    return add(newKey);
+  }
+
+  public int add(Text text) {
+    resizeIfNeeded();
+
+    int index = getIndex(text);
+    DynamicIntArray candidateArray = hashArray[index];
+
+    newKey.set(text);
+
+    Text tmpText = new Text();
+    for (int i = 0; i < candidateArray.size(); i++) {
+      getText(tmpText, candidateArray.get(i));
+      if (tmpText.equals(newKey)) {
+        return candidateArray.get(i);
+      }
+    }
+
+    // if making it here, it means no match.
+    int len = newKey.getLength();
+    int currIdx = keyOffsets.size();
+    keyOffsets.add(byteArray.add(newKey.getBytes(), 0, len));
+    candidateArray.add(currIdx);
+    return currIdx;
+  }
+
+  private void resizeIfNeeded() {
+    if (keyOffsets.size() >= threshold) {
+      int oldCapacity = keyOffsets.size();
+      int newCapacity = (oldCapacity << 1) + 1;
+      doResize(newCapacity);
+      this.threshold = (int)Math.min(newCapacity * loadFactor, MAX_ARRAY_SIZE + 1);
+    }
+  }
+
+  @Override
+  public int size() {
+    return keyOffsets.size();
+  }
+
+  /**
+   * Compute the hash value and find the corresponding index.
+   *
+   */
+  int getIndex(Text text) {
+    return (text.hashCode() & 0x7FFFFFFF) % capacity;
+  }
+
+  // Resize the hash table, re-hash all the existing keys.
+  // byteArray and keyOffsetsArray don't have to be re-filled.
+  private void doResize(int newSize) {
+    DynamicIntArray[] resizedHashArray = new DynamicIntArray[newSize];
+    for (int i = 0; i < newSize; i++) {
+      resizedHashArray[i] = new DynamicIntArray();
+    }
+
+    Text tmpText = new Text();
+    for (int i = 0; i < capacity; i++) {
+      DynamicIntArray intArray = hashArray[i];
+      int bucketSize = intArray.size();
+      if (bucketSize > 0) {
+        for (int j = 0; j < bucketSize; j++) {
+          getText(tmpText, intArray.get(j));
+          int newIndex = getIndex(tmpText);
+          resizedHashArray[newIndex].add(intArray.get(j));
+        }
+      }
+    }
+
+    Arrays.fill(hashArray, null);

Review comment:
       I was thinking to set everything to null in case some elements in the array are still referenced, which would keep the space from being garbage-collected. Do you think that's fair? I don't have a strong opinion about it though. 







[GitHub] [orc] pgaref commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
pgaref commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r626395478



##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * Using HashTable to represent a dictionary. The strings are stored as UTF-8 bytes
+ * and an offset for each entry. It is using chaining for collision resolution.
+ *
+ * This implementation is not thread-safe. It also assumes there's no reduction in the size of hash-table
+ * as it shouldn't happen in the use cases for this class.
+ */
+public class StringHashTableDictionary implements Dictionary {
+
+  // containing all keys ever seen, in bytes.
+  private final DynamicByteArray byteArray = new DynamicByteArray();
+  // containing starting offset of the key (in byte) in the byte array.
+  private final DynamicIntArray keyOffsets;
+
+  private final Text newKey = new Text();
+
+  private DynamicIntArray[] hashBuckets;
+
+  private int capacity;
+
+  private int threshold;
+
+  private float loadFactor;
+
+  private static float DEFAULT_LOAD_FACTOR = 0.75f;
+
+  private static final int BUCKET_SIZE = 20;

Review comment:
       Hey @autumnust  you are right, it is more associated with the stripe size so your estimation makes sense. Thanks for clarifying!  







[GitHub] [orc] pgaref commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
pgaref commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r620439426



##########
File path: java/bench/hive/src/java/org/apache/orc/bench/hive/ORCWriterBenchMark.java
##########
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.bench.hive;
+
+import java.net.URI;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Random;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.TrackingLocalFileSystem;
+import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.orc.CompressionKind;
+import org.apache.orc.OrcConf;
+import org.apache.orc.OrcFile;
+import org.apache.orc.TypeDescription;
+import org.apache.orc.Writer;
+import org.apache.orc.bench.core.IOCounters;
+import org.apache.orc.bench.core.OrcBenchmark;
+import org.apache.orc.bench.core.Utilities;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Level;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.TearDown;
+import org.openjdk.jmh.runner.Runner;
+
+import com.google.auto.service.AutoService;
+
+
+@BenchmarkMode(Mode.AverageTime)
+@OutputTimeUnit(TimeUnit.MICROSECONDS)
+@State(Scope.Thread)
+@AutoService(OrcBenchmark.class)
+public class ORCWriterBenchMark implements OrcBenchmark {
+  private static final Path root = Utilities.getBenchmarkRoot();
+  private Path dumpDir = new Path(root, "dumpDir");
+  private List<VectorizedRowBatch> batches = new ArrayList<>();
+
+  @Override
+  public String getName() {
+    return "write";
+  }
+
+  @Override
+  public String getDescription() {
+    return "Benchmark ORC Writer with different options";

Review comment:
       DICT options for now







[GitHub] [orc] autumnust commented on pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust commented on pull request #651:
URL: https://github.com/apache/orc/pull/651#issuecomment-828170668


   Here is the new jmh result: 
   
   ```# Run complete. Total time: 00:10:35
   Benchmark                                    (dictImpl)  (upperBound)  Mode  Cnt       Score      Error  Units
   ORCWriterBenchMark.dictBench                     RBTREE         10000  avgt    5   28939.068 ± 3080.947  us/op
   ORCWriterBenchMark.dictBench:bytesPerRecord      RBTREE         10000  avgt    5      49.963                 #
   ORCWriterBenchMark.dictBench:ops                 RBTREE         10000  avgt    5         ≈ 0                 #
   ORCWriterBenchMark.dictBench:perRecord           RBTREE         10000  avgt    5       0.883 ±    0.094  us/op
   ORCWriterBenchMark.dictBench:records             RBTREE         10000  avgt    5  163840.000                 #
   ORCWriterBenchMark.dictBench                     RBTREE          2500  avgt    5   21998.781 ± 1448.300  us/op
   ORCWriterBenchMark.dictBench:bytesPerRecord      RBTREE          2500  avgt    5      23.532                 #
   ORCWriterBenchMark.dictBench:ops                 RBTREE          2500  avgt    5         ≈ 0                 #
   ORCWriterBenchMark.dictBench:perRecord           RBTREE          2500  avgt    5       0.671 ±    0.044  us/op
   ORCWriterBenchMark.dictBench:records             RBTREE          2500  avgt    5  163840.000                 #
   ORCWriterBenchMark.dictBench                     RBTREE           500  avgt    5   17730.281 ± 4574.132  us/op
   ORCWriterBenchMark.dictBench:bytesPerRecord      RBTREE           500  avgt    5      13.156                 #
   ORCWriterBenchMark.dictBench:ops                 RBTREE           500  avgt    5         ≈ 0                 #
   ORCWriterBenchMark.dictBench:perRecord           RBTREE           500  avgt    5       0.541 ±    0.140  us/op
   ORCWriterBenchMark.dictBench:records             RBTREE           500  avgt    5  163840.000                 #
   ORCWriterBenchMark.dictBench                       HASH         10000  avgt    5   21269.613 ± 4137.763  us/op
   ORCWriterBenchMark.dictBench:bytesPerRecord        HASH         10000  avgt    5      42.268                 #
   ORCWriterBenchMark.dictBench:ops                   HASH         10000  avgt    5         ≈ 0                 #
   ORCWriterBenchMark.dictBench:perRecord             HASH         10000  avgt    5       0.649 ±    0.126  us/op
   ORCWriterBenchMark.dictBench:records               HASH         10000  avgt    5  163840.000                 #
   ORCWriterBenchMark.dictBench                       HASH          2500  avgt    5   11586.898 ± 4075.783  us/op
   ORCWriterBenchMark.dictBench:bytesPerRecord        HASH          2500  avgt    5      17.692                 #
   ORCWriterBenchMark.dictBench:ops                   HASH          2500  avgt    5         ≈ 0                 #
   ORCWriterBenchMark.dictBench:perRecord             HASH          2500  avgt    5       0.354 ±    0.124  us/op
   ORCWriterBenchMark.dictBench:records               HASH          2500  avgt    5  163840.000                 #
   ORCWriterBenchMark.dictBench                       HASH           500  avgt    5    9646.080 ± 2279.530  us/op
   ORCWriterBenchMark.dictBench:bytesPerRecord        HASH           500  avgt    5      11.613                 #
   ORCWriterBenchMark.dictBench:ops                   HASH           500  avgt    5         ≈ 0                 #
   ORCWriterBenchMark.dictBench:perRecord             HASH           500  avgt    5       0.294 ±    0.070  us/op
   ORCWriterBenchMark.dictBench:records               HASH           500  avgt    5  163840.000                 #
   ORCWriterBenchMark.dictBench                       NONE         10000  avgt    5    4077.675 ±  117.606  us/op
   ORCWriterBenchMark.dictBench:bytesPerRecord        NONE         10000  avgt    5      50.146                 #
   ORCWriterBenchMark.dictBench:ops                   NONE         10000  avgt    5         ≈ 0                 #
   ORCWriterBenchMark.dictBench:perRecord             NONE         10000  avgt    5       0.124 ±    0.004  us/op
   ORCWriterBenchMark.dictBench:records               NONE         10000  avgt    5  163840.000                 #
   ORCWriterBenchMark.dictBench                       NONE          2500  avgt    5    4607.634 ± 1163.084  us/op
   ORCWriterBenchMark.dictBench:bytesPerRecord        NONE          2500  avgt    5      50.146                 #
   ORCWriterBenchMark.dictBench:ops                   NONE          2500  avgt    5         ≈ 0                 #
   ORCWriterBenchMark.dictBench:perRecord             NONE          2500  avgt    5       0.141 ±    0.035  us/op
   ORCWriterBenchMark.dictBench:records               NONE          2500  avgt    5  163840.000                 #
   ORCWriterBenchMark.dictBench                       NONE           500  avgt    5    3783.059 ±  367.511  us/op
   ORCWriterBenchMark.dictBench:bytesPerRecord        NONE           500  avgt    5      50.146                 #
   ORCWriterBenchMark.dictBench:ops                   NONE           500  avgt    5         ≈ 0                 #
   ORCWriterBenchMark.dictBench:perRecord             NONE           500  avgt    5       0.115 ±    0.011  us/op
   ORCWriterBenchMark.dictBench:records               NONE           500  avgt    5  163840.000                 #
   ```
   
   Unfortunately the previous implementation had a bug that ended up with great locality (but incorrect results). HASH is still much better than RB-Tree, but obviously we need to iterate further to improve it. 





[GitHub] [orc] pgaref commented on pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
pgaref commented on pull request #651:
URL: https://github.com/apache/orc/pull/651#issuecomment-842585252


   > 10000
   
   Hey @autumnust thanks for pinging me, got distracted by various tasks last week!
   Also thanks for updating the benchmark, this helps a lot. Some comments:
   
   - I would expect the NONE dictImpl bench results to be identical across runs; why is the 4096 X 500 run 3x the others?
   - Seems like an init size of 4k could be a good default conf in most of the cases -- would it make sense to remove the extra conf completely if we can avoid it? 
   - Minor: I would rename upperBound to distinctCount
   
   Let me know what you think





[GitHub] [orc] autumnust commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r621639554



##########
File path: java/bench/core/src/java/org/apache/orc/bench/core/IOCounters.java
##########
@@ -48,23 +50,26 @@ public void print() {
     if (recordCounters != null) {
       recordCounters.print();
     }
-    System.out.println("Reads: " + reads);
-    System.out.println("Bytes: " + bytesRead);
+    System.out.println("io: " + io);
+    System.out.println("Bytes: " + bytesIO);
   }
 
   public double bytesPerRecord() {
     return recordCounters == null || recordCounters.records == 0 ?
-        0 : ((double) bytesRead) / recordCounters.records;
+        0 : ((double) bytesIO) / recordCounters.records;
   }
 
   public long records() {
     return recordCounters == null || recordCounters.invocations == 0 ?
         0 : recordCounters.records / recordCounters.invocations;
   }
 
-  public long reads() {
+  /**
+   * Capture the number of I/O on average in each invocation.
+   */
+  public long iOs() {

Review comment:
       Thanks, that's a much better name! 

##########
File path: java/bench/core/src/java/org/apache/orc/bench/core/IOCounters.java
##########
@@ -48,23 +50,26 @@ public void print() {
     if (recordCounters != null) {
       recordCounters.print();
     }
-    System.out.println("Reads: " + reads);
-    System.out.println("Bytes: " + bytesRead);
+    System.out.println("io: " + io);
+    System.out.println("Bytes: " + bytesIO);
   }
 
   public double bytesPerRecord() {
     return recordCounters == null || recordCounters.records == 0 ?
-        0 : ((double) bytesRead) / recordCounters.records;
+        0 : ((double) bytesIO) / recordCounters.records;
   }
 
   public long records() {
     return recordCounters == null || recordCounters.invocations == 0 ?
         0 : recordCounters.records / recordCounters.invocations;
   }
 
-  public long reads() {
+  /**
+   * Capture the number of I/O on average in each invocation.
+   */
+  public long iOs() {

Review comment:
       Second thoughts: there's no "per second", so I may just call it `ops`, wdyt? 







[GitHub] [orc] pgaref commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
pgaref commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r598636479



##########
File path: java/core/src/java/org/apache/orc/impl/DictionaryUtils.java
##########
@@ -0,0 +1,86 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.io.OutputStream;
+
+import org.apache.hadoop.io.Text;
+
+
+public class DictionaryUtils {
+  private DictionaryUtils() {

Review comment:
       Remove empty constructor and add comment as class doc?

##########
File path: java/core/src/java/org/apache/orc/impl/DictionaryUtils.java
##########
@@ -0,0 +1,86 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.io.OutputStream;
+
+import org.apache.hadoop.io.Text;
+
+
+public class DictionaryUtils {
+  private DictionaryUtils() {
+    // Utility class does nothing in constructor
+  }
+
+  public static void getTextInternal(Text result, int position, DynamicIntArray keyOffsets, DynamicByteArray byteArray) {

Review comment:
       doc method please?

##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * Using HashTable to represent a dictionary. The strings are stored as UTF-8 bytes
+ * and an offset for each entry. It is using chaining for collision resolution.
+ *
+ * This implementation is not thread-safe. It also assumes there's no reduction in the size of hash-table
+ * as it shouldn't happen in the use cases for this class.
+ */
+public class StringHashTableDictionary implements Dictionary {
+
+  private final DynamicByteArray byteArray = new DynamicByteArray();
+  // starting offset of key-in-byte in the byte array for the i-th key.
+  // Two things combined stores the key array.
+  private final DynamicIntArray keyOffsets;
+
+  private final Text newKey = new Text();
+
+  private DynamicIntArray[] hashArray;
+
+  private int capacity;
+
+  private int threshold;
+
+  private float loadFactor;
+
+  private static float DEFAULT_LOAD_FACTOR = 0.75f;
+
+  private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
+
+  public StringHashTableDictionary(int initialCapacity) {
+    this(initialCapacity, DEFAULT_LOAD_FACTOR);
+  }
+
+  public StringHashTableDictionary(int initialCapacity, float loadFactor) {
+    this.capacity = initialCapacity;
+    this.loadFactor = loadFactor;
+    this.keyOffsets = new DynamicIntArray(initialCapacity);
+    this.hashArray = initHashArray(initialCapacity);
+    this.threshold = (int)Math.min(initialCapacity * loadFactor, MAX_ARRAY_SIZE + 1);
+  }
+
+  private DynamicIntArray[] initHashArray(int capacity) {
+    DynamicIntArray[] bucket = new DynamicIntArray[capacity];
+    for (int i = 0; i < capacity; i++) {
+      bucket[i] = new DynamicIntArray();
+    }
+    return bucket;
+  }
+
+  @Override
+  public void visit(Visitor visitor)
+      throws IOException {
+    traverse(visitor, new DictionaryUtils.VisitorContextImpl(this.byteArray, this.keyOffsets));
+  }
+
+  private void traverse(Visitor visitor, DictionaryUtils.VisitorContextImpl context) throws IOException {
+    for (DynamicIntArray intArray : hashArray) {
+      for (int i = 0; i < intArray.size() ; i ++) {
+        context.setPosition(intArray.get(i));
+        visitor.visit(context);
+      }
+    }
+  }
+
+  @Override
+  public void clear() {
+    byteArray.clear();
+    keyOffsets.clear();
+    Arrays.fill(hashArray, null);
+  }
+
+  @Override
+  public void getText(Text result, int position) {
+    DictionaryUtils.getTextInternal(result, position, this.keyOffsets, this.byteArray);
+  }
+
+  @Override
+  public int add(byte[] bytes, int offset, int length) {
+    resizeIfNeeded();

Review comment:
       resizeIfNeeded is called twice here -- I believe it's safe to remove this call
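       As a sketch of the suggestion, the outer overload can delegate without its own resize check so that resizeIfNeeded runs exactly once per add. The classes below are a hypothetical simplification using plain java.util collections, not the PR's DynamicIntArray-based implementation.

       ```java
       import java.util.ArrayList;
       import java.util.List;

       // Hypothetical sketch: the byte[] overload delegates without its own
       // resize check, so resizeIfNeeded runs exactly once per add.
       class ResizableTable {
           private final List<String> keys = new ArrayList<>(); // id -> key
           private List<Integer>[] buckets;
           private int threshold;

           ResizableTable(int capacity) {
               buckets = newBuckets(capacity);
               threshold = (int) (capacity * 0.75f);
           }

           @SuppressWarnings("unchecked")
           private static List<Integer>[] newBuckets(int n) {
               List<Integer>[] b = new List[n];
               for (int i = 0; i < n; i++) b[i] = new ArrayList<>();
               return b;
           }

           int add(byte[] bytes) {
               return add(new String(bytes)); // delegate; no resize check here
           }

           int add(String key) {
               resizeIfNeeded();              // single resize check per add
               int idx = (key.hashCode() & 0x7FFFFFFF) % buckets.length;
               for (int id : buckets[idx]) {
                   if (keys.get(id).equals(key)) return id; // existing entry
               }
               int id = keys.size();
               keys.add(key);
               buckets[idx].add(id);
               return id;
           }

           private void resizeIfNeeded() {
               if (keys.size() < threshold) return;
               List<Integer>[] resized = newBuckets(buckets.length * 2 + 1);
               for (List<Integer> bucket : buckets) {
                   for (int id : bucket) {    // rehash every stored id
                       int idx = (keys.get(id).hashCode() & 0x7FFFFFFF)
                           % resized.length;
                       resized[idx].add(id);
                   }
               }
               buckets = resized;
               threshold = (int) (buckets.length * 0.75f);
           }
       }
       ```

       Note the ids stay stable across a resize because only the bucket placement is recomputed, not the id-to-key mapping.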

##########
File path: java/core/src/test/org/apache/orc/impl/TestStringHashTableDictionary.java
##########
@@ -0,0 +1,95 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import org.apache.hadoop.io.Text;
+import org.junit.Assert;
+import org.junit.Test;
+
+
+public class TestStringHashTableDictionary {
+
+  /**
+   * An extension of {@link StringHashTableDictionary} for testing purposes that overrides the hash function.
+   *
+   */
+  private static class SimpleHashDictionary extends StringHashTableDictionary {
+    public SimpleHashDictionary(int initialCapacity) {
+      super(initialCapacity);
+    }
+
+    /**
+     * Obtain the prefix for each string as the hash value.
+     * Every string used in this test suite contains its hash value as a prefix of the string content;
+     * this way we know the traversal order of the traverse() method.
+     */
+    @Override
+    int getIndex(Text text) {

Review comment:
       Shall we have a test where the actual HashFunction is used as well?

##########
File path: java/core/src/test/org/apache/orc/TestStringDictionary.java
##########
@@ -114,10 +117,8 @@ public void testTooManyDistinct() throws Exception {
     }
   }
 
-  @Test
-  public void testHalfDistinct() throws Exception {
+  public void testHalfDistinctHelper(Configuration conf) throws Exception {

Review comment:
       Shall we make this a Parameterized test where @Parameter is going to be the DICT implementations -- other tests with dict encoding could also benefit from this

##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * Using HashTable to represent a dictionary. The strings are stored as UTF-8 bytes
+ * and an offset for each entry. It is using chaining for collision resolution.
+ *
+ * This implementation is not thread-safe. It also assumes there's no reduction in the size of hash-table
+ * as it shouldn't happen in the use cases for this class.
+ */
+public class StringHashTableDictionary implements Dictionary {
+
+  private final DynamicByteArray byteArray = new DynamicByteArray();
+  // starting offset of key-in-byte in the byte array for the i-th key.
+  // Two things combined stores the key array.
+  private final DynamicIntArray keyOffsets;
+
+  private final Text newKey = new Text();
+
+  private DynamicIntArray[] hashArray;
+
+  private int capacity;
+
+  private int threshold;
+
+  private float loadFactor;
+
+  private static float DEFAULT_LOAD_FACTOR = 0.75f;
+
+  private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

Review comment:
       explain MAX_ARRAY_SIZE limit and how this relates to Hash MASK below ignoring the first bits
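       For background on the mask in getIndex: Java's hashCode() can be negative and the % operator preserves the sign, so the 0x7FFFFFFF mask clears the sign bit to keep the bucket index in range. A small illustrative snippet (the class and method names are made up for this sketch):

       ```java
       // Demonstrates why the sign bit must be cleared before taking the
       // modulo: a negative hashCode would otherwise yield a negative index.
       public class HashMask {
           static int bucketIndex(int hashCode, int capacity) {
               // always in [0, capacity)
               return (hashCode & 0x7FFFFFFF) % capacity;
           }
       }
       ```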

##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * Using HashTable to represent a dictionary. The strings are stored as UTF-8 bytes
+ * and an offset for each entry. It is using chaining for collision resolution.
+ *
+ * This implementation is not thread-safe. It also assumes there's no reduction in the size of hash-table
+ * as it shouldn't happen in the use cases for this class.
+ */
+public class StringHashTableDictionary implements Dictionary {
+
+  private final DynamicByteArray byteArray = new DynamicByteArray();
+  // starting offset of key-in-byte in the byte array for the i-th key.
+  // Two things combined stores the key array.
+  private final DynamicIntArray keyOffsets;
+
+  private final Text newKey = new Text();
+
+  private DynamicIntArray[] hashArray;
+
+  private int capacity;
+
+  private int threshold;
+
+  private float loadFactor;
+
+  private static float DEFAULT_LOAD_FACTOR = 0.75f;
+
+  private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
+
+  public StringHashTableDictionary(int initialCapacity) {
+    this(initialCapacity, DEFAULT_LOAD_FACTOR);
+  }
+
+  public StringHashTableDictionary(int initialCapacity, float loadFactor) {
+    this.capacity = initialCapacity;
+    this.loadFactor = loadFactor;
+    this.keyOffsets = new DynamicIntArray(initialCapacity);
+    this.hashArray = initHashArray(initialCapacity);
+    this.threshold = (int)Math.min(initialCapacity * loadFactor, MAX_ARRAY_SIZE + 1);
+  }
+
+  private DynamicIntArray[] initHashArray(int capacity) {
+    DynamicIntArray[] bucket = new DynamicIntArray[capacity];
+    for (int i = 0; i < capacity; i++) {
+      bucket[i] = new DynamicIntArray();

Review comment:
       Not sure each bucket should be initialized to the default 8K ints -- are we expecting that many collisions?
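       One hypothetical way to address this, sketched here with plain java.util collections rather than the PR's DynamicIntArray: allocate a chain only when its bucket is first hit, so empty buckets cost a single null slot.

       ```java
       import java.util.ArrayList;
       import java.util.List;

       // Hypothetical sketch: lazily allocate a bucket's chain on first use
       // instead of pre-sizing every bucket up front.
       class LazyBucketTable {
           private final List<Integer>[] buckets;

           @SuppressWarnings("unchecked")
           LazyBucketTable(int capacity) {
               buckets = new List[capacity]; // all slots start null: no chains yet
           }

           void put(int slot, int id) {
               if (buckets[slot] == null) {
                   // allocate a small chain only when the slot is first hit
                   buckets[slot] = new ArrayList<>(4);
               }
               buckets[slot].add(id);
           }

           int chainLength(int slot) {
               return buckets[slot] == null ? 0 : buckets[slot].size();
           }
       }
       ```

       With a decent hash function most chains stay short, so a small initial chain size (or none at all) tends to be enough.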

##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * Using HashTable to represent a dictionary. The strings are stored as UTF-8 bytes
+ * and an offset for each entry. It is using chaining for collision resolution.
+ *
+ * This implementation is not thread-safe. It also assumes there's no reduction in the size of hash-table
+ * as it shouldn't happen in the use cases for this class.
+ */
+public class StringHashTableDictionary implements Dictionary {
+
+  private final DynamicByteArray byteArray = new DynamicByteArray();
+  // starting offset of key-in-byte in the byte array for the i-th key.
+  // Two things combined stores the key array.
+  private final DynamicIntArray keyOffsets;
+
+  private final Text newKey = new Text();
+
+  private DynamicIntArray[] hashArray;
+
+  private int capacity;
+
+  private int threshold;
+
+  private float loadFactor;
+
+  private static float DEFAULT_LOAD_FACTOR = 0.75f;
+
+  private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
+
+  public StringHashTableDictionary(int initialCapacity) {
+    this(initialCapacity, DEFAULT_LOAD_FACTOR);
+  }
+
+  public StringHashTableDictionary(int initialCapacity, float loadFactor) {
+    this.capacity = initialCapacity;
+    this.loadFactor = loadFactor;
+    this.keyOffsets = new DynamicIntArray(initialCapacity);
+    this.hashArray = initHashArray(initialCapacity);
+    this.threshold = (int)Math.min(initialCapacity * loadFactor, MAX_ARRAY_SIZE + 1);
+  }
+
+  private DynamicIntArray[] initHashArray(int capacity) {
+    DynamicIntArray[] bucket = new DynamicIntArray[capacity];
+    for (int i = 0; i < capacity; i++) {
+      bucket[i] = new DynamicIntArray();
+    }
+    return bucket;
+  }
+
+  @Override
+  public void visit(Visitor visitor)
+      throws IOException {
+    traverse(visitor, new DictionaryUtils.VisitorContextImpl(this.byteArray, this.keyOffsets));
+  }
+
+  private void traverse(Visitor visitor, DictionaryUtils.VisitorContextImpl context) throws IOException {
+    for (DynamicIntArray intArray : hashArray) {
+      for (int i = 0; i < intArray.size() ; i ++) {
+        context.setPosition(intArray.get(i));
+        visitor.visit(context);
+      }
+    }
+  }
+
+  @Override
+  public void clear() {
+    byteArray.clear();
+    keyOffsets.clear();
+    Arrays.fill(hashArray, null);
+  }
+
+  @Override
+  public void getText(Text result, int position) {
+    DictionaryUtils.getTextInternal(result, position, this.keyOffsets, this.byteArray);
+  }
+
+  @Override
+  public int add(byte[] bytes, int offset, int length) {
+    resizeIfNeeded();
+    newKey.set(bytes, offset, length);
+    return add(newKey);
+  }
+
+  public int add(Text text) {
+    resizeIfNeeded();
+
+    int index = getIndex(text);
+    DynamicIntArray candidateArray = hashArray[index];
+
+    newKey.set(text);
+
+    Text tmpText = new Text();
+    for (int i = 0; i < candidateArray.size(); i++) {
+      getText(tmpText, candidateArray.get(i));
+      if (tmpText.equals(newKey)) {
+        return candidateArray.get(i);
+      }
+    }
+
+    // No match found in the bucket; append as a new entry.
+    int len = newKey.getLength();
+    int currIdx = keyOffsets.size();
+    keyOffsets.add(byteArray.add(newKey.getBytes(), 0, len));
+    candidateArray.add(currIdx);
+    return currIdx;
+  }
+
+  private void resizeIfNeeded() {
+    if (keyOffsets.size() >= threshold) {
+      int oldCapacity = keyOffsets.size();
+      int newCapacity = (oldCapacity << 1) + 1;
+      doResize(newCapacity);
+      this.threshold = (int)Math.min(newCapacity * loadFactor, MAX_ARRAY_SIZE + 1);
+    }
+  }
+
+  @Override
+  public int size() {
+    return keyOffsets.size();
+  }
+
+  /**
+   * Compute the hash value and map it to a bucket index.
+   * The sign bit is masked off so the modulo result is always non-negative.
+   */
+  int getIndex(Text text) {
+    return (text.hashCode() & 0x7FFFFFFF) % capacity;
+  }
+
+  // Resize the hash table and re-hash all the existing keys;
+  // byteArray and keyOffsets do not have to be rebuilt.
+  private void doResize(int newSize) {
+    DynamicIntArray[] resizedHashArray = new DynamicIntArray[newSize];
+    for (int i = 0; i < newSize; i++) {
+      resizedHashArray[i] = new DynamicIntArray();
+    }
+
+    Text tmpText = new Text();
+    for (int i = 0; i < capacity; i++) {
+      DynamicIntArray intArray = hashArray[i];
+      int bucketSize = intArray.size();
+      if (bucketSize > 0) {
+        for (int j = 0; j < bucketSize; j++) {
+          getText(tmpText, intArray.get(j));
+          int newIndex = getIndex(tmpText);
+          resizedHashArray[newIndex].add(intArray.get(j));
+        }
+      }
+    }
+
+    Arrays.fill(hashArray, null);

Review comment:
       we reassign hashArray below, is this really needed?
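
The doResize path quoted above re-buckets the existing entry ids while leaving the byte storage and the id-to-offset table untouched, so ids handed out before the resize stay valid. A minimal standalone sketch of that idea (hypothetical names; plain Java collections stand in for ORC's DynamicIntArray):

```java
import java.util.ArrayList;
import java.util.List;

public class RehashSketch {
    // Re-bucket every stored id into a table of newSize buckets.
    // The id -> key storage (here a simple List<String>) is not rebuilt,
    // so ids handed out before the resize remain valid afterwards.
    static List<List<Integer>> rehash(List<List<Integer>> oldBuckets,
                                      List<String> keysById, int newSize) {
        List<List<Integer>> fresh = new ArrayList<>();
        for (int i = 0; i < newSize; i++) {
            fresh.add(new ArrayList<>());
        }
        for (List<Integer> bucket : oldBuckets) {
            for (int id : bucket) {
                // Recompute the bucket index with the new capacity.
                int idx = (keysById.get(id).hashCode() & 0x7FFFFFFF) % newSize;
                fresh.get(idx).add(id);
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("alice", "bob", "cindy");
        List<List<Integer>> old = new ArrayList<>();
        old.add(new ArrayList<>(List.of(0, 2)));
        old.add(new ArrayList<>(List.of(1)));
        List<List<Integer>> resized = rehash(old, keys, 7);
        int count = 0;
        for (List<Integer> b : resized) {
            count += b.size();
        }
        System.out.println(count); // all three ids survive the resize
    }
}
```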

##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * A dictionary backed by a hash table. The strings are stored as UTF-8 bytes
+ * with an offset recorded for each entry, and collisions are resolved by chaining.
+ *
+ * This implementation is not thread-safe. It also assumes the hash table never
+ * shrinks, which should not happen in the use cases for this class.
+ */
+public class StringHashTableDictionary implements Dictionary {
+
+  private final DynamicByteArray byteArray = new DynamicByteArray();
+  // The starting offset in byteArray for the i-th key; together with
+  // byteArray this stores all of the dictionary's keys.
+  private final DynamicIntArray keyOffsets;
+
+  private final Text newKey = new Text();
+
+  private DynamicIntArray[] hashArray;
+
+  private int capacity;
+
+  private int threshold;
+
+  private float loadFactor;
+
+  private static final float DEFAULT_LOAD_FACTOR = 0.75f;
+
+  private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
+
+  public StringHashTableDictionary(int initialCapacity) {
+    this(initialCapacity, DEFAULT_LOAD_FACTOR);
+  }
+
+  public StringHashTableDictionary(int initialCapacity, float loadFactor) {
+    this.capacity = initialCapacity;
+    this.loadFactor = loadFactor;
+    this.keyOffsets = new DynamicIntArray(initialCapacity);
+    this.hashArray = initHashArray(initialCapacity);
+    this.threshold = (int)Math.min(initialCapacity * loadFactor, MAX_ARRAY_SIZE + 1);
+  }
+
+  private DynamicIntArray[] initHashArray(int capacity) {
+    DynamicIntArray[] bucket = new DynamicIntArray[capacity];
+    for (int i = 0; i < capacity; i++) {
+      bucket[i] = new DynamicIntArray();
+    }
+    return bucket;
+  }
+
+  @Override
+  public void visit(Visitor visitor)
+      throws IOException {
+    traverse(visitor, new DictionaryUtils.VisitorContextImpl(this.byteArray, this.keyOffsets));
+  }
+
+  private void traverse(Visitor visitor, DictionaryUtils.VisitorContextImpl context) throws IOException {
+    for (DynamicIntArray intArray : hashArray) {
+      for (int i = 0; i < intArray.size(); i++) {
+        context.setPosition(intArray.get(i));
+        visitor.visit(context);
+      }
+    }
+  }
+
+  @Override
+  public void clear() {
+    byteArray.clear();
+    keyOffsets.clear();
+    Arrays.fill(hashArray, null);
+  }
+
+  @Override
+  public void getText(Text result, int position) {
+    DictionaryUtils.getTextInternal(result, position, this.keyOffsets, this.byteArray);
+  }
+
+  @Override
+  public int add(byte[] bytes, int offset, int length) {
+    resizeIfNeeded();
+    newKey.set(bytes, offset, length);
+    return add(newKey);
+  }
+
+  public int add(Text text) {
+    resizeIfNeeded();
+
+    int index = getIndex(text);
+    DynamicIntArray candidateArray = hashArray[index];
+
+    newKey.set(text);
+
+    Text tmpText = new Text();
+    for (int i = 0; i < candidateArray.size(); i++) {
+      getText(tmpText, candidateArray.get(i));
+      if (tmpText.equals(newKey)) {
+        return candidateArray.get(i);
+      }
+    }
+
+    // No match found in the bucket; append as a new entry.
+    int len = newKey.getLength();
+    int currIdx = keyOffsets.size();
+    keyOffsets.add(byteArray.add(newKey.getBytes(), 0, len));
+    candidateArray.add(currIdx);
+    return currIdx;
+  }
+
+  private void resizeIfNeeded() {
+    if (keyOffsets.size() >= threshold) {
+      int oldCapacity = keyOffsets.size();
+      int newCapacity = (oldCapacity << 1) + 1;
+      doResize(newCapacity);
+      this.threshold = (int)Math.min(newCapacity * loadFactor, MAX_ARRAY_SIZE + 1);
+    }
+  }
+
+  @Override
+  public int size() {
+    return keyOffsets.size();
+  }
+
+  /**
+   * Compute the hash value and map it to a bucket index.
+   * The sign bit is masked off so the modulo result is always non-negative.
+   */
+  int getIndex(Text text) {
+    return (text.hashCode() & 0x7FFFFFFF) % capacity;
+  }
+
+  // Resize the hash table and re-hash all the existing keys;
+  // byteArray and keyOffsets do not have to be rebuilt.
+  private void doResize(int newSize) {
+    DynamicIntArray[] resizedHashArray = new DynamicIntArray[newSize];
+    for (int i = 0; i < newSize; i++) {
+      resizedHashArray[i] = new DynamicIntArray();
+    }
+
+    Text tmpText = new Text();
+    for (int i = 0; i < capacity; i++) {
+      DynamicIntArray intArray = hashArray[i];
+      int bucketSize = intArray.size();
+      if (bucketSize > 0) {

Review comment:
       unnecessary condition -- already handled by the loop

##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * A dictionary backed by a hash table. The strings are stored as UTF-8 bytes
+ * with an offset recorded for each entry, and collisions are resolved by chaining.
+ *
+ * This implementation is not thread-safe. It also assumes the hash table never
+ * shrinks, which should not happen in the use cases for this class.
+ */
+public class StringHashTableDictionary implements Dictionary {
+
+  private final DynamicByteArray byteArray = new DynamicByteArray();
+  // The starting offset in byteArray for the i-th key; together with
+  // byteArray this stores all of the dictionary's keys.
+  private final DynamicIntArray keyOffsets;
+
+  private final Text newKey = new Text();
+
+  private DynamicIntArray[] hashArray;

Review comment:
       Let's clarify what each of hashArray, keyOffsets, and byteArray represents.
   It seems hashArray maps a string hash (int) to a list of key offsets -- let's document that and make the variable names more descriptive.
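
As a reading aid for the layout discussed above, here is a hypothetical miniature of the three structures (the names and the plain-Java collections are assumptions; ORC's DynamicByteArray and DynamicIntArray are the real counterparts):

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class MiniDictionary {
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream(); // ~ byteArray: all keys' UTF-8 bytes
    private final List<Integer> offsets = new ArrayList<>();                 // ~ keyOffsets: id -> start offset in bytes
    private final List<List<Integer>> buckets = new ArrayList<>();           // ~ hashArray: bucket -> ids (chaining)

    public MiniDictionary(int capacity) {
        for (int i = 0; i < capacity; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    /** Returns the id of the key, inserting it if absent (ids follow insertion order). */
    public int add(String key) {
        int bucket = (key.hashCode() & 0x7FFFFFFF) % buckets.size();
        for (int id : buckets.get(bucket)) {     // walk the chain looking for a match
            if (get(id).equals(key)) {
                return id;
            }
        }
        byte[] utf8 = key.getBytes(StandardCharsets.UTF_8);
        int id = offsets.size();
        offsets.add(bytes.size());               // the new key starts where the byte store currently ends
        bytes.write(utf8, 0, utf8.length);
        buckets.get(bucket).add(id);
        return id;
    }

    /** Reads a key back out of the flat byte store using the offsets table. */
    public String get(int id) {
        byte[] all = bytes.toByteArray();
        int start = offsets.get(id);
        int end = (id + 1 < offsets.size()) ? offsets.get(id + 1) : all.length;
        return new String(all, start, end - start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        MiniDictionary d = new MiniDictionary(4);
        System.out.println(d.add("alice")); // first insert gets id 0
        System.out.println(d.add("bob"));   // second insert gets id 1
        System.out.println(d.get(1));       // reads "bob" back out
    }
}
```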

##########
File path: java/core/src/test/org/apache/orc/impl/TestStringHashTableDictionary.java
##########
@@ -0,0 +1,95 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import org.apache.hadoop.io.Text;
+import org.junit.Assert;
+import org.junit.Test;
+
+
+public class TestStringHashTableDictionary {
+
+  /**
+   * An extension of {@link StringHashTableDictionary} for testing purposes, overriding the hash function.
+   *
+   */
+  private static class SimpleHashDictionary extends StringHashTableDictionary {
+    public SimpleHashDictionary(int initialCapacity) {
+      super(initialCapacity);
+    }
+
+    /**
+     * Use each string's numeric prefix as the hash value.
+     * Every string used in this test suite contains its hash value as a prefix,
+     * so we know the visitation order of the traverse() method.
+     */
+    @Override
+    int getIndex(Text text) {
+      String s = text.toString();
+      int underscore = s.indexOf("_");
+      return Integer.parseInt(text.toString().substring(0, underscore));
+    }
+  }
+
+  @Test
+  public void test1()
+      throws Exception {
+    SimpleHashDictionary hashTableDictionary = new SimpleHashDictionary(5);
+    // Non-resize trivial cases
+    Assert.assertEquals(0, hashTableDictionary.getSizeInBytes());
+    Assert.assertEquals(0, hashTableDictionary.add(new Text("2_Alice")));
+    Assert.assertEquals(1, hashTableDictionary.add(new Text("3_Bob")));
+    Assert.assertEquals(0, hashTableDictionary.add(new Text("2_Alice")));
+    Assert.assertEquals(1, hashTableDictionary.add(new Text("3_Bob")));
+    Assert.assertEquals(2, hashTableDictionary.add(new Text("1_Cindy")));
+
+    Text text = new Text();
+    hashTableDictionary.getText(text, 0);
+    Assert.assertEquals("2_Alice", text.toString());
+    hashTableDictionary.getText(text, 1);
+    Assert.assertEquals("3_Bob", text.toString());
+    hashTableDictionary.getText(text, 2);
+    Assert.assertEquals("1_Cindy", text.toString());
+
+    // entering the fourth and fifth element which triggers rehash
+    Assert.assertEquals(3, hashTableDictionary.add(new Text("0_David")));
+    hashTableDictionary.getText(text, 3);
+    Assert.assertEquals("0_David", text.toString());
+    Assert.assertEquals(4, hashTableDictionary.add(new Text("4_Eason")));
+    hashTableDictionary.getText(text, 4);
+    Assert.assertEquals("4_Eason", text.toString());
+
+    // Re-check that all previously inserted strings still map to the correct encoded values.
+    hashTableDictionary.getText(text, 0);
+    Assert.assertEquals("2_Alice", text.toString());
+    hashTableDictionary.getText(text, 1);
+    Assert.assertEquals("3_Bob", text.toString());
+    hashTableDictionary.getText(text, 2);
+    Assert.assertEquals("1_Cindy", text.toString());
+
+
+    // The order of words follows each string's numeric prefix, since that prefix determines its index in the hashArray.
+    TestStringRedBlackTree
+        .checkContents(hashTableDictionary, new int[]{3, 2, 0, 1, 4}, "0_David", "1_Cindy", "2_Alice", "3_Bob",
+            "4_Eason");
+

Review comment:
       Should we also ensure that the sizes of both the hash-table and red-black-tree implementations are the same?
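
The checkContents expectation in the test above relies on traverse() visiting entries bucket by bucket. A small standalone sketch of that visitation order (names assumed; the numeric-prefix hashing mirrors the test's overridden getIndex):

```java
import java.util.ArrayList;
import java.util.List;

public class TraversalOrderSketch {
    /** Buckets strings by their numeric prefix, then flattens bucket by bucket. */
    static List<String> visitOrder(String[] inserts, int capacity) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < capacity; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String s : inserts) {
            int idx = Integer.parseInt(s.substring(0, s.indexOf('_')));
            buckets.get(idx).add(s);   // chaining: append within the bucket
        }
        List<String> order = new ArrayList<>();
        for (List<String> bucket : buckets) {
            order.addAll(bucket);      // traverse() walks buckets in index order
        }
        return order;
    }

    public static void main(String[] args) {
        String[] inserts = {"2_Alice", "3_Bob", "1_Cindy", "0_David", "4_Eason"};
        // Prints prefix order, not insertion order:
        System.out.println(visitOrder(inserts, 5));
        // -> [0_David, 1_Cindy, 2_Alice, 3_Bob, 4_Eason]
    }
}
```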

##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * A dictionary backed by a hash table. The strings are stored as UTF-8 bytes
+ * with an offset recorded for each entry, and collisions are resolved by chaining.
+ *
+ * This implementation is not thread-safe. It also assumes the hash table never
+ * shrinks, which should not happen in the use cases for this class.
+ */
+public class StringHashTableDictionary implements Dictionary {
+
+  private final DynamicByteArray byteArray = new DynamicByteArray();
+  // The starting offset in byteArray for the i-th key; together with
+  // byteArray this stores all of the dictionary's keys.
+  private final DynamicIntArray keyOffsets;
+
+  private final Text newKey = new Text();
+
+  private DynamicIntArray[] hashArray;
+
+  private int capacity;
+
+  private int threshold;
+
+  private float loadFactor;
+
+  private static final float DEFAULT_LOAD_FACTOR = 0.75f;
+
+  private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
+
+  public StringHashTableDictionary(int initialCapacity) {
+    this(initialCapacity, DEFAULT_LOAD_FACTOR);
+  }
+
+  public StringHashTableDictionary(int initialCapacity, float loadFactor) {
+    this.capacity = initialCapacity;
+    this.loadFactor = loadFactor;
+    this.keyOffsets = new DynamicIntArray(initialCapacity);
+    this.hashArray = initHashArray(initialCapacity);
+    this.threshold = (int)Math.min(initialCapacity * loadFactor, MAX_ARRAY_SIZE + 1);
+  }
+
+  private DynamicIntArray[] initHashArray(int capacity) {
+    DynamicIntArray[] bucket = new DynamicIntArray[capacity];
+    for (int i = 0; i < capacity; i++) {
+      bucket[i] = new DynamicIntArray();
+    }
+    return bucket;
+  }
+
+  @Override
+  public void visit(Visitor visitor)
+      throws IOException {
+    traverse(visitor, new DictionaryUtils.VisitorContextImpl(this.byteArray, this.keyOffsets));
+  }
+
+  private void traverse(Visitor visitor, DictionaryUtils.VisitorContextImpl context) throws IOException {
+    for (DynamicIntArray intArray : hashArray) {
+      for (int i = 0; i < intArray.size(); i++) {
+        context.setPosition(intArray.get(i));
+        visitor.visit(context);
+      }
+    }
+  }
+
+  @Override
+  public void clear() {
+    byteArray.clear();
+    keyOffsets.clear();
+    Arrays.fill(hashArray, null);
+  }
+
+  @Override
+  public void getText(Text result, int position) {
+    DictionaryUtils.getTextInternal(result, position, this.keyOffsets, this.byteArray);
+  }
+
+  @Override
+  public int add(byte[] bytes, int offset, int length) {
+    resizeIfNeeded();
+    newKey.set(bytes, offset, length);
+    return add(newKey);
+  }
+
+  public int add(Text text) {
+    resizeIfNeeded();
+
+    int index = getIndex(text);
+    DynamicIntArray candidateArray = hashArray[index];
+
+    newKey.set(text);
+
+    Text tmpText = new Text();
+    for (int i = 0; i < candidateArray.size(); i++) {
+      getText(tmpText, candidateArray.get(i));
+      if (tmpText.equals(newKey)) {
+        return candidateArray.get(i);
+      }
+    }
+
+    // No match found in the bucket; append as a new entry.
+    int len = newKey.getLength();
+    int currIdx = keyOffsets.size();
+    keyOffsets.add(byteArray.add(newKey.getBytes(), 0, len));
+    candidateArray.add(currIdx);
+    return currIdx;
+  }
+
+  private void resizeIfNeeded() {
+    if (keyOffsets.size() >= threshold) {
+      int oldCapacity = keyOffsets.size();
+      int newCapacity = (oldCapacity << 1) + 1;
+      doResize(newCapacity);
+      this.threshold = (int)Math.min(newCapacity * loadFactor, MAX_ARRAY_SIZE + 1);
+    }
+  }
+
+  @Override
+  public int size() {
+    return keyOffsets.size();
+  }
+
+  /**
+   * Compute the hash value and map it to a bucket index.
+   * The sign bit is masked off so the modulo result is always non-negative.
+   */
+  int getIndex(Text text) {
+    return (text.hashCode() & 0x7FFFFFFF) % capacity;

Review comment:
       Let's add a comment explaining the masking.
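
For context on the mask: Java's % operator keeps the sign of the dividend, so a negative hashCode would yield a negative (invalid) bucket index. A quick standalone illustration (hypothetical names):

```java
public class MaskingDemo {
    // Clear the sign bit before the modulo so the bucket index is always in [0, capacity).
    static int bucket(int hashCode, int capacity) {
        return (hashCode & 0x7FFFFFFF) % capacity;
    }

    public static void main(String[] args) {
        int negativeHash = Integer.MIN_VALUE + 7;      // 0x80000007, definitely negative
        System.out.println(negativeHash % 16);         // -9: % follows the dividend's sign
        System.out.println(bucket(negativeHash, 16));  // 7: the mask keeps only the low 31 bits
    }
}
```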

##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * A dictionary backed by a hash table. The strings are stored as UTF-8 bytes
+ * with an offset recorded for each entry, and collisions are resolved by chaining.
+ *
+ * This implementation is not thread-safe. It also assumes the hash table never
+ * shrinks, which should not happen in the use cases for this class.
+ */
+public class StringHashTableDictionary implements Dictionary {
+
+  private final DynamicByteArray byteArray = new DynamicByteArray();
+  // The starting offset in byteArray for the i-th key; together with
+  // byteArray this stores all of the dictionary's keys.
+  private final DynamicIntArray keyOffsets;
+
+  private final Text newKey = new Text();
+
+  private DynamicIntArray[] hashArray;
+
+  private int capacity;
+
+  private int threshold;
+
+  private float loadFactor;
+
+  private static final float DEFAULT_LOAD_FACTOR = 0.75f;
+
+  private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
+
+  public StringHashTableDictionary(int initialCapacity) {
+    this(initialCapacity, DEFAULT_LOAD_FACTOR);
+  }
+
+  public StringHashTableDictionary(int initialCapacity, float loadFactor) {
+    this.capacity = initialCapacity;
+    this.loadFactor = loadFactor;
+    this.keyOffsets = new DynamicIntArray(initialCapacity);
+    this.hashArray = initHashArray(initialCapacity);
+    this.threshold = (int)Math.min(initialCapacity * loadFactor, MAX_ARRAY_SIZE + 1);
+  }
+
+  private DynamicIntArray[] initHashArray(int capacity) {
+    DynamicIntArray[] bucket = new DynamicIntArray[capacity];
+    for (int i = 0; i < capacity; i++) {
+      bucket[i] = new DynamicIntArray();
+    }
+    return bucket;
+  }
+
+  @Override
+  public void visit(Visitor visitor)
+      throws IOException {
+    traverse(visitor, new DictionaryUtils.VisitorContextImpl(this.byteArray, this.keyOffsets));
+  }
+
+  private void traverse(Visitor visitor, DictionaryUtils.VisitorContextImpl context) throws IOException {
+    for (DynamicIntArray intArray : hashArray) {
+      for (int i = 0; i < intArray.size(); i++) {
+        context.setPosition(intArray.get(i));
+        visitor.visit(context);
+      }
+    }
+  }
+
+  @Override
+  public void clear() {
+    byteArray.clear();
+    keyOffsets.clear();
+    Arrays.fill(hashArray, null);
+  }
+
+  @Override
+  public void getText(Text result, int position) {
+    DictionaryUtils.getTextInternal(result, position, this.keyOffsets, this.byteArray);
+  }
+
+  @Override
+  public int add(byte[] bytes, int offset, int length) {
+    resizeIfNeeded();
+    newKey.set(bytes, offset, length);
+    return add(newKey);
+  }
+
+  public int add(Text text) {
+    resizeIfNeeded();
+
+    int index = getIndex(text);
+    DynamicIntArray candidateArray = hashArray[index];
+
+    newKey.set(text);

Review comment:
       why are we moving text to front here?

##########
File path: java/core/src/test/org/apache/orc/impl/TestStringHashTableDictionary.java
##########
@@ -0,0 +1,95 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import org.apache.hadoop.io.Text;
+import org.junit.Assert;
+import org.junit.Test;
+
+
+public class TestStringHashTableDictionary {
+
+  /**
+   * An extension of {@link StringHashTableDictionary} for testing purposes, overriding the hash function.
+   *
+   */
+  private static class SimpleHashDictionary extends StringHashTableDictionary {
+    public SimpleHashDictionary(int initialCapacity) {
+      super(initialCapacity);
+    }
+
+    /**
+     * Use each string's numeric prefix as the hash value.
+     * Every string used in this test suite contains its hash value as a prefix,
+     * so we know the visitation order of the traverse() method.
+     */
+    @Override
+    int getIndex(Text text) {
+      String s = text.toString();
+      int underscore = s.indexOf("_");
+      return Integer.parseInt(text.toString().substring(0, underscore));
+    }
+  }
+
+  @Test
+  public void test1()
+      throws Exception {
+    SimpleHashDictionary hashTableDictionary = new SimpleHashDictionary(5);
+    // Non-resize trivial cases
+    Assert.assertEquals(0, hashTableDictionary.getSizeInBytes());
+    Assert.assertEquals(0, hashTableDictionary.add(new Text("2_Alice")));
+    Assert.assertEquals(1, hashTableDictionary.add(new Text("3_Bob")));
+    Assert.assertEquals(0, hashTableDictionary.add(new Text("2_Alice")));
+    Assert.assertEquals(1, hashTableDictionary.add(new Text("3_Bob")));
+    Assert.assertEquals(2, hashTableDictionary.add(new Text("1_Cindy")));
+

Review comment:
       check bytes after additions?

##########
File path: java/core/src/test/org/apache/orc/impl/TestStringHashTableDictionary.java
##########
@@ -0,0 +1,95 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import org.apache.hadoop.io.Text;
+import org.junit.Assert;
+import org.junit.Test;
+
+
+public class TestStringHashTableDictionary {
+
+  /**
+   * An extension of {@link StringHashTableDictionary} for testing purposes, overriding the hash function.
+   *
+   */
+  private static class SimpleHashDictionary extends StringHashTableDictionary {
+    public SimpleHashDictionary(int initialCapacity) {
+      super(initialCapacity);
+    }
+
+    /**
+     * Use each string's numeric prefix as the hash value.
+     * Every string used in this test suite contains its hash value as a prefix,
+     * so we know the visitation order of the traverse() method.
+     */
+    @Override
+    int getIndex(Text text) {
+      String s = text.toString();
+      int underscore = s.indexOf("_");
+      return Integer.parseInt(text.toString().substring(0, underscore));
+    }
+  }
+
+  @Test
+  public void test1()
+      throws Exception {
+    SimpleHashDictionary hashTableDictionary = new SimpleHashDictionary(5);
+    // Non-resize trivial cases
+    Assert.assertEquals(0, hashTableDictionary.getSizeInBytes());
+    Assert.assertEquals(0, hashTableDictionary.add(new Text("2_Alice")));
+    Assert.assertEquals(1, hashTableDictionary.add(new Text("3_Bob")));
+    Assert.assertEquals(0, hashTableDictionary.add(new Text("2_Alice")));
+    Assert.assertEquals(1, hashTableDictionary.add(new Text("3_Bob")));
+    Assert.assertEquals(2, hashTableDictionary.add(new Text("1_Cindy")));
+
+    Text text = new Text();
+    hashTableDictionary.getText(text, 0);
+    Assert.assertEquals("2_Alice", text.toString());
+    hashTableDictionary.getText(text, 1);
+    Assert.assertEquals("3_Bob", text.toString());
+    hashTableDictionary.getText(text, 2);
+    Assert.assertEquals("1_Cindy", text.toString());
+
+    // entering the fourth and fifth element which triggers rehash
+    Assert.assertEquals(3, hashTableDictionary.add(new Text("0_David")));
+    hashTableDictionary.getText(text, 3);
+    Assert.assertEquals("0_David", text.toString());
+    Assert.assertEquals(4, hashTableDictionary.add(new Text("4_Eason")));
+    hashTableDictionary.getText(text, 4);
+    Assert.assertEquals("4_Eason", text.toString());
+
+    // Re-check that all previously inserted strings still map to the correct encoded values.
+    hashTableDictionary.getText(text, 0);
+    Assert.assertEquals("2_Alice", text.toString());
+    hashTableDictionary.getText(text, 1);
+    Assert.assertEquals("3_Bob", text.toString());
+    hashTableDictionary.getText(text, 2);
+    Assert.assertEquals("1_Cindy", text.toString());
+
+
+    // The order of words follows each string's numeric prefix, since that prefix determines its index in the hashArray.
+    TestStringRedBlackTree

Review comment:
       This is a bit confusing -- I would just copy the checkContents method here, or create a utility class if we want to reuse it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] pgaref commented on pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
pgaref commented on pull request #651:
URL: https://github.com/apache/orc/pull/651#issuecomment-826976373


   > Updated the PR. @omalley @pgaref please take another look, thanks.
   
   Hey @autumnust thanks for the changes, latest PR looks pretty good -- JMH extension also helps a lot!
   Left some mostly minor comments, let me know what you think! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun edited a comment on pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #651:
URL: https://github.com/apache/orc/pull/651#issuecomment-854805204


   Hi, @pgaref . Is this merged? Could you check GitHub Action status?
   - https://github.com/apache/orc/actions/workflows/build_and_test.yml?query=branch%3Amain
   
   <img width="490" alt="Screen Shot 2021-06-04 at 8 44 43 AM" src="https://user-images.githubusercontent.com/9700541/120828249-27eea800-c511-11eb-9bd8-eecde81a61a8.png">
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] pgaref commented on pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
pgaref commented on pull request #651:
URL: https://github.com/apache/orc/pull/651#issuecomment-828353933


   > Here is the new jmh result:
   > 
   > ```
   > Benchmark                                    (dictImpl)  (upperBound)  Mode  Cnt       Score      Error  Units
   > ORCWriterBenchMark.dictBench                     RBTREE         10000  avgt    5   28939.068 ± 3080.947  us/op
   > ORCWriterBenchMark.dictBench:bytesPerRecord      RBTREE         10000  avgt    5      49.963                 #
   > ORCWriterBenchMark.dictBench:ops                 RBTREE         10000  avgt    5         ≈ 0                 #
   > ORCWriterBenchMark.dictBench:perRecord           RBTREE         10000  avgt    5       0.883 ±    0.094  us/op
   > ORCWriterBenchMark.dictBench:records             RBTREE         10000  avgt    5  163840.000                 #
   > ORCWriterBenchMark.dictBench                     RBTREE          2500  avgt    5   21998.781 ± 1448.300  us/op
   > ORCWriterBenchMark.dictBench:bytesPerRecord      RBTREE          2500  avgt    5      23.532                 #
   > ORCWriterBenchMark.dictBench:ops                 RBTREE          2500  avgt    5         ≈ 0                 #
   > ORCWriterBenchMark.dictBench:perRecord           RBTREE          2500  avgt    5       0.671 ±    0.044  us/op
   > ORCWriterBenchMark.dictBench:records             RBTREE          2500  avgt    5  163840.000                 #
   > ORCWriterBenchMark.dictBench                     RBTREE           500  avgt    5   17730.281 ± 4574.132  us/op
   > ORCWriterBenchMark.dictBench:bytesPerRecord      RBTREE           500  avgt    5      13.156                 #
   > ORCWriterBenchMark.dictBench:ops                 RBTREE           500  avgt    5         ≈ 0                 #
   > ORCWriterBenchMark.dictBench:perRecord           RBTREE           500  avgt    5       0.541 ±    0.140  us/op
   > ORCWriterBenchMark.dictBench:records             RBTREE           500  avgt    5  163840.000                 #
   > ORCWriterBenchMark.dictBench                       HASH         10000  avgt    5   21269.613 ± 4137.763  us/op
   > ORCWriterBenchMark.dictBench:bytesPerRecord        HASH         10000  avgt    5      42.268                 #
   > ORCWriterBenchMark.dictBench:ops                   HASH         10000  avgt    5         ≈ 0                 #
   > ORCWriterBenchMark.dictBench:perRecord             HASH         10000  avgt    5       0.649 ±    0.126  us/op
   > ORCWriterBenchMark.dictBench:records               HASH         10000  avgt    5  163840.000                 #
   > ORCWriterBenchMark.dictBench                       HASH          2500  avgt    5   11586.898 ± 4075.783  us/op
   > ORCWriterBenchMark.dictBench:bytesPerRecord        HASH          2500  avgt    5      17.692                 #
   > ORCWriterBenchMark.dictBench:ops                   HASH          2500  avgt    5         ≈ 0                 #
   > ORCWriterBenchMark.dictBench:perRecord             HASH          2500  avgt    5       0.354 ±    0.124  us/op
   > ORCWriterBenchMark.dictBench:records               HASH          2500  avgt    5  163840.000                 #
   > ORCWriterBenchMark.dictBench                       HASH           500  avgt    5    9646.080 ± 2279.530  us/op
   > ORCWriterBenchMark.dictBench:bytesPerRecord        HASH           500  avgt    5      11.613                 #
   > ORCWriterBenchMark.dictBench:ops                   HASH           500  avgt    5         ≈ 0                 #
   > ORCWriterBenchMark.dictBench:perRecord             HASH           500  avgt    5       0.294 ±    0.070  us/op
   > ORCWriterBenchMark.dictBench:records               HASH           500  avgt    5  163840.000                 #
   > ORCWriterBenchMark.dictBench                       NONE         10000  avgt    5    4077.675 ±  117.606  us/op
   > ORCWriterBenchMark.dictBench:bytesPerRecord        NONE         10000  avgt    5      50.146                 #
   > ORCWriterBenchMark.dictBench:ops                   NONE         10000  avgt    5         ≈ 0                 #
   > ORCWriterBenchMark.dictBench:perRecord             NONE         10000  avgt    5       0.124 ±    0.004  us/op
   > ORCWriterBenchMark.dictBench:records               NONE         10000  avgt    5  163840.000                 #
   > ORCWriterBenchMark.dictBench                       NONE          2500  avgt    5    4607.634 ± 1163.084  us/op
   > ORCWriterBenchMark.dictBench:bytesPerRecord        NONE          2500  avgt    5      50.146                 #
   > ORCWriterBenchMark.dictBench:ops                   NONE          2500  avgt    5         ≈ 0                 #
   > ORCWriterBenchMark.dictBench:perRecord             NONE          2500  avgt    5       0.141 ±    0.035  us/op
   > ORCWriterBenchMark.dictBench:records               NONE          2500  avgt    5  163840.000                 #
   > ORCWriterBenchMark.dictBench                       NONE           500  avgt    5    3783.059 ±  367.511  us/op
   > ORCWriterBenchMark.dictBench:bytesPerRecord        NONE           500  avgt    5      50.146                 #
   > ORCWriterBenchMark.dictBench:ops                   NONE           500  avgt    5         ≈ 0                 #
   > ORCWriterBenchMark.dictBench:perRecord             NONE           500  avgt    5       0.115 ±    0.011  us/op
   > ORCWriterBenchMark.dictBench:records               NONE           500  avgt    5  163840.000                 #
   > ```
   > 
   > Unfortunately the previous implementation had a bug which ended up with great locality (but incorrect results). HASH is still much better than RB-Tree, but obviously we need to iterate further to improve it.
   
   Sounds like this can be improved by reducing collisions, right? Since we have an upper bound on the number of entries per batch, would it make sense to experiment a bit more with HT sizes to reduce collisions as much as possible?
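
   The sizing idea suggested here can be sketched roughly as follows (a minimal illustration, not code from the PR; the helper name and the power-of-two rounding are assumptions). Given a known upper bound on distinct entries, choosing the initial capacity as the next power of two at or above `upperBound / loadFactor` keeps the table below its load factor and the expected chain length near one:

   ```java
   // Sketch: derive an initial hash-table capacity from a known upper bound
   // on distinct entries, so the table never crosses its load factor and
   // chains stay short. Names here are illustrative only.
   public final class CapacitySizing {
     static final float DEFAULT_LOAD_FACTOR = 0.75f;

     /** Smallest power of two >= upperBound / loadFactor. */
     static int initialCapacity(int upperBound, float loadFactor) {
       int needed = (int) Math.ceil(upperBound / loadFactor);
       int cap = 1;
       while (cap < needed) {
         cap <<= 1;
       }
       return cap;
     }

     public static void main(String[] args) {
       // With 10000 distinct keys and a 0.75 load factor, at least
       // ceil(10000 / 0.75) = 13334 slots are needed.
       System.out.println(initialCapacity(10000, DEFAULT_LOAD_FACTOR)); // 16384
       System.out.println(initialCapacity(500, DEFAULT_LOAD_FACTOR));   // 1024
     }
   }
   ```

   Sized this way, the benchmark's `upperBound` parameter (500, 2500, 10000) would translate directly into a per-batch table size, avoiding both resizes and long chains.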





[GitHub] [orc] autumnust commented on a change in pull request #651: ORC-757: HashTable dictionary

Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #651:
URL: https://github.com/apache/orc/pull/651#discussion_r621799552



##########
File path: java/core/src/java/org/apache/orc/impl/StringHashTableDictionary.java
##########
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.orc.impl;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * A dictionary implementation backed by a hash table. The strings are stored as UTF-8 bytes
+ * along with an offset for each entry, and collisions are resolved by chaining.
+ *
+ * This implementation is not thread-safe. It also assumes the hash table never shrinks,
+ * since that should not happen in the use cases for this class.
+ */
+public class StringHashTableDictionary implements Dictionary {
+
+  // Contains the UTF-8 bytes of every key ever seen.
+  private final DynamicByteArray byteArray = new DynamicByteArray();
+  // Contains the starting offset (in bytes) of each key in the byte array.
+  private final DynamicIntArray keyOffsets;
+
+  private final Text newKey = new Text();
+
+  private DynamicIntArray[] hashBuckets;
+
+  private int capacity;
+
+  private int threshold;
+
+  private float loadFactor;
+
+  private static final float DEFAULT_LOAD_FACTOR = 0.75f;
+
+  private static final int BUCKET_SIZE = 20;

Review comment:
       It is just randomly picked ... 
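
   For context, the chaining scheme the class javadoc describes can be sketched as below (a simplified standalone illustration with hypothetical names; it uses a plain `List<String>` in place of the PR's `DynamicByteArray`/`DynamicIntArray` pair, and omits resizing):

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Simplified sketch of a chained hash dictionary over strings: each bucket
   // holds indices into a shared key list, mirroring how the PR's buckets hold
   // offsets into a shared byte array. Not the actual ORC implementation.
   public final class ChainedDictSketch {
     private final List<String> keys = new ArrayList<>();
     private final List<Integer>[] buckets;

     @SuppressWarnings("unchecked")
     public ChainedDictSketch(int capacity) {
       buckets = new List[capacity];
       for (int i = 0; i < capacity; i++) {
         buckets[i] = new ArrayList<>();
       }
     }

     /** Returns the dictionary id for key, adding it if unseen. */
     public int add(String key) {
       int bucket = (key.hashCode() & 0x7fffffff) % buckets.length;
       for (int id : buckets[bucket]) {          // walk the collision chain
         if (keys.get(id).equals(key)) {
           return id;                            // key already in dictionary
         }
       }
       int id = keys.size();                     // next dictionary id
       keys.add(key);
       buckets[bucket].add(id);
       return id;
     }

     public int size() {
       return keys.size();
     }
   }
   ```

   In this sketch `add` returns a stable id per distinct string, so repeated adds of the same key return the same id; the fixed bucket count plays the role the `BUCKET_SIZE`/capacity choice discussed above plays in the PR.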



