Posted to commits@orc.apache.org by do...@apache.org on 2021/09/07 20:12:05 UTC
[orc] branch branch-1.7 updated: ORC-985. Change default for string
dictionaries back to red-black trees. (#902)
This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-1.7
in repository https://gitbox.apache.org/repos/asf/orc.git
The following commit(s) were added to refs/heads/branch-1.7 by this push:
new 8b9af43 ORC-985. Change default for string dictionaries back to red-black trees. (#902)
8b9af43 is described below
commit 8b9af4350c8b016629b0c7800a42c031e766bee0
Author: Owen O'Malley <oo...@linkedin.com>
AuthorDate: Tue Sep 7 20:10:22 2021 +0000
ORC-985. Change default for string dictionaries back to red-black trees. (#902)
### What changes were proposed in this pull request?
This PR aims to change the default for string dictionaries (`orc.dictionary.implementation`) back to `rbtree`.
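The following Java sketch shows what the new default means for a caller. After this change, an unset `orc.dictionary.implementation` resolves to `rbtree`, so anyone who wants the ORC-757 hash table must now set `hash` explicitly. The class and its `resolve` helper are hypothetical and only mimic the lookup with `java.util.Properties`. They are not ORC's API. Inside ORC the equivalent opt-in is `OrcConf.DICTIONARY_IMPL.setString(conf, "hash")`, which is the call the updated test below makes.

```java
import java.util.Properties;

// Hypothetical sketch: mimics resolving "orc.dictionary.implementation"
// with the default restored by ORC-985. Not the ORC API; ORC itself
// reads this key through OrcConf.DICTIONARY_IMPL.
public class DictionaryImplDefault {
  static final String KEY = "orc.dictionary.implementation";
  // Default after ORC-985 (it was "hash" before this change).
  static final String DEFAULT = "rbtree";

  // Returns the configured implementation, falling back to the default.
  static String resolve(Properties props) {
    return props.getProperty(KEY, DEFAULT);
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    System.out.println(resolve(props)); // rbtree: nothing set, new default applies
    props.setProperty(KEY, "hash");     // explicit opt-in to the hash table
    System.out.println(resolve(props)); // hash
  }
}
```

Making the hash table opt-in rather than removing it keeps it available to writers who prefer it, while the default no longer produces the larger files described below.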
### Why are the changes needed?
To prevent a file-size regression.
- The new hash table implementation from ORC-757 produces significantly larger ORC files than ORC 1.6 does.
### How was this patch tested?
Pass the CIs
(cherry picked from commit 4a03df9fdd287317673782753cf4ff8e12f7e35f)
Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
java/core/src/java/org/apache/orc/OrcConf.java | 2 +-
java/core/src/test/org/apache/orc/TestVectorOrcFile.java | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/java/core/src/java/org/apache/orc/OrcConf.java b/java/core/src/java/org/apache/orc/OrcConf.java
index 1f40ce9..a20753d 100644
--- a/java/core/src/java/org/apache/orc/OrcConf.java
+++ b/java/core/src/java/org/apache/orc/OrcConf.java
@@ -107,7 +107,7 @@ public enum OrcConf {
"writing first stripe. In both cases, the decision to use\n" +
"dictionary or not will be retained thereafter."),
DICTIONARY_IMPL("orc.dictionary.implementation", "orc.dictionary.implementation",
- "hash",
+ "rbtree",
"the implementation for the dictionary used for string-type column encoding.\n" +
"The choices are:\n"
+ " rbtree - use red-black tree as the implementation for the dictionary.\n"
diff --git a/java/core/src/test/org/apache/orc/TestVectorOrcFile.java b/java/core/src/test/org/apache/orc/TestVectorOrcFile.java
index cf463df..769d1d2 100644
--- a/java/core/src/test/org/apache/orc/TestVectorOrcFile.java
+++ b/java/core/src/test/org/apache/orc/TestVectorOrcFile.java
@@ -713,6 +713,7 @@ public class TestVectorOrcFile {
public void testStripeLevelStatsNoForce(Version fileFormat) throws Exception {
TypeDescription schema =
TypeDescription.fromString("struct<int1:int,string1:string>");
+ OrcConf.DICTIONARY_IMPL.setString(conf, "hash");
Writer writer = OrcFile.createWriter(testFilePath,
OrcFile.writerOptions(conf)
.setSchema(schema)