Posted to commits@orc.apache.org by do...@apache.org on 2021/09/07 20:10:29 UTC

[orc] branch main updated: ORC-985. Change default for string dictionaries back to red-black trees. (#902)

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/orc.git


The following commit(s) were added to refs/heads/main by this push:
     new 4a03df9  ORC-985. Change default for string dictionaries back to red-black trees. (#902)
4a03df9 is described below

commit 4a03df9fdd287317673782753cf4ff8e12f7e35f
Author: Owen O'Malley <oo...@linkedin.com>
AuthorDate: Tue Sep 7 20:10:22 2021 +0000

    ORC-985. Change default for string dictionaries back to red-black trees. (#902)
    
    ### What changes were proposed in this pull request?
    
    This PR aims to change the default for string dictionaries back to rbtree.
    
    ### Why are the changes needed?
    
    To prevent a file-size regression.
    - The new hash table implementation in ORC-757 produces significantly larger ORC files than ORC 1.6.
    
    ### How was this patch tested?
    
    Passes the CIs.
---
 java/core/src/java/org/apache/orc/OrcConf.java           | 2 +-
 java/core/src/test/org/apache/orc/TestVectorOrcFile.java | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/java/core/src/java/org/apache/orc/OrcConf.java b/java/core/src/java/org/apache/orc/OrcConf.java
index 1f40ce9..a20753d 100644
--- a/java/core/src/java/org/apache/orc/OrcConf.java
+++ b/java/core/src/java/org/apache/orc/OrcConf.java
@@ -107,7 +107,7 @@ public enum OrcConf {
           "writing first stripe. In both cases, the decision to use\n" +
           "dictionary or not will be retained thereafter."),
   DICTIONARY_IMPL("orc.dictionary.implementation", "orc.dictionary.implementation",
-      "hash",
+      "rbtree",
       "the implementation for the dictionary used for string-type column encoding.\n" +
           "The choices are:\n"
           + " rbtree - use red-black tree as the implementation for the dictionary.\n"
diff --git a/java/core/src/test/org/apache/orc/TestVectorOrcFile.java b/java/core/src/test/org/apache/orc/TestVectorOrcFile.java
index cf463df..769d1d2 100644
--- a/java/core/src/test/org/apache/orc/TestVectorOrcFile.java
+++ b/java/core/src/test/org/apache/orc/TestVectorOrcFile.java
@@ -713,6 +713,7 @@ public class TestVectorOrcFile {
   public void testStripeLevelStatsNoForce(Version fileFormat) throws Exception {
     TypeDescription schema =
         TypeDescription.fromString("struct<int1:int,string1:string>");
+    OrcConf.DICTIONARY_IMPL.setString(conf, "hash");
     Writer writer = OrcFile.createWriter(testFilePath,
         OrcFile.writerOptions(conf)
             .setSchema(schema)