Posted to commits@orc.apache.org by do...@apache.org on 2021/09/07 20:12:05 UTC

[orc] branch branch-1.7 updated: ORC-985. Change default for string dictionaries back to red-black trees. (#902)

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-1.7
in repository https://gitbox.apache.org/repos/asf/orc.git


The following commit(s) were added to refs/heads/branch-1.7 by this push:
     new 8b9af43  ORC-985. Change default for string dictionaries back to red-black trees. (#902)
8b9af43 is described below

commit 8b9af4350c8b016629b0c7800a42c031e766bee0
Author: Owen O'Malley <oo...@linkedin.com>
AuthorDate: Tue Sep 7 20:10:22 2021 +0000

    ORC-985. Change default for string dictionaries back to red-black trees. (#902)
    
    ### What changes were proposed in this pull request?
    
    This PR aims to change the default for string dictionaries back to rbtree.
    
    ### Why are the changes needed?
    
    To prevent a regression.
    - The new hash table implementation in ORC-757 results in significantly larger ORC files compared to ORC 1.6.
    
    ### How was this patch tested?
    
    Passed the CIs.
    
    (cherry picked from commit 4a03df9fdd287317673782753cf4ff8e12f7e35f)
    Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
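Note: this change only flips the default value of `orc.dictionary.implementation`; an explicitly configured value should still take precedence. Below is a minimal sketch of that lookup behavior using only `java.util.Properties`. The class `DictionaryImplDefault` and its `resolve` method are hypothetical stand-ins for illustration, not ORC's actual code; only the key name and the values `rbtree` and `hash` come from this commit. In real ORC code, the test touched by this commit opts back in with `OrcConf.DICTIONARY_IMPL.setString(conf, "hash")`.

```java
import java.util.Properties;

// Hypothetical stand-in for illustration only; not part of ORC.
final class DictionaryImplDefault {
  // Configuration key taken from the OrcConf entry in this commit.
  static final String KEY = "orc.dictionary.implementation";
  // Default after this commit (it was "hash" before).
  static final String DEFAULT = "rbtree";

  // Returns the explicitly configured value, or the default if none is set.
  static String resolve(Properties props) {
    return props.getProperty(KEY, DEFAULT);
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    System.out.println(resolve(props));  // nothing set: the default applies

    props.setProperty(KEY, "hash");      // explicit opt-in to the hash table
    System.out.println(resolve(props));  // explicit value overrides the default
  }
}
```

Writers that depend on the hash table implementation would presumably set the key explicitly rather than rely on the default.
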
 java/core/src/java/org/apache/orc/OrcConf.java           | 2 +-
 java/core/src/test/org/apache/orc/TestVectorOrcFile.java | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/java/core/src/java/org/apache/orc/OrcConf.java b/java/core/src/java/org/apache/orc/OrcConf.java
index 1f40ce9..a20753d 100644
--- a/java/core/src/java/org/apache/orc/OrcConf.java
+++ b/java/core/src/java/org/apache/orc/OrcConf.java
@@ -107,7 +107,7 @@ public enum OrcConf {
           "writing first stripe. In both cases, the decision to use\n" +
           "dictionary or not will be retained thereafter."),
   DICTIONARY_IMPL("orc.dictionary.implementation", "orc.dictionary.implementation",
-      "hash",
+      "rbtree",
       "the implementation for the dictionary used for string-type column encoding.\n" +
           "The choices are:\n"
           + " rbtree - use red-black tree as the implementation for the dictionary.\n"
diff --git a/java/core/src/test/org/apache/orc/TestVectorOrcFile.java b/java/core/src/test/org/apache/orc/TestVectorOrcFile.java
index cf463df..769d1d2 100644
--- a/java/core/src/test/org/apache/orc/TestVectorOrcFile.java
+++ b/java/core/src/test/org/apache/orc/TestVectorOrcFile.java
@@ -713,6 +713,7 @@ public class TestVectorOrcFile {
   public void testStripeLevelStatsNoForce(Version fileFormat) throws Exception {
     TypeDescription schema =
         TypeDescription.fromString("struct<int1:int,string1:string>");
+    OrcConf.DICTIONARY_IMPL.setString(conf, "hash");
     Writer writer = OrcFile.createWriter(testFilePath,
         OrcFile.writerOptions(conf)
             .setSchema(schema)