Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2021/08/27 20:58:09 UTC

[GitHub] [orc] belugabehr opened a new pull request #885: ORC-974: Remove Text Key from RedBlackTree

belugabehr opened a new pull request #885:
URL: https://github.com/apache/orc/pull/885


   
   ### What changes were proposed in this pull request?
   Remove the dependency on Hadoop's Text class from RedBlackTree.  This has the added benefit of skipping the step where the incoming bytes are copied into the Text class.
   
   
   ### Why are the changes needed?
   Removing the Text key allows the Hadoop dependency to be dropped later, and skipping the copy improves performance.
   
   
   ### How was this patch tested?
   No functional change; covered by the existing unit tests.
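
The copy mentioned in the description can be illustrated with a minimal sketch. It is not the ORC code: `ByteRangeCompare`, `compare`, and `compareWithCopy` are hypothetical names, and `Arrays.copyOfRange` merely stands in for copying the incoming bytes into an intermediate key holder. The point is only that a key given as `(bytes, offset, length)` can be compared in place.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of ORC or Hadoop. */
final class ByteRangeCompare {

  /** Unsigned lexicographic comparison of two byte ranges, reading both in place. */
  static int compare(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
    int n = Math.min(aLen, bLen);
    for (int i = 0; i < n; i++) {
      int x = a[aOff + i] & 0xff;
      int y = b[bOff + i] & 0xff;
      if (x != y) {
        return x - y;
      }
    }
    // All shared bytes are equal, so the shorter range sorts first.
    return aLen - bLen;
  }

  /** Same result, but the incoming range is first copied into an intermediate holder. */
  static int compareWithCopy(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
    byte[] holder = Arrays.copyOfRange(a, aOff, aOff + aLen); // the extra copy
    return compare(holder, 0, holder.length, b, bOff, bLen);
  }
}
```

Both methods return the same ordering; the first simply avoids touching the key bytes twice.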
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] belugabehr commented on a change in pull request #885: ORC-974: Remove Text Key from StringRedBlackTree

Posted by GitBox <gi...@apache.org>.
belugabehr commented on a change in pull request #885:
URL: https://github.com/apache/orc/pull/885#discussion_r700241191



##########
File path: java/core/src/java/org/apache/orc/impl/StringRedBlackTree.java
##########
@@ -29,49 +30,42 @@
 public class StringRedBlackTree extends RedBlackTree implements Dictionary {
   private final DynamicByteArray byteArray = new DynamicByteArray();
   private final DynamicIntArray keyOffsets;
-  private final Text newKey = new Text();
 
   public StringRedBlackTree(int initialCapacity) {
     super(initialCapacity);
     keyOffsets = new DynamicIntArray(initialCapacity);
   }
 
+  @Deprecated
   public int add(String value) {
-    newKey.set(value);
-    return addNewKey();
-  }
-
-  private int addNewKey() {
-    // if the newKey is actually new, add it to our byteArray and store the offset & length
-    if (add()) {
-      int len = newKey.getLength();
-      keyOffsets.add(byteArray.add(newKey.getBytes(), 0, len));
-    }
-    return lastAdd;
+    byte[] b = value.getBytes(StandardCharsets.UTF_8);
+    return add(b, 0, b.length);
   }
 
+  @Deprecated
   public int add(Text value) {
-    newKey.set(value);
-    return addNewKey();
+    return add(value.getBytes(), 0, value.getLength());
   }
 
   @Override
   public int add(byte[] bytes, int offset, int length) {
-    newKey.set(bytes, offset, length);
-    return addNewKey();
+    // if the newKey is actually new, add it to our byteArray and store the offset & length
+    if (doAdd(bytes, offset, length)) {

Review comment:
       Yes.  There is already a `public` method with the same name/signature.

##########
File path: java/core/src/java/org/apache/orc/impl/RedBlackTree.java
##########
@@ -271,10 +271,14 @@ private boolean add(int node, boolean fromLeft, int parent,
 
   /**
    * Add the new key to the tree.
+   *
+   * @param bytes
+   * @param offset
+   * @param length
    * @return true if the element is a new one.
    */
-  protected boolean add() {
-    add(root, false, NULL, NULL, NULL);
+  protected boolean doAdd(byte[] bytes, int offset, int length) {

Review comment:
       Yes.  There is already a `public` method with the same name/signature.







[GitHub] [orc] dongjoon-hyun commented on a change in pull request #885: ORC-974: Remove Text Key from StringRedBlackTree

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #885:
URL: https://github.com/apache/orc/pull/885#discussion_r697784807



##########
File path: java/core/src/java/org/apache/orc/impl/RedBlackTree.java
##########
@@ -271,10 +271,14 @@ private boolean add(int node, boolean fromLeft, int parent,
 
   /**
    * Add the new key to the tree.
+   *
+   * @param bytes
+   * @param offset
+   * @param length
    * @return true if the element is a new one.
    */
-  protected boolean add() {
-    add(root, false, NULL, NULL, NULL);
+  protected boolean doAdd(byte[] bytes, int offset, int length) {

Review comment:
       Is this renamed to avoid conflicts?







[GitHub] [orc] dongjoon-hyun commented on a change in pull request #885: ORC-974: Remove Text Key from StringRedBlackTree

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #885:
URL: https://github.com/apache/orc/pull/885#discussion_r697950909



##########
File path: java/core/src/java/org/apache/orc/impl/StringRedBlackTree.java
##########
@@ -29,49 +30,42 @@
 public class StringRedBlackTree extends RedBlackTree implements Dictionary {
   private final DynamicByteArray byteArray = new DynamicByteArray();
   private final DynamicIntArray keyOffsets;
-  private final Text newKey = new Text();
 
   public StringRedBlackTree(int initialCapacity) {
     super(initialCapacity);
     keyOffsets = new DynamicIntArray(initialCapacity);
   }
 
+  @Deprecated
   public int add(String value) {
-    newKey.set(value);
-    return addNewKey();
-  }
-
-  private int addNewKey() {
-    // if the newKey is actually new, add it to our byteArray and store the offset & length
-    if (add()) {
-      int len = newKey.getLength();
-      keyOffsets.add(byteArray.add(newKey.getBytes(), 0, len));
-    }
-    return lastAdd;
+    byte[] b = value.getBytes(StandardCharsets.UTF_8);
+    return add(b, 0, b.length);
   }
 
+  @Deprecated
   public int add(Text value) {
-    newKey.set(value);
-    return addNewKey();
+    return add(value.getBytes(), 0, value.getLength());
   }
 
   @Override
   public int add(byte[] bytes, int offset, int length) {
-    newKey.set(bytes, offset, length);
-    return addNewKey();
+    // if the newKey is actually new, add it to our byteArray and store the offset & length
+    if (doAdd(bytes, offset, length)) {

Review comment:
       Ya, that was my question.







[GitHub] [orc] dongjoon-hyun commented on a change in pull request #885: ORC-974: Remove Text Key from StringRedBlackTree

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #885:
URL: https://github.com/apache/orc/pull/885#discussion_r703093957



##########
File path: java/core/src/java/org/apache/orc/impl/RedBlackTree.java
##########
@@ -147,8 +147,8 @@ private void setRight(int position, int right) {
    * @param greatGrandparent Grandparent's parent
    * @return Does parent also need to be checked and/or fixed?
    */
-  private boolean add(int node, boolean fromLeft, int parent,
-                      int grandparent, int greatGrandparent) {
+    private boolean add(int node, boolean fromLeft, int parent, int grandparent,

Review comment:
       Gentle ping, @belugabehr .







[GitHub] [orc] dongjoon-hyun commented on a change in pull request #885: ORC-974: Remove Text Key from RedBlackTree

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #885:
URL: https://github.com/apache/orc/pull/885#discussion_r697784181



##########
File path: java/core/src/java/org/apache/orc/impl/RedBlackTree.java
##########
@@ -147,8 +147,8 @@ private void setRight(int position, int right) {
    * @param greatGrandparent Grandparent's parent
    * @return Does parent also need to be checked and/or fixed?
    */
-  private boolean add(int node, boolean fromLeft, int parent,
-                      int grandparent, int greatGrandparent) {
+    private boolean add(int node, boolean fromLeft, int parent, int grandparent,

Review comment:
       indentation?







[GitHub] [orc] guiyanakuang commented on a change in pull request #885: ORC-974: Remove Text Key from StringRedBlackTree

Posted by GitBox <gi...@apache.org>.
guiyanakuang commented on a change in pull request #885:
URL: https://github.com/apache/orc/pull/885#discussion_r697793294



##########
File path: java/core/src/java/org/apache/orc/impl/StringRedBlackTree.java
##########
@@ -29,49 +30,42 @@
 public class StringRedBlackTree extends RedBlackTree implements Dictionary {
   private final DynamicByteArray byteArray = new DynamicByteArray();
   private final DynamicIntArray keyOffsets;
-  private final Text newKey = new Text();
 
   public StringRedBlackTree(int initialCapacity) {
     super(initialCapacity);
     keyOffsets = new DynamicIntArray(initialCapacity);
   }
 
+  @Deprecated
   public int add(String value) {
-    newKey.set(value);
-    return addNewKey();
-  }
-
-  private int addNewKey() {
-    // if the newKey is actually new, add it to our byteArray and store the offset & length
-    if (add()) {
-      int len = newKey.getLength();
-      keyOffsets.add(byteArray.add(newKey.getBytes(), 0, len));
-    }
-    return lastAdd;
+    byte[] b = value.getBytes(StandardCharsets.UTF_8);
+    return add(b, 0, b.length);
   }
 
+  @Deprecated
   public int add(Text value) {
-    newKey.set(value);
-    return addNewKey();
+    return add(value.getBytes(), 0, value.getLength());
   }
 
   @Override
   public int add(byte[] bytes, int offset, int length) {
-    newKey.set(bytes, offset, length);
-    return addNewKey();
+    // if the newKey is actually new, add it to our byteArray and store the offset & length
+    if (doAdd(bytes, offset, length)) {

Review comment:
       > Is this renamed to avoid conflicts?
   
   @dongjoon-hyun, I think it is renamed here to avoid recursive calls.
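
A minimal sketch of why the rename matters, under stated assumptions: `KeyStore` and `StringKeyStore` are hypothetical stand-ins for the base tree and its string subclass, and a plain list replaces the tree to keep the example short. In Java, a subclass method with the same name and parameter types as a superclass method overrides it, so an unqualified call from inside the subclass resolves to the subclass method itself; with a different return type (`boolean` versus `int`) the pair would not even compile. Giving the base method a distinct name avoids both problems.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical base class; a list replaces the red-black tree for brevity. */
abstract class KeyStore {
  protected final List<byte[]> keys = new ArrayList<>();
  protected int lastAdd;

  /** Returns true if the key is new. The distinct name leaves add(byte[], int, int) free for subclasses. */
  protected boolean doAdd(byte[] bytes, int offset, int length) {
    // This sketch copies the key for simplicity; it illustrates naming, not performance.
    byte[] key = Arrays.copyOfRange(bytes, offset, offset + length);
    for (int i = 0; i < keys.size(); i++) {
      if (Arrays.equals(keys.get(i), key)) {
        lastAdd = i;
        return false;
      }
    }
    keys.add(key);
    lastAdd = keys.size() - 1;
    return true;
  }
}

/** Hypothetical subclass exposing the public add methods. */
final class StringKeyStore extends KeyStore {
  int newKeys = 0;

  public int add(byte[] bytes, int offset, int length) {
    // Calls the distinctly named base method; a call to add(...) here would call this method again.
    if (doAdd(bytes, offset, length)) {
      newKeys++;
    }
    return lastAdd;
  }

  public int add(String value) {
    byte[] b = value.getBytes(StandardCharsets.UTF_8);
    return add(b, 0, b.length);
  }
}
```

Adding the same string twice returns the same index and counts as only one new key.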



