You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by GitBox <gi...@apache.org> on 2020/12/11 22:12:28 UTC

[GitHub] [hbase] saintstack commented on a change in pull request #2747: HBASE-24850 CellComparator perf improvement

saintstack commented on a change in pull request #2747:
URL: https://github.com/apache/hbase/pull/2747#discussion_r541302819



##########
File path: hbase-common/src/main/java/org/apache/hadoop/hbase/ByteBufferKeyOnlyKeyValue.java
##########
@@ -161,11 +161,11 @@ private int getTimestampOffset() {
 
   @Override
   public byte getTypeByte() {
-    return ByteBufferUtils.toByte(this.buf, this.offset + this.length - 1);
+    return getTypeByte(this.length);

Review comment:
       this.length is updated as we parse? Presumes we read Cell parts in order... first the row and then the CF and then the qualifier? We are not allowed to just and read the qualifier w/o first reading row?

##########
File path: hbase-common/src/main/java/org/apache/hadoop/hbase/ByteBufferKeyOnlyKeyValue.java
##########
@@ -161,11 +161,11 @@ private int getTimestampOffset() {
 
   @Override
   public byte getTypeByte() {
-    return ByteBufferUtils.toByte(this.buf, this.offset + this.length - 1);
+    return getTypeByte(this.length);
   }
 
   @Override
-  public void setSequenceId(long seqId) throws IOException {
+  public void setSequenceId(long seqId) {
     throw new IllegalArgumentException("This is a key only Cell");

Review comment:
       Good.

##########
File path: hbase-common/src/main/java/org/apache/hadoop/hbase/ByteBufferKeyOnlyKeyValue.java
##########
@@ -292,4 +287,49 @@ public long heapSize() {
     }
     return ClassSize.align(FIXED_OVERHEAD);
   }
+
+  @Override
+  public int getFamilyLengthPosition(int rowLen) {

Review comment:
       This stuff used to be private. Making it public exposes the implementation. You fellows did a mountain of work making it so we could CHANGE the implementation. This goes back on that work? At least exposing this stuff as public methods?

##########
File path: hbase-common/src/main/java/org/apache/hadoop/hbase/ByteBufferKeyOnlyKeyValue.java
##########
@@ -292,4 +287,49 @@ public long heapSize() {
     }
     return ClassSize.align(FIXED_OVERHEAD);
   }
+
+  @Override
+  public int getFamilyLengthPosition(int rowLen) {
+    return this.offset + Bytes.SIZEOF_SHORT + rowLen;
+  }
+
+  @Override
+  public int getFamilyInternalPosition(int familyLengthPosition) {
+    return familyLengthPosition + Bytes.SIZEOF_BYTE;
+  }
+
+  @Override
+  public int getQualifierInternalPosition(int famOffset, byte famLength) {
+    return famOffset + famLength;
+  }
+
+  private int getTimestampOffset(final int keylength) {
+    return this.offset + keylength - KeyValue.TIMESTAMP_TYPE_SIZE;
+  }
+
+  @Override
+  public long getTimestamp(int keyLength) {
+    int tsOffset = getTimestampOffset(keyLength);
+    return ByteBufferUtils.toLong(this.buf, tsOffset);
+  }
+
+  @Override
+  public byte getTypeByte(int keyLength) {
+    return ByteBufferUtils.toByte(this.buf, this.offset + keyLength - 1);
+  }
+
+  @Override
+  public int getKeyLength() {
+    return this.length;
+  }
+
+  @Override
+  public boolean equals(Object other) {
+    return super.equals(other);
+  }
+
+  @Override
+  public int hashCode() {
+    return super.hashCode();
+  }
 }

Review comment:
       Have you run w/ the jdk compile flags on to make sure all methods doing compare get inlined? I found it informative... sometimes methods would not inline or compile because too big....  Sometimes fixing this w/ refactor helped improve perf.
   
   What about that nice stack trace you showed in the issue where you showed deep trace for hbase2 in compare but a shallow one for hbase1. As you suggested in the JIRA, yes, a deep stack trace costs... trimming it would help perf too.

##########
File path: hbase-common/src/main/java/org/apache/hadoop/hbase/CellComparatorImpl.java
##########
@@ -51,35 +53,38 @@
    */
   public static final CellComparatorImpl COMPARATOR = new CellComparatorImpl();
 
+  private static final ContiguousCellFormatComparator contiguousCellComparator =

Review comment:
       Hmm... when would a comparator NOT be do left-to-right?

##########
File path: hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java
##########
@@ -76,7 +77,7 @@
  * length and actual tag bytes length.
  */
 @InterfaceAudience.Private
-public class KeyValue implements ExtendedCell, Cloneable {
+public class KeyValue implements ExtendedCell, ContiguousCellFormat, Cloneable {

Review comment:
       Could the methods be added to ExtendedCell or is that different concerns?

##########
File path: hbase-common/src/main/java/org/apache/hadoop/hbase/CellComparatorImpl.java
##########
@@ -51,35 +53,38 @@
    */
   public static final CellComparatorImpl COMPARATOR = new CellComparatorImpl();
 
+  private static final ContiguousCellFormatComparator contiguousCellComparator =

Review comment:
       Won't it always be a 'contiguous' left-to-right compare?

##########
File path: hbase-common/src/main/java/org/apache/hadoop/hbase/ByteBufferKeyOnlyKeyValue.java
##########
@@ -292,4 +287,49 @@ public long heapSize() {
     }
     return ClassSize.align(FIXED_OVERHEAD);
   }
+
+  @Override
+  public int getFamilyLengthPosition(int rowLen) {
+    return this.offset + Bytes.SIZEOF_SHORT + rowLen;
+  }
+
+  @Override
+  public int getFamilyInternalPosition(int familyLengthPosition) {
+    return familyLengthPosition + Bytes.SIZEOF_BYTE;
+  }
+
+  @Override
+  public int getQualifierInternalPosition(int famOffset, byte famLength) {
+    return famOffset + famLength;
+  }
+
+  private int getTimestampOffset(final int keylength) {
+    return this.offset + keylength - KeyValue.TIMESTAMP_TYPE_SIZE;
+  }
+
+  @Override
+  public long getTimestamp(int keyLength) {
+    int tsOffset = getTimestampOffset(keyLength);
+    return ByteBufferUtils.toLong(this.buf, tsOffset);
+  }
+
+  @Override
+  public byte getTypeByte(int keyLength) {
+    return ByteBufferUtils.toByte(this.buf, this.offset + keyLength - 1);
+  }
+
+  @Override
+  public int getKeyLength() {
+    return this.length;
+  }
+
+  @Override
+  public boolean equals(Object other) {
+    return super.equals(other);

Review comment:
       Remove?

##########
File path: hbase-common/src/main/java/org/apache/hadoop/hbase/ByteBufferKeyValue.java
##########
@@ -20,41 +20,128 @@
 import java.io.IOException;
 import java.io.OutputStream;
 import java.nio.ByteBuffer;
+import java.util.List;
+
 import org.apache.hadoop.hbase.util.ByteBufferUtils;
 import org.apache.hadoop.hbase.util.Bytes;
 import org.apache.hadoop.hbase.util.ClassSize;
 import org.apache.yetus.audience.InterfaceAudience;
 
+
 import org.apache.hbase.thirdparty.com.google.common.annotations.VisibleForTesting;
 
 /**
  * This Cell is an implementation of {@link ByteBufferExtendedCell} where the data resides in
  * off heap/ on heap ByteBuffer
  */
 @InterfaceAudience.Private
-public class ByteBufferKeyValue extends ByteBufferExtendedCell {
+public class ByteBufferKeyValue extends ByteBufferExtendedCell implements ContiguousCellFormat {

Review comment:
       I think that to introduce a new Interface here around Cells, you first need to purge two old ones... smile. Ask @anoopsjohn 

##########
File path: hbase-common/src/main/java/org/apache/hadoop/hbase/ByteBufferKeyValue.java
##########
@@ -20,41 +20,128 @@
 import java.io.IOException;
 import java.io.OutputStream;
 import java.nio.ByteBuffer;
+import java.util.List;
+
 import org.apache.hadoop.hbase.util.ByteBufferUtils;
 import org.apache.hadoop.hbase.util.Bytes;
 import org.apache.hadoop.hbase.util.ClassSize;
 import org.apache.yetus.audience.InterfaceAudience;
 
+
 import org.apache.hbase.thirdparty.com.google.common.annotations.VisibleForTesting;
 
 /**
  * This Cell is an implementation of {@link ByteBufferExtendedCell} where the data resides in
  * off heap/ on heap ByteBuffer
  */
 @InterfaceAudience.Private
-public class ByteBufferKeyValue extends ByteBufferExtendedCell {
+public class ByteBufferKeyValue extends ByteBufferExtendedCell implements ContiguousCellFormat {
 
-  protected final ByteBuffer buf;

Review comment:
       Why take away the final?
   
   If for default constructor, pass nulls to the override?

##########
File path: hbase-common/src/main/java/org/apache/hadoop/hbase/ByteBufferKeyOnlyKeyValue.java
##########
@@ -292,4 +287,49 @@ public long heapSize() {
     }
     return ClassSize.align(FIXED_OVERHEAD);
   }
+
+  @Override
+  public int getFamilyLengthPosition(int rowLen) {
+    return this.offset + Bytes.SIZEOF_SHORT + rowLen;
+  }
+
+  @Override
+  public int getFamilyInternalPosition(int familyLengthPosition) {
+    return familyLengthPosition + Bytes.SIZEOF_BYTE;
+  }
+
+  @Override
+  public int getQualifierInternalPosition(int famOffset, byte famLength) {
+    return famOffset + famLength;
+  }
+
+  private int getTimestampOffset(final int keylength) {
+    return this.offset + keylength - KeyValue.TIMESTAMP_TYPE_SIZE;
+  }
+
+  @Override
+  public long getTimestamp(int keyLength) {
+    int tsOffset = getTimestampOffset(keyLength);
+    return ByteBufferUtils.toLong(this.buf, tsOffset);
+  }
+
+  @Override
+  public byte getTypeByte(int keyLength) {
+    return ByteBufferUtils.toByte(this.buf, this.offset + keyLength - 1);
+  }
+
+  @Override
+  public int getKeyLength() {
+    return this.length;
+  }
+
+  @Override
+  public boolean equals(Object other) {
+    return super.equals(other);
+  }
+
+  @Override
+  public int hashCode() {

Review comment:
       Can remove these?

##########
File path: hbase-common/src/main/java/org/apache/hadoop/hbase/CellComparatorImpl.java
##########
@@ -51,35 +53,38 @@
    */
   public static final CellComparatorImpl COMPARATOR = new CellComparatorImpl();
 
+  private static final ContiguousCellFormatComparator contiguousCellComparator =
+      new ContiguousCellFormatComparator(COMPARATOR);
+
   @Override
   public final int compare(final Cell a, final Cell b) {
     return compare(a, b, false);
   }
 
   @Override
   public int compare(final Cell a, final Cell b, boolean ignoreSequenceid) {
-
     int diff = 0;
-    // "Peel off" the most common path.
-    if (a instanceof ByteBufferKeyValue && b instanceof ByteBufferKeyValue) {
-      diff = BBKVComparator.compare((ByteBufferKeyValue)a, (ByteBufferKeyValue)b, ignoreSequenceid);
-      if (diff != 0) {
-        return diff;
-      }
+    // "Peeling off" the most common cases where the Cells backed by KV format either onheap or
+    // offheap
+    if (a instanceof ContiguousCellFormat && b instanceof ContiguousCellFormat

Review comment:
       I see. You want to use a comparator that has expectations about internals... that there methods it can call to speed up the compare.
   
   Man. We have too many if/else's in the path. if BB, if tags, if sequenceid, if offheap.... if unsafe. If ByteBufferExtendedCell...
   
   We don't yet have an implementation that is not contiguous?
   
   

##########
File path: hbase-common/src/main/java/org/apache/hadoop/hbase/ByteBufferKeyOnlyKeyValue.java
##########
@@ -161,11 +161,11 @@ private int getTimestampOffset() {
 
   @Override
   public byte getTypeByte() {
-    return ByteBufferUtils.toByte(this.buf, this.offset + this.length - 1);
+    return getTypeByte(this.length);

Review comment:
       But maybe this.length doesn't change? Its the key length?

##########
File path: hbase-common/src/main/java/org/apache/hadoop/hbase/ContiguousCellFormat.java
##########
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
+ * agreements. See the NOTICE file distributed with this work for additional information regarding
+ * copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License. You may obtain a
+ * copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable
+ * law or agreed to in writing, software distributed under the License is distributed on an "AS IS"
+ * BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License
+ * for the specific language governing permissions and limitations under the License.
+ */
+package org.apache.hadoop.hbase;
+
+import org.apache.yetus.audience.InterfaceAudience;
+import org.apache.yetus.audience.InterfaceStability;
+
+/**
+ * The interface represents any cell impl that is backed by contiguous layout like {@link KeyValue},
+ * {@link ByteBufferKeyValue}. The APIs in this interface helps to easily access the internals of
+ * the Cell structure so that we avoid fetching the offsets and lengths every time from the internal
+ * backed memory structure. We can use this interface only if it follows the standard KeyValue
+ * format. Helps in performance and this is jit friendly
+ */
+@InterfaceAudience.Private
+@InterfaceStability.Evolving
+public interface ContiguousCellFormat {
+
+  /**
+   * Return the family length position/offset.
+   * @param rowLen - the rowlength of the cell
+   * @return the family's offset position
+   */
+  int getFamilyLengthPosition(int rowLen);
+
+  /**
+   * Return the family length. Use this along with {@link #getFamilyLengthPosition(int)}
+   * @param famLengthPosition - the family offset
+   * @return the family's length
+   */
+  byte getFamilyLength(int famLengthPosition);
+
+  /**
+   * Return the family position/offset. This can be used along with
+   * the result of {@link #getFamilyLengthPosition(int)}
+   * @param familyLengthPosition - the family length position
+   * @return the family's position
+   */
+  int getFamilyInternalPosition(int familyLengthPosition);
+
+  /**
+   * Return the qualifier position/offset
+   * @param famOffset - the family offset
+   * @param famLength - the family length
+   * @return the qualifier offset/position.
+   */
+  int getQualifierInternalPosition(int famOffset, byte famLength);
+
+  /**
+   * Return the qualifier length
+   * @param keyLength - the key length
+   * @param rowLen - the row length
+   * @param famLength - the family length
+   * @return the qualifier length
+   */
+  int getQualifierLength(int keyLength, int rowLen, int famLength);
+
+  /**
+   * Return the time stamp. Use this along with {@link #getKeyLength()}
+   * @param keyLength - the key length
+   * @return the timestamp
+   */
+  long getTimestamp(int keyLength);
+
+  /**
+   * Return the type byte. Use this along with {@link #getKeyLength()}
+   * @param keyLength - the key length
+   * @return - the type byte
+   */
+  byte getTypeByte(int keyLength);
+
+  /**
+   * The keylength of the cell
+   * @return the key length
+   */
+  int getKeyLength();
+}

Review comment:
       Could this not just be internal to the Cell implementation? Does it have to be exposed like this in an Interface with special comparator?
   
   It doesn't look like it. Has to be exposed so Comparator can make use of these methods. I seen that this patch is just generalizing the behavior done when we added BBKVComparator. Argh.

##########
File path: hbase-common/src/main/java/org/apache/hadoop/hbase/ByteBufferKeyOnlyKeyValue.java
##########
@@ -292,4 +287,49 @@ public long heapSize() {
     }
     return ClassSize.align(FIXED_OVERHEAD);
   }
+
+  @Override
+  public int getFamilyLengthPosition(int rowLen) {

Review comment:
       For instance, what if a Cell implementation had data members that cached all lengths... a column family length data member and a row length data member, etc. These methods wouldn't make sense to it?

##########
File path: hbase-common/src/main/java/org/apache/hadoop/hbase/ByteBufferKeyOnlyKeyValue.java
##########
@@ -292,4 +287,49 @@ public long heapSize() {
     }
     return ClassSize.align(FIXED_OVERHEAD);
   }
+
+  @Override
+  public int getFamilyLengthPosition(int rowLen) {

Review comment:
       What if this stuff just stayed internal? I think you said, I did the KV one and here you are doing BB. Do we have to do more? There'd be duplication... or we could call out to a utility class of statics so the offset math could be shared.... 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org