Posted to issues@hbase.apache.org by GitBox <gi...@apache.org> on 2020/06/06 02:11:17 UTC

[GitHub] [hbase] busbey commented on a change in pull request #1727: HBASE-17756 We should have better introspection of HFiles

busbey commented on a change in pull request #1727:
URL: https://github.com/apache/hbase/pull/1727#discussion_r436226259



##########
File path: hbase-shaded/hbase-shaded-mapreduce/pom.xml
##########
@@ -164,6 +164,10 @@
                 <groupId>javax.servlet.jsp</groupId>
                 <artifactId>javax.servlet.jsp-api</artifactId>
               </exclusion>
+              <exclusion>
+                <groupId>org.apache.datasketches</groupId>
+                <artifactId>datasketches-java</artifactId>

Review comment:
       Won't MR jobs that have to write HFiles fail?

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
##########
@@ -212,18 +237,20 @@ public boolean parseOptions(String args[]) throws ParseException,
       Path tableDir = CommonFSUtils.getTableDir(rootDir, TableName.valueOf(hri[0]));
       String enc = HRegionInfo.encodeRegionName(rn);
       Path regionDir = new Path(tableDir, enc);
-      if (verbose)
+      if (verbose) {

Review comment:
       Skip the formatting fixes? They make it harder to track the real change.

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterImpl.java
##########
@@ -798,16 +809,15 @@ protected void finishFileInfo() throws IOException {
     }
 
     // Average key length.
-    int avgKeyLen =
-        entryCount == 0 ? 0 : (int) (totalKeyLength / entryCount);
+    int avgKeyLen = (int)this.keySizeSketch.getQuantile(0.5);

Review comment:
       Please call this the median, not the average.
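
       To illustrate why the name matters, here is a minimal standalone sketch (not code from the PR) that contrasts the old mean-style computation with a median. The class name, method names, and sample key sizes are all hypothetical, and the median here is exact over a small array rather than the approximate value a quantile sketch would return.

```java
import java.util.Arrays;

// Hypothetical helper, not part of HBase: shows how far a mean and a
// median can drift apart on a skewed set of key sizes.
class MedianVsMean {
  // Mirrors the removed "totalKeyLength / entryCount" logic, guarding the empty case.
  static double mean(long[] sizes) {
    if (sizes.length == 0) {
      return 0;
    }
    long total = 0;
    for (long s : sizes) {
      total += s;
    }
    return (double) total / sizes.length;
  }

  // Exact lower median over a sorted copy; a quantile sketch's
  // getQuantile(0.5) would approximate this value.
  static long median(long[] sizes) {
    if (sizes.length == 0) {
      return 0;
    }
    long[] sorted = sizes.clone();
    Arrays.sort(sorted);
    return sorted[(sorted.length - 1) / 2];
  }

  public static void main(String[] args) {
    // Made-up sample: four small keys and one very large outlier.
    long[] keySizes = {20, 22, 24, 26, 4000};
    System.out.println("mean   = " + mean(keySizes));
    System.out.println("median = " + median(keySizes));
  }
}
```

       With one large outlier in the sample, the mean is pulled far above the typical key size while the median is not, so a field populated from the 0.5 quantile but labelled "average" would mislead anyone comparing it with older files.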

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
##########
@@ -593,6 +618,33 @@ private void printMeta(HFile.Reader reader, Map<byte[], byte[]> fileInfo)
     } else {
       out.println(FOUR_SPACES + "Not present");
     }
+    DoublesSketch keySizeSketch =
+      getDoublesSketchFromMetaBlock(reader, HFileWriterImpl.KEYSIZE_SKETCH_KEY_STR);
+    printDoublesSketch(keySizeSketch, "keySize");
+    DoublesSketch valueSizeSketch =
+      getDoublesSketchFromMetaBlock(reader, HFileWriterImpl.VALUESIZE_SKETCH_KEY_STR);
+    printDoublesSketch(valueSizeSketch, "valueSize");
+  }
+
+  private void printDoublesSketch(DoublesSketch sketch, String name) {
+    double [] quantiles = sketch.getQuantiles(NORMALIZED_RANKS);

Review comment:
       Why are we asking for every single quantile?
   
   Let's provide the interquartile range (25th and 75th percentiles), the median, the 95th, and the 99th. That should give a good summary of both the norm and the outliers.
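
   A rough standalone sketch of that five-number summary follows. It is an illustration under stated assumptions, not the PR's code: SUMMARY_RANKS, the class name, and the labels are hypothetical, and the quantiles are computed exactly (nearest-rank) over a small array. In the printer itself the same rank array would presumably be passed to sketch.getQuantiles(...) in place of NORMALIZED_RANKS.

```java
import java.util.Arrays;

// Hypothetical stand-in for the printer logic; uses exact nearest-rank
// quantiles instead of a DoublesSketch so it runs with no dependencies.
class QuantileSummary {
  // Assumed replacement for the full rank array: p25, median, p75, p95, p99.
  static final double[] SUMMARY_RANKS = {0.25, 0.50, 0.75, 0.95, 0.99};
  static final String[] LABELS = {"p25", "median", "p75", "p95", "p99"};

  // Returns the nearest-rank quantile of values for each normalized rank.
  static double[] quantiles(double[] values, double[] ranks) {
    if (values.length == 0) {
      throw new IllegalArgumentException("empty sample");
    }
    double[] sorted = values.clone();
    Arrays.sort(sorted);
    double[] out = new double[ranks.length];
    for (int i = 0; i < ranks.length; i++) {
      int idx = (int) Math.ceil(ranks[i] * sorted.length) - 1;
      // Clamp so ranks of 0.0 or 1.0 stay inside the array.
      out[i] = sorted[Math.max(0, Math.min(sorted.length - 1, idx))];
    }
    return out;
  }

  public static void main(String[] args) {
    // Made-up sample: sizes 1..100.
    double[] sizes = new double[100];
    for (int i = 0; i < sizes.length; i++) {
      sizes[i] = i + 1;
    }
    double[] q = quantiles(sizes, SUMMARY_RANKS);
    for (int i = 0; i < q.length; i++) {
      System.out.println("keySize " + LABELS[i] + " = " + q[i]);
    }
    System.out.println("keySize IQR = " + (q[2] - q[0]));
  }
}
```

   Five labelled values keep the printer output short while still covering the typical case (median and interquartile range) and the tail (95th and 99th).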

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterImpl.java
##########
@@ -798,16 +809,15 @@ protected void finishFileInfo() throws IOException {
     }
 
     // Average key length.
-    int avgKeyLen =
-        entryCount == 0 ? 0 : (int) (totalKeyLength / entryCount);
+    int avgKeyLen = (int)this.keySizeSketch.getQuantile(0.5);
     fileInfo.append(HFileInfo.AVG_KEY_LEN, Bytes.toBytes(avgKeyLen), false);
+    // Average value length.
+    int avgValueLen = (int)this.valueSizeSketch.getQuantile(0.5);

Review comment:
       Please call this the median, not the average.



