Posted to issues@hbase.apache.org by GitBox <gi...@apache.org> on 2022/07/19 19:24:03 UTC

[GitHub] [hbase] bbeaudreault opened a new pull request, #4635: HBASE-27224 HFile tool statistic sampling produces misleading results

bbeaudreault opened a new pull request, #4635:
URL: https://github.com/apache/hbase/pull/4635

   Here's an example output:
   
   ```
   Stats:
      Key length:
                  min = 29
                  max = 29
                 mean = 29
               median = 29
                 75% <= 29
                 95% <= 29
                 98% <= 29
                 99% <= 29
               99.9% <= 29
                count = 1000
                1000 <= 10
                1000 <= 100
                1000 <= 1000
                1000 <= 10000
                1000 <= 100000
                1000 <= 1000000
                1000 <= 10000000
                1000 <= 100000000
      Val length:
                  min = 3
                  max = 3
                 mean = 3
               median = 3
                 75% <= 3
                 95% <= 3
                 98% <= 3
                 99% <= 3
               99.9% <= 3
                count = 1000
                1000 <= 10
                1000 <= 100
                1000 <= 1000
                1000 <= 10000
                1000 <= 100000
                1000 <= 1000000
                1000 <= 10000000
                1000 <= 100000000
      Row size (bytes):
                  min = 40
                  max = 40
                 mean = 40
               median = 40
                 75% <= 40
                 95% <= 40
                 98% <= 40
                 99% <= 40
               99.9% <= 40
                count = 1000
                1000 <= 10
                1000 <= 100
                1000 <= 1000
                1000 <= 10000
                1000 <= 100000
                1000 <= 1000000
                1000 <= 10000000
                1000 <= 100000000
      Row size (columns):
                  min = 1
                  max = 1
                 mean = 1
               median = 1
                 75% <= 1
                 95% <= 1
                 98% <= 1
                 99% <= 1
               99.9% <= 1
                count = 1000
                1000 <= 1
                1000 <= 3
                1000 <= 5
                1000 <= 10
                1000 <= 50
                1000 <= 100
                1000 <= 500
                1000 <= 1000
                1000 <= 5000
                1000 <= 10000
   
   
   Key of biggest row: row_00000000
   ```
   
   The output obviously looks odd with all the round numbers, but it should give a very clear picture of the distribution once real data is in place.
   
   If this feels too verbose for some people, I could hide the buckets behind a config or command-line option.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] Apache-HBase commented on pull request #4635: HBASE-27224 HFile tool statistic sampling produces misleading results

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on PR #4635:
URL: https://github.com/apache/hbase/pull/4635#issuecomment-1189650904

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m  4s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  2s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 11s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   2m 25s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 53s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   3m 48s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 38s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 13s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 14s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 53s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 53s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   3m 59s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   0m 37s |  hbase-hadoop-compat in the patch passed.  |
   | +1 :green_heart: |  unit  | 222m  4s |  hbase-server in the patch passed.  |
   |  |   | 241m 48s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4635/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/4635 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 6ed4bdc2b878 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 075b3053cf |
   | Default Java | AdoptOpenJDK-1.8.0_282-b08 |
   |  Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4635/1/testReport/ |
   | Max. process+thread count | 2522 (vs. ulimit of 30000) |
   | modules | C: hbase-hadoop-compat hbase-server U: . |
   | Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4635/1/console |
   | versions | git=2.17.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hbase] bbeaudreault closed pull request #4635: HBASE-27224 HFile tool statistic sampling produces misleading results

Posted by GitBox <gi...@apache.org>.
bbeaudreault closed pull request #4635: HBASE-27224 HFile tool statistic sampling produces misleading results
URL: https://github.com/apache/hbase/pull/4635




[GitHub] [hbase] Apache-HBase commented on pull request #4635: HBASE-27224 HFile tool statistic sampling produces misleading results

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on PR #4635:
URL: https://github.com/apache/hbase/pull/4635#issuecomment-1189646263

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m 14s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  4s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m  9s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   2m 47s |  master passed  |
   | +1 :green_heart: |  compile  |   1m  2s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   3m 46s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 41s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 13s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 37s |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  2s |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m  2s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   3m 45s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 40s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   0m 43s |  hbase-hadoop-compat in the patch passed.  |
   | +1 :green_heart: |  unit  | 211m 30s |  hbase-server in the patch passed.  |
   |  |   | 232m 12s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4635/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/4635 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux cdfc553122db 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 075b3053cf |
   | Default Java | AdoptOpenJDK-11.0.10+9 |
   |  Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4635/1/testReport/ |
   | Max. process+thread count | 2470 (vs. ulimit of 30000) |
   | modules | C: hbase-hadoop-compat hbase-server U: . |
   | Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4635/1/console |
   | versions | git=2.17.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hbase] Apache-HBase commented on pull request #4635: HBASE-27224 HFile tool statistic sampling produces misleading results

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on PR #4635:
URL: https://github.com/apache/hbase/pull/4635#issuecomment-1189524409

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 57s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 27s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   4m  5s |  master passed  |
   | +1 :green_heart: |  compile  |   4m 31s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   1m 25s |  master passed  |
   | +1 :green_heart: |  spotless  |   1m 22s |  branch has no errors when running spotless:check.  |
   | +1 :green_heart: |  spotbugs  |   3m  0s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 17s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   3m 33s |  the patch passed  |
   | +1 :green_heart: |  compile  |   4m 28s |  the patch passed  |
   | +1 :green_heart: |  javac  |   4m 28s |  the patch passed  |
   | -0 :warning: |  checkstyle  |   1m  1s |  hbase-server: The patch generated 1 new + 8 unchanged - 1 fixed = 9 total (was 9)  |
   | +1 :green_heart: |  whitespace  |   0m  1s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  hadoopcheck  |  18m 23s |  Patch does not cause any errors with Hadoop 3.1.2 3.2.2 3.3.1.  |
   | +1 :green_heart: |  spotless  |   1m 18s |  patch has no errors when running spotless:check.  |
   | +1 :green_heart: |  spotbugs  |   3m 18s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 48s |  The patch does not generate ASF License warnings.  |
   |  |   |  57m 59s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4635/1/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/4635 |
   | Optional Tests | dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile |
   | uname | Linux 90b6abd17b20 5.4.0-1071-aws #76~18.04.1-Ubuntu SMP Mon Mar 28 17:49:57 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 075b3053cf |
   | Default Java | AdoptOpenJDK-1.8.0_282-b08 |
   | checkstyle | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4635/1/artifact/yetus-general-check/output/diff-checkstyle-hbase-server.txt |
   | Max. process+thread count | 72 (vs. ulimit of 30000) |
   | modules | C: hbase-hadoop-compat hbase-server U: . |
   | Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4635/1/console |
   | versions | git=2.17.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hbase] bbeaudreault commented on pull request #4635: HBASE-27224 HFile tool statistic sampling produces misleading results

Posted by GitBox <gi...@apache.org>.
bbeaudreault commented on PR #4635:
URL: https://github.com/apache/hbase/pull/4635#issuecomment-1190308613

   Closing this PR -- going to go in a different direction.
   
   I realized that MutableRangeHistogram's buckets are actually very inaccurate on the first call to `histogram.snapshot()`. The initial bins are configured for very large ranges, and as `snapshot()` is called over time they are resized to fit the actual data, based on the distribution at that time. The `HistogramImpl.getQuantiles` method does some complicated math to estimate the quantiles despite the incorrect bins, but `getCountAtOrBelow` does not. I thought about trying to account for that, but it seems overly complicated for something that is used pretty much everywhere.
   
   Instead I'm going to revert to using codahale metrics, fix it to use UniformDistribution, and add some supplemental range tracking just for the HFilePrettyPrinter.
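
For illustration only, the "supplemental range tracking" idea could look roughly like the sketch below. This is not the code from this PR or from HBase: `RangeCounter` and all of its methods are hypothetical names, and it deliberately uses nothing from codahale metrics. It assumes a fixed set of distinct upper bounds and counts every value exactly as it is observed, so the "count <= bound" lines do not depend on how any histogram bins were sized.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (an assumption for illustration, not an HBase class):
 * tracks exactly how many observed values fall at or below each fixed bound.
 */
final class RangeCounter {
  private final long[] upperBounds; // sorted ascending, assumed distinct
  private final long[] counts;      // counts[i]: values in (upperBounds[i-1], upperBounds[i]]
  private long overflow;            // values above the largest bound
  private long total;               // all values seen

  RangeCounter(long... upperBounds) {
    this.upperBounds = upperBounds.clone();
    Arrays.sort(this.upperBounds);
    this.counts = new long[this.upperBounds.length];
  }

  /** Records one value in the smallest bucket whose bound is >= value. */
  void update(long value) {
    total++;
    int idx = Arrays.binarySearch(upperBounds, value);
    if (idx < 0) {
      idx = -idx - 1; // insertion point: index of the first bound greater than value
    }
    if (idx == upperBounds.length) {
      overflow++;
    } else {
      counts[idx]++;
    }
  }

  /** Returns, for each bound in ascending order, the number of values <= that bound. */
  long[] cumulativeCounts() {
    long[] result = new long[counts.length];
    long running = 0;
    for (int i = 0; i < counts.length; i++) {
      running += counts[i];
      result[i] = running;
    }
    return result;
  }

  long total() {
    return total;
  }

  long overflow() {
    return overflow;
  }

  /** Formats one "count <= bound" line per bound. */
  String report() {
    StringBuilder sb = new StringBuilder();
    long[] cumulative = cumulativeCounts();
    for (int i = 0; i < upperBounds.length; i++) {
      sb.append(cumulative[i]).append(" <= ").append(upperBounds[i]).append('\n');
    }
    return sb.toString();
  }
}
```

With bounds of 10, 100 and 1000, feeding in 1000 key lengths of 29 would make `cumulativeCounts()` return 0, 1000, 1000, so nothing is reported at or below 10. Because each `update` is an exact increment, the counts are correct on the first read, at the cost of one binary search per value.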

