Posted to issues@hbase.apache.org by "Hudson (JIRA)" <ji...@apache.org> on 2014/07/16 23:09:05 UTC

[jira] [Commented] (HBASE-2251) PE defaults to 1k rows - uncommon use case, and easy to hit benchmarks

    [ https://issues.apache.org/jira/browse/HBASE-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064096#comment-14064096 ] 

Hudson commented on HBASE-2251:
-------------------------------

FAILURE: Integrated in HBase-TRUNK #5314 (See [https://builds.apache.org/job/HBase-TRUNK/5314/])
HBASE-2251 PE defaults to 1k rows - uncommon use case, and easy to hit benchmarks -- Add zipf distribution of cell values (stack: rev 76543f525aa920297c5a730f35b409beddb1e90b)
* hbase-server/src/test/java/org/apache/hadoop/hbase/PerformanceEvaluation.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/TestPerformanceEvaluation.java


> PE defaults to 1k rows - uncommon use case, and easy to hit benchmarks
> ----------------------------------------------------------------------
>
>                 Key: HBASE-2251
>                 URL: https://issues.apache.org/jira/browse/HBASE-2251
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: ryan rawson
>            Assignee: stack
>              Labels: moved_from_0_20_5
>             Fix For: 0.99.0, 2.0.0
>
>         Attachments: 2251.txt
>
>
> PerformanceEvaluation (PE) uses 1k rows, which I would argue is uncommon and also provides an easy-to-hit performance goal.  Most of the harder performance issues happen at the low and high ends of cell size.  In our own application, our key sizes range from 4 bytes to maybe 100 bytes, and very rarely reach 1000 bytes.  When we do have large values, they are VERY large, on the order of multiple kilobytes.
> Recently a change went into HBase that ran well with PE because the relative memory overhead of 1k rows is very low, but with small rows the performance hit would be expected to be much larger.  This is because the per-value overhead (e.g. node objects of the skip list/memstore) is amortized much better with 1k values.
> We should make this a tunable setting, and have a low default.  I would argue for a 10-30 byte default.
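
To illustrate the amortization argument above, here is a minimal sketch of the arithmetic. It is not code from PerformanceEvaluation: the class name, the method, and in particular the 100-byte per-cell overhead are assumptions chosen only for illustration, not figures measured from HBase.

```java
/**
 * Sketch only: shows how a fixed per-cell bookkeeping cost is amortized
 * across different value sizes. Nothing here is taken from HBase itself.
 */
public class ValueSizeSketch {

    // ASSUMPTION: illustrative per-cell overhead in bytes (skip-list node,
    // object headers, and so on). This is a made-up number, not a measurement.
    public static final int ASSUMED_PER_CELL_OVERHEAD_BYTES = 100;

    /**
     * Returns the fraction of the memory used by one cell that is overhead
     * rather than payload, for the given value size and overhead in bytes.
     */
    public static double overheadFraction(int valueSizeBytes, int overheadBytes) {
        if (valueSizeBytes < 0 || overheadBytes < 0) {
            throw new IllegalArgumentException("sizes must be non-negative");
        }
        int total = valueSizeBytes + overheadBytes;
        if (total == 0) {
            return 0.0; // nothing stored, so no overhead to report
        }
        return (double) overheadBytes / total;
    }

    public static void main(String[] args) {
        // Value sizes mentioned in the issue: the proposed 10-30 byte
        // default range and the existing 1000-byte default.
        int[] valueSizes = {10, 30, 1000};
        for (int size : valueSizes) {
            double fraction = overheadFraction(size, ASSUMED_PER_CELL_OVERHEAD_BYTES);
            System.out.printf("value size %4d bytes -> overhead share %.1f%%%n",
                    size, fraction * 100.0);
        }
    }
}
```

Under that assumed overhead, most of the memory for a 10-byte value goes to bookkeeping, while for a 1000-byte value it is a small share. That is the sense in which a 1k default can hide per-value costs that a small default would expose.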



--
This message was sent by Atlassian JIRA
(v6.2#6252)