Posted to issues@hbase.apache.org by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org> on 2011/08/04 02:37:28 UTC

[jira] [Created] (HBASE-4163) Create Split Strategy for YCSB Benchmark

Create Split Strategy for YCSB Benchmark
----------------------------------------

                 Key: HBASE-4163
                 URL: https://issues.apache.org/jira/browse/HBASE-4163
             Project: HBase
          Issue Type: Improvement
          Components: util
    Affects Versions: 0.90.3, 0.92.0
            Reporter: Nicolas Spiegelberg
            Assignee: Lars George
            Priority: Minor


Talked with Lars about how we can make it easier for users to run the YCSB benchmarks against HBase & get realistic results.  Currently, HBase is optimized for the random/uniform read/write case, which is the YCSB load.  The initial reason we perform badly when users test against us is that they do not pre-split regions & have the split ratio set really low.  We need a one-line way for a user to create a table that is pre-split to 200 regions (or some decent number) by default & has splitting disabled.  Realistically, this is how a uniform-load cluster should scale, so it's not a hack.  This will also give us a good use case to point to for how users should pre-split regions.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4163) Create Split Strategy for YCSB Benchmark

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079707#comment-13079707 ] 

Jean-Daniel Cryans commented on HBASE-4163:
-------------------------------------------

That's pretty clever, guys.

> Create Split Strategy for YCSB Benchmark
> ----------------------------------------
>
>                 Key: HBASE-4163
>                 URL: https://issues.apache.org/jira/browse/HBASE-4163
>             Project: HBase
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 0.90.3, 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Lars George
>            Priority: Minor
>              Labels: benchmark
>
> Talked with Lars about how we can make it easier for users to run the YCSB benchmarks against HBase & get realistic results.  Currently, HBase is optimized for the random/uniform read/write case, which is the YCSB load.  The initial reason we perform badly when users test against us is that they do not pre-split regions & have the split ratio set really low.  We need a one-line way for a user to create a table that is pre-split to 200 regions (or some decent number) by default & has splitting disabled.  Realistically, this is how a uniform-load cluster should scale, so it's not a hack.  This will also give us a good use case to point to for how users should pre-split regions.


[jira] [Commented] (HBASE-4163) Create Split Strategy for YCSB Benchmark

Posted by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079153#comment-13079153 ] 

Nicolas Spiegelberg commented on HBASE-4163:
--------------------------------------------

My initial thought is to use the existing RegionSplitter utility.  We just need to create a custom SplitAlgorithm implementation class for the YCSB key specification & tell the users to run:

{code}
bin/hbase org.apache.hadoop.hbase.util.RegionSplitter TABLE -c 200 -f FAMILY -D split.algorithm=YcsbSplit
{code}

to pre-create a table with 200 regions.  To prevent further splitting, we can either set hbase.hregion.max.filesize to a really high value or add a per-table config option that disables splits.
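
As a rough illustration of the split-point computation such an algorithm would need, here is a minimal standalone sketch. It does not implement the actual SplitAlgorithm interface. It assumes, purely for illustration, that row keys look like "user" followed by a decimal number without leading zeros, and that the leading digits are spread roughly evenly; that should be checked against the real YCSB key distribution. The class name YcsbSplitSketch and the method splitKeys are hypothetical, not part of HBase.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not HBase code: computes evenly spaced split keys
// for row keys assumed to have the form <prefix><decimal digits>.
public class YcsbSplitSketch {

    // Returns numRegions - 1 boundary keys. The boundaries divide the range
    // of 'digits'-digit numbers without a leading zero, i.e.
    // [10^(digits-1), 10^digits), into numRegions equal slices.
    static List<String> splitKeys(String prefix, int numRegions, int digits) {
        if (numRegions < 1 || digits < 1 || digits > 18) {
            throw new IllegalArgumentException("bad numRegions or digits");
        }
        long lo = 1;
        for (int i = 1; i < digits; i++) {
            lo *= 10;
        }
        long width = lo * 9; // size of [lo, lo * 10)
        if (numRegions > width) {
            throw new IllegalArgumentException("too many regions for this many digits");
        }
        List<String> keys = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            // Integer arithmetic keeps the boundaries strictly increasing
            // because width >= numRegions.
            long boundary = lo + width * i / numRegions;
            keys.add(prefix + boundary);
        }
        return keys;
    }

    public static void main(String[] args) {
        // 200 regions need 199 split keys.
        for (String key : splitKeys("user", 200, 4)) {
            System.out.println(key);
        }
    }
}
```

Because HBase compares row keys as bytes, a boundary such as "user1045" also covers longer keys like "user10451234", so a short numeric prefix is enough to define the boundaries under the assumption above.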
