Posted to issues@hbase.apache.org by "Mikhail Bautin (Created) (JIRA)" <ji...@apache.org> on 2011/09/29 23:32:45 UTC

[jira] [Created] (HBASE-4516) HFile-level load tester with compaction and random-read workloads

HFile-level load tester with compaction and random-read workloads
-----------------------------------------------------------------

                 Key: HBASE-4516
                 URL: https://issues.apache.org/jira/browse/HBASE-4516
             Project: HBase
          Issue Type: Test
            Reporter: Mikhail Bautin
            Priority: Minor
             Fix For: 0.94.0


This is a load testing tool for HFile implementations, which supports two workloads:
- Compactions (merge the input HFiles). A special case is a single input file, which makes it possible to convert between HFile formats.
- Random reads. Launches the specified number of threads that do seeks and short scans on randomly generated keys.

The original purpose of this tool was to ensure that HFile format v2 did not introduce performance regressions.

Keys for the read workload are generated randomly between the first and the last key of the HFile. At each byte position, instead of precisely calculating the correct probability for every byte value b, we select a uniformly random byte within the allowed [low, high] range. In addition, a heuristic determines the positions at which the key has hex characters, and the random key contains hex characters at those positions as well.
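
The per-position byte selection described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from HFileReadWriteTest: the class name RandomKeySketch and both method names are hypothetical, keys are compared as unsigned bytes in lexicographic order, and the hex-character heuristic is left out.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch; names are illustrative and not taken from the patch. */
final class RandomKeySketch {

    /** Compares two byte arrays lexicographically, treating bytes as unsigned. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /**
     * Returns a random key k with first <= k <= last.
     * Each byte is drawn uniformly from the range still allowed at its
     * position, so keys are NOT uniformly distributed over the key space.
     */
    static byte[] randomKeyBetween(byte[] first, byte[] last, Random rng) {
        if (compareUnsigned(first, last) > 0) {
            throw new IllegalArgumentException("first must not sort after last");
        }
        int maxLen = Math.max(first.length, last.length);
        byte[] key = new byte[maxLen];
        boolean lowTight = true;   // bytes so far equal the prefix of first
        boolean highTight = true;  // bytes so far equal the prefix of last
        int len = 0;
        for (int i = 0; i < maxLen; i++) {
            if (highTight && i >= last.length) {
                break; // key already equals last; another byte would exceed it
            }
            if (lowTight && i >= first.length) {
                lowTight = false; // key extends first, so it already sorts after it
            }
            int low = lowTight ? (first[i] & 0xff) : 0;
            int high = highTight ? (last[i] & 0xff) : 0xff;
            int b = low + rng.nextInt(high - low + 1); // uniform in [low, high]
            key[len++] = (byte) b;
            lowTight = lowTight && b == low;
            highTight = highTight && b == high;
        }
        return Arrays.copyOf(key, len);
    }
}
```

Calling RandomKeySketch.randomKeyBetween(firstKey, lastKey, new Random()) once per seek would give one key per lookup. Because every byte is picked uniformly within its own range, a key that diverges from both bounds early gets fully random trailing bytes, which is the simplification the description mentions in place of computing exact per-byte probabilities.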

Example output for the random read workload:
Time: 120 sec, seek/sec: 8290, kv/sec: 30351, kv bytes/sec: 91868121, blk/sec: 10147, unique keys: 232779


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (HBASE-4516) HFile-level load tester with compaction and random-read workloads

Posted by "Mikhail Bautin (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Bautin reassigned HBASE-4516:
-------------------------------------

    Assignee: Mikhail Bautin
    

        

[jira] [Commented] (HBASE-4516) HFile-level load tester with compaction and random-read workloads

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117766#comment-13117766 ] 

jiraposter@reviews.apache.org commented on HBASE-4516:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2122/#review2194
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
<https://reviews.apache.org/r/2122/#comment5113>

    I don't think you meant to include this file in your loading tool patch? Just let me know and I can exclude it on commit (maybe it's part of another JIRA?).



src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
<https://reviews.apache.org/r/2122/#comment5112>

    Did you intend to include this 'fix' in your loading tool patch?



src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java
<https://reviews.apache.org/r/2122/#comment5114>

    FYI, going forward, this copyright line is no longer needed.


- Michael


On 2011-09-29 23:54:05, Mikhail Bautin wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2122/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-09-29 23:54:05)
bq.  
bq.  
bq.  Review request for hbase, Michael Stack and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This is a load testing tool for HFile implementations, which supports two workloads:
bq.  
bq.      Compactions (merge the input HFiles). A special case is a single input file, which makes it possible to convert between HFile formats.
bq.      Random reads. Launches the specified number of threads that do seeks and short scans on randomly generated keys.
bq.  
bq.  The original purpose of this tool was to ensure that HFile format v2 did not introduce performance regressions.
bq.  
bq.  Keys for the read workload are generated randomly between the first and the last key of the HFile. At each byte position, instead of precisely calculating the correct probability for every byte value b, we select a uniformly random byte within the allowed [low, high] range. In addition, a heuristic determines the positions at which the key has hex characters, and the random key contains hex characters at those positions as well.
bq.  
bq.  Example output for the random read workload:
bq.  Time: 120 sec, seek/sec: 8290, kv/sec: 30351, kv bytes/sec: 91868121, blk/sec: 10147, unique keys: 232779
bq.  
bq.  Also refactoring and clarifying the confusing situation when a StoreFile happens to have a different Bloom filter type than what is configured for the column family.
bq.  
bq.  
bq.  This addresses bug HBASE-4516.
bq.      https://issues.apache.org/jira/browse/HBASE-4516
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java b429819 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java fd9e7ef 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2122/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Running unit tests and the load tester tool itself.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Mikhail
bq.  
bq.


                

        

[jira] [Resolved] (HBASE-4516) HFile-level load tester with compaction and random-read workloads

Posted by "Mikhail Bautin (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Bautin resolved HBASE-4516.
-----------------------------------

    Resolution: Fixed

Resolved as part of HBASE-4218.
                

        

[jira] [Commented] (HBASE-4516) HFile-level load tester with compaction and random-read workloads

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117761#comment-13117761 ] 

jiraposter@reviews.apache.org commented on HBASE-4516:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2122/
-----------------------------------------------------------

(Updated 2011-09-29 23:54:05.869273)


Review request for hbase, Michael Stack and Jonathan Gray.


Changes
-------

Updating summary (sorry for spam).


Summary (updated)
-------

This is a load testing tool for HFile implementations, which supports two workloads:

    Compactions (merge the input HFiles). A special case is a single input file, which makes it possible to convert between HFile formats.
    Random reads. Launches the specified number of threads that do seeks and short scans on randomly generated keys.

The original purpose of this tool was to ensure that HFile format v2 did not introduce performance regressions.

Keys for the read workload are generated randomly between the first and the last key of the HFile. At each byte position, instead of precisely calculating the correct probability for every byte value b, we select a uniformly random byte within the allowed [low, high] range. In addition, a heuristic determines the positions at which the key has hex characters, and the random key contains hex characters at those positions as well.

Example output for the random read workload:
Time: 120 sec, seek/sec: 8290, kv/sec: 30351, kv bytes/sec: 91868121, blk/sec: 10147, unique keys: 232779

Also refactoring and clarifying the confusing situation when a StoreFile happens to have a different Bloom filter type than what is configured for the column family.


This addresses bug HBASE-4516.
    https://issues.apache.org/jira/browse/HBASE-4516


Diffs
-----

  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java b429819 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java fd9e7ef 
  src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java PRE-CREATION 

Diff: https://reviews.apache.org/r/2122/diff


Testing
-------

Running unit tests and the load tester tool itself.


Thanks,

Mikhail


                

        

[jira] [Commented] (HBASE-4516) HFile-level load tester with compaction and random-read workloads

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117757#comment-13117757 ] 

jiraposter@reviews.apache.org commented on HBASE-4516:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2122/
-----------------------------------------------------------

Review request for hbase, Michael Stack and Jonathan Gray.


Summary
-------

This is a load testing tool for HFile implementations, which supports two workloads:

    Compactions (merge the input HFiles). A special case is a single input file, which makes it possible to convert between HFile formats.
    Random reads. Launches the specified number of threads that do seeks and short scans on randomly generated keys.

The original purpose of this tool was to ensure that HFile format v2 did not introduce performance regressions.

Keys for the read workload are generated randomly between the first and the last key of the HFile. At each byte position, instead of precisely calculating the correct probability for every byte value b, we select a uniformly random byte within the allowed [low, high] range. In addition, a heuristic determines the positions at which the key has hex characters, and the random key contains hex characters at those positions as well.

Example output for the random read workload:
Time: 120 sec, seek/sec: 8290, kv/sec: 30351, kv bytes/sec: 91868121, blk/sec: 10147, unique keys: 232779

Also refactoring and clarifying the confusing situation when a StoreFile happens to have a different Bloom filter type than what is configured for the column family.


This addresses bug HBASE-4516.
    https://issues.apache.org/jira/browse/HBASE-4516


Diffs
-----

  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java b429819 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java fd9e7ef 
  src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java PRE-CREATION 

Diff: https://reviews.apache.org/r/2122/diff


Testing
-------

Running unit tests and the load tester tool itself.


Thanks,

Mikhail


                

        

[jira] [Commented] (HBASE-4516) HFile-level load tester with compaction and random-read workloads

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117767#comment-13117767 ] 

jiraposter@reviews.apache.org commented on HBASE-4516:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2122/#review2195
-----------------------------------------------------------

Ship it!


Answer my question above, M, and I'll exclude it on commit. Will add a little note to the 'book' too that this tool now exists. Nice.

- Michael




                