Posted to issues@hbase.apache.org by "Himanshu Gwalani (Jira)" <ji...@apache.org> on 2023/06/02 16:55:00 UTC

[jira] [Updated] (HBASE-27904) A random data generator tool leveraging bulk load.

     [ https://issues.apache.org/jira/browse/HBASE-27904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Himanshu Gwalani updated HBASE-27904:
-------------------------------------
    Description: 
HBase currently has no data generator tool that leverages bulk load. Because bulk load skips the client write path, it is a much faster way to generate data for load/performance tests where exercising client writes is not required.
Example: any tooling over HBase that needs x TBs of HBase table data for load testing.

The tool will generate data as a two-step process:
1. Generate HFiles with random data (using a custom Mapper and [HFileOutputFormat2|https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java])
2. Bulk load those HFiles into the respective regions of the table using [LoadIncrementalHFiles|https://hbase.apache.org/2.2/devapidocs/org/apache/hadoop/hbase/tool/LoadIncrementalHFiles.html]
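
A rough, library-free sketch of the random-data part of step 1 might look like the following. It is only an illustration: the names (RandomRowSketch, Row, generate, compareUnsigned), the fixed key and value lengths, and the seed are all assumptions, not part of the proposed tool. In an actual job the mapper would emit these bytes as HBase cells, and the ordering that HFiles require would normally come from the MapReduce shuffle and the sort reducer configured alongside HFileOutputFormat2; the in-memory sort below merely stands in for that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical stand-in for the data a custom Mapper could emit. */
final class RandomRowSketch {

    /** One generated row: a row key and a value, both raw bytes. */
    static final class Row {
        public final byte[] key;
        public final byte[] value;

        Row(byte[] key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Lexicographic comparison treating each byte as unsigned (0..255). */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /** Generates count rows of random bytes, sorted by row key. */
    static List<Row> generate(long seed, int count, int keyLen, int valueLen) {
        Random random = new Random(seed); // seeded so runs are reproducible
        List<Row> rows = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte[] key = new byte[keyLen];
            byte[] value = new byte[valueLen];
            random.nextBytes(key);
            random.nextBytes(value);
            rows.add(new Row(key, value));
        }
        // Stand-in for the framework-side sort: keys in ascending byte order.
        rows.sort((x, y) -> compareUnsigned(x.key, y.key));
        return rows;
    }

    public static void main(String[] args) {
        List<Row> rows = generate(42L, 1000, 16, 100);
        System.out.println("generated rows: " + rows.size());
    }
}
```

Seeding the generator keeps a test data set reproducible across runs; a real tool would presumably take the row count and sizes as job parameters.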

  was:
As of now, there is no data generator tool in HBase leveraging bulk load. Since bulk load skips the client write path, it can be leveraged when any tooling over HBase needs a huge amount of data for load/performance testing.

The tool will generate data as a two-step process:
1. Generate HFiles with random data (using a custom Mapper and [HFileOutputFormat2|https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java])
2. Bulk load those HFiles into the respective regions of the table using [LoadIncrementalHFiles|https://hbase.apache.org/2.2/devapidocs/org/apache/hadoop/hbase/tool/LoadIncrementalHFiles.html]


> A random data generator tool leveraging bulk load.
> --------------------------------------------------
>
>                 Key: HBASE-27904
>                 URL: https://issues.apache.org/jira/browse/HBASE-27904
>             Project: HBase
>          Issue Type: New Feature
>          Components: util
>            Reporter: Himanshu Gwalani
>            Priority: Minor
>
> HBase currently has no data generator tool that leverages bulk load. Because bulk load skips the client write path, it is a much faster way to generate data for load/performance tests where exercising client writes is not required.
> Example: any tooling over HBase that needs x TBs of HBase table data for load testing.
> The tool will generate data as a two-step process:
> 1. Generate HFiles with random data (using a custom Mapper and [HFileOutputFormat2|https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java])
> 2. Bulk load those HFiles into the respective regions of the table using [LoadIncrementalHFiles|https://hbase.apache.org/2.2/devapidocs/org/apache/hadoop/hbase/tool/LoadIncrementalHFiles.html]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)