Posted to user@hbase.apache.org by Jaime Solano <jd...@gmail.com> on 2015/02/04 19:49:41 UTC

Bulk-load data into HBase using Storm

For a proof of concept we'll be working on, we want to bulk-load data into
HBase, following an approach similar to the one explained here
<http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/>,
with the difference that for the HFile creation (step 2 in that article) we
want to use Storm instead of MapReduce. That is, we want to bulk-load data
that is not sitting in HDFS but, most likely, in memory.

   1. What are your thoughts about this? Is it feasible?
   2. What challenges do you foresee?
   3. What other approaches would you suggest?

Thanks in advance,
-Jaime

Re: Bulk-load data into HBase using Storm

Posted by Nick Dimiduk <nd...@gmail.com>.
To use existing bulk load tools, you'll need to write a valid HFile to
HDFS (have a look at HFileWriterV{2,3}) and load it into the region
server(s) using the utilities provided in LoadIncrementalHFiles.
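
As a rough illustration of that path, here is a minimal sketch written
against the HBase 1.0-era client APIs; the table name, paths, and sample
rows are all made up. It writes a single HFile directly (no MapReduce) and
then hands the directory to LoadIncrementalHFiles:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.hfile.CacheConfig;
import org.apache.hadoop.hbase.io.hfile.HFile;
import org.apache.hadoop.hbase.io.hfile.HFileContext;
import org.apache.hadoop.hbase.io.hfile.HFileContextBuilder;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;

public class DirectHFileLoad {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    FileSystem fs = FileSystem.get(conf);

    // The bulk loader expects the layout <dir>/<columnFamily>/<hfile>.
    Path dir = new Path("/tmp/bulk-poc");
    Path hfile = new Path(dir, "cf/hfile-00000");

    HFileContext ctx = new HFileContextBuilder().withBlockSize(64 * 1024).build();
    HFile.Writer writer = HFile.getWriterFactory(conf, new CacheConfig(conf))
        .withPath(fs, hfile)
        .withFileContext(ctx)
        .create();
    try {
      // Cells must be appended in ascending key order; the writer
      // rejects out-of-order keys.
      for (int i = 0; i < 1000; i++) {
        byte[] row = Bytes.toBytes(String.format("row-%05d", i));
        writer.append(new KeyValue(row, Bytes.toBytes("cf"),
            Bytes.toBytes("q"), Bytes.toBytes("value-" + i)));
      }
    } finally {
      writer.close();
    }

    // Assign the finished files to regions and move them into place.
    new LoadIncrementalHFiles(conf).doBulkLoad(dir, new HTable(conf, "mytable"));
  }
}

The hard requirement is visible in the loop: everything must arrive in
sorted order, which is the total-ordering point raised later in this thread.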

There's no way to do this "in memory" at the moment. The closest would be
to batch your data up into a single large RPC, but that still goes through
the online write machinery (MemStore, flushes, etc.).


Re: Bulk-load data into HBase using Storm

Posted by Stack <st...@duboce.net>.
On Wed, Feb 4, 2015 at 10:49 AM, Jaime Solano <jd...@gmail.com> wrote:

> For a proof of concept we'll be working on, we want to bulk-load data into
> HBase, following an approach similar to the one explained here
> <http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/>,
> with the difference that for the HFile creation (step 2 in that article) we
> want to use Storm instead of MapReduce. That is, we want to bulk-load data
> that is not sitting in HDFS but, most likely, in memory.
>
>    1. What are your thoughts about this? Is it feasible?
>    2. What challenges do you foresee?
>

Notice how the bulk-load MapReduce job maintains a total order (the HFiles
have to be sorted internally, but also relative to each other). Can you have
Storm do a similar total-order partitioning?
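
To make that concrete, here is a sketch of what the Storm side of a
total-order partitioning could look like (the class and names here are
illustrative, not an existing API): each row key is routed to the bolt, and
hence the HFile, that owns its key range, given a sorted list of split
points. This mirrors what the TotalOrderPartitioner does for
HFileOutputFormat2 in the MapReduce version:

import java.util.Arrays;
import org.apache.hadoop.hbase.util.Bytes;

// Routes each row key to the partition (bolt/HFile) owning its key range.
// splits holds the sorted lower bounds of partitions 1..n; keys below
// splits[0] fall into partition 0.
public class RangePartitioner {
  private final byte[][] splits;

  public RangePartitioner(byte[][] sortedSplitPoints) {
    this.splits = sortedSplitPoints;
  }

  public int partition(byte[] rowKey) {
    int idx = Arrays.binarySearch(splits, rowKey, Bytes.BYTES_COMPARATOR);
    // binarySearch returns -(insertionPoint) - 1 for keys between splits;
    // an exact match on splits[i] belongs to partition i + 1.
    return idx >= 0 ? idx + 1 : -(idx + 1);
  }
}

Each bolt would still have to buffer and sort the cells for its own range
before writing them out, since the HFile writer itself enforces ordering.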


>    3. What other approaches would you suggest?
>
>
Write HFiles and bulk-load them if you can. Study our bulk loader carefully.
It can do some shoehorning if the produced files don't exactly match the
layout of the running HBase instance (for example, splitting HFiles that
straddle region boundaries), but only if everything is properly sorted.
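
For instance, the split points for a partitioner like the one sketched
above could be taken from the live table's region boundaries, much as
HFileOutputFormat2.configureIncrementalLoad does when setting up the
MapReduce version (a sketch; the table name is illustrative):

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class SplitPointsFromTable {
  // Returns one split point per region boundary, suitable for range
  // partitioning: the empty start key of the first region is dropped.
  public static byte[][] fetchSplits(String tableName) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (HTable table = new HTable(conf, tableName)) {
      byte[][] startKeys = table.getStartKeys();
      return Arrays.copyOfRange(startKeys, 1, startKeys.length);
    }
  }
}

Producing HFiles aligned to those boundaries means the loader can move them
into place directly instead of having to split them first.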

Good luck,
St.Ack


