Posted to user@hbase.apache.org by "dong.yajun" <do...@gmail.com> on 2015/08/19 08:11:17 UTC

What is the fastest way to dump the contents of an HBase table?

Hello,

What is the fastest way to dump the contents of an HBase table to HDFS? Is
it possible to use an HBase snapshot + Spark for this?

Currently we already use an HBase snapshot + MapReduce v2 (reading the
HFiles directly, not going through HTable) to convert the HFiles to ORC
files, but we found that 'spilling map output' takes up most of the total
job time. Could Spark reduce that cost?

map task: read the HFiles and convert them to KeyValues

reduce task: merge the KeyValues that share the same rowkey
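
To be concrete, what we have in mind is roughly the following sketch
(untested; it assumes HBase 1.x and Spark's newAPIHadoopRDD, and the
snapshot name and restore directory are placeholders), which scans the
snapshot from Spark via TableSnapshotInputFormat instead of going through
the region servers:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.{SparkConf, SparkContext}

object SnapshotDump {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("snapshot-dump"))

    // Point the input format at an existing snapshot; the restore dir is a
    // scratch HDFS path the job user can write to (placeholder below).
    val job = Job.getInstance(HBaseConfiguration.create())
    TableSnapshotInputFormat.setInput(job, "my_snapshot",
      new Path("/tmp/snapshot_restore"))

    // Each record is one already-merged row (a Result) read directly from
    // the snapshot's HFiles, so no separate reduce step to merge KeyValues
    // by rowkey should be needed.
    val rows = sc.newAPIHadoopRDD(
      job.getConfiguration,
      classOf[TableSnapshotInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    // From here each Result could be mapped to columns and written out as
    // ORC (e.g. via a DataFrame), without a shuffle.
    println(rows.count())
    sc.stop()
  }
}

Our understanding is that this would avoid the shuffle (and thus the map
output spill) of our current job, since the snapshot scanner already
returns rows merged by rowkey. Is that correct?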

thanks.

-- 
*Ric Dong*

Re: What is the fastest way to dump the contents of an HBase table?

Posted by Ted Yu <yu...@gmail.com>.
bq. 'spilling map output' takes up most of the total job time

Do you mind giving more detail on the above (what percentage of the job runtime)?

Which releases of Hadoop / HBase are you using?

Cheers

On Tue, Aug 18, 2015 at 11:11 PM, dong.yajun <do...@gmail.com> wrote:

> Hello,
>
> What is the fastest way to dump the contents of an HBase table to HDFS? Is
> it possible to use an HBase snapshot + Spark for this?
>
> Currently we already use an HBase snapshot + MapReduce v2 (reading the
> HFiles directly, not going through HTable) to convert the HFiles to ORC
> files, but we found that 'spilling map output' takes up most of the total
> job time. Could Spark reduce that cost?
>
> map task: read the HFiles and convert them to KeyValues
>
> reduce task: merge the KeyValues that share the same rowkey
>
> thanks.
>
> --
> *Ric Dong*
>