Posted to user@hbase.apache.org by Nik Lam <ni...@gmail.com> on 2014/09/24 02:46:37 UTC

Is it safe to run multiple completebulkload jobs against the same table in parallel?

Hello,

I have a handful of large HFiles that each span many regions of a table.

Splitting them to match the live regions is taking a very long time because
completebulkload seems to work serially through the HFiles and my regions
are undergoing splits relatively often due to organic growth - meaning the
region boundaries change while the completebulkload is in flight.

I'm wondering whether it's possible to speed up the overall bulk load of
this data by running one completebulkload job for each large HFile, i.e.
running several completebulkload jobs in parallel.

Has anyone tried this before or can anyone who is familiar with the way
completebulkload works comment on such an approach?

Regards,

Nik