You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hbase.apache.org by "calugaru.cristi" <ca...@gmail.com> on 2013/11/29 10:33:02 UTC

HBase bulk load usage

I am trying to import some HDFS data to an already existing HBase table. The
table I have was created with 2 column families, and with all the default
settings that HBase comes with when creating a new table. The table is
already filled up with a large volume of data, and it has 98 online regions.
The type of row keys it has, are under the form of(simplified version) :
2-CHARS_ID + 6-DIGIT-NUMBER + 3 X 32-CHAR-MD5-HASH.

Example of key:
IP281113ec46d86301568200d510f47095d6c99db18630b0a23ea873988b0fb12597e05cc6b30c479dfb9e9d627ccfc4c5dd5fef.

The data I want to import is on HDFS, and I am using a Map-Reduce process to
read it. I emit Put objects from my mapper, which correspond to each line
read from the HDFS files. The existing data has keys which will all start
with "XX181113". The job is configured with :

HFileOutputFormat.configureIncrementalLoad(job, hTable)
Once I start the process, I see it configured with 98 reducers (equal to the
online regions the table has), but the issue is that 4 reducers got 100% of
the data split among them, while the rest did nothing. As a result, I see
only 4 folder outputs, which have a very large size. Are these files
corresponding to 4 new regions which I can then import to the table? And if
so, why only 4, while 98 reducers get created? Reading HBase docs:

"In order to function efficiently, HFileOutputFormat must be configured such
that each output HFile fits within a single region. In order to do this,
jobs whose output will be bulk loaded into HBase use Hadoop's
TotalOrderPartitioner class to partition the map output into disjoint ranges
of the key space, corresponding to the key ranges of the regions in the
table."

confused me even more as to why I get this behaviour.

Thanks!



--
View this message in context: http://apache-hbase.679495.n3.nabble.com/HBase-bulk-load-usage-tp4053231.html
Sent from the HBase User mailing list archive at Nabble.com.