Posted to user@hbase.apache.org by "qihuang.zheng" <qi...@fraudmetrix.cn> on 2015/12/25 10:49:26 UTC

Re: completebulkload not mv or rename but copy and split many attempt times

You are right. Previously I bulk-loaded a single folder as an experiment, which was really fast; the next bulkload triggered splitting and took much longer.
I know why this happens: we have many txt files, and I launch a separate importtsv MR job for each txt file.
Each MR job generates HFiles whose key range is ordered internally, but across all jobs the HFiles are not globally ordered!
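A tiny sketch of the situation (with hypothetical row keys, not real md5 values): each job's HFile is internally sorted, but the jobs' key ranges overlap, which is why LoadIncrementalHFiles has to split them to fit regions.

```python
# Hypothetical row keys standing in for md5 hashes.
# Each "file" is sorted on its own, as one importtsv job would produce.
file_a = sorted(["1a2b", "9c1d", "f2e3"])
file_b = sorted(["0d4f", "8e77", "fb01"])

range_a = (file_a[0], file_a[-1])   # ('1a2b', 'f2e3')
range_b = (file_b[0], file_b[-1])   # ('0d4f', 'fb01')

# The two HFiles' key ranges overlap instead of tiling the key space,
# which is what "not ordered in global" means here.
overlap = range_a[0] <= range_b[1] and range_b[0] <= range_a[1]
print(overlap)  # True
```

With md5 row keys, every input file's keys are spread across nearly the whole key space, so every pair of independently generated HFiles overlaps like this.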


Our row key is md5; the total is 100 billion records, 6 TB, and each original file ranges from 100 MB to 100 GB.
That's why I launch many MR jobs in parallel, and that's where the problem occurs,
even though I created a pre-split table with `{NUMREGIONS => 16, SPLITALGO => 'HexStringSplit'}`.


The only fix I can see right now is to use a single importtsv MR job,
so that its reduce phase produces globally ordered HFiles that satisfy HBase's key ranges.
I have also changed the pre-split key range to 000-fff (16*16*16 = 4096 regions in total).
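For illustration, the 000-fff pre-split and the single-job ordering requirement can be sketched together. This is not HBase code: HexStringSplit computes the real boundaries server-side, and HFileOutputFormat2's total-order partitioner does the real grouping; the row keys here are hypothetical.

```python
import bisect

# 16*16*16 = 4096 regions need 4095 split points: '001' .. 'fff'.
# (The first region covers ['', '001'); the last covers ['fff', '')).
boundaries = [format(i, "03x") for i in range(1, 4096)]

def partition(keys):
    """Group row keys by region, the way a single MR job's total-order
    partitioner would: each group's [first, last] range then fits inside
    one region, so bulkload can move the HFile instead of splitting it."""
    groups = {}
    for k in sorted(keys):
        groups.setdefault(bisect.bisect_right(boundaries, k), []).append(k)
    return groups

# A few hypothetical md5-style row keys.
groups = partition(["9c1d", "1a2b", "f2e3", "43ab", "b055"])
# For keys longer than three hex chars, the region index is simply the
# numeric value of the first three chars, e.g. '1a2b' -> 0x1a2 = 418.
```

Because hex strings of equal length sort the same way lexicographically and numerically, the bisect over string boundaries lands each key in the region whose start key prefixes it.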


But as you know, the original txt files are very large, so not only the number of map tasks but also the number of reduce tasks is huge,
and this may also take a long time to finish.


Is there any way to load such huge data into HBase quickly?
I have also looked at Cassandra and other KV stores, but the unavoidable first step of reading the original large txt files is also too slow.






tks, qihuang.zheng


Original Message
From: WangYQ <wangyongqiang0617@163.com>
To: user <user@hbase.apache.org>
Date: 2015-12-23 (Wed) 16:52
Subject: Re: completebulkload not mv or rename but copy and split many attempt times


This is because the table regions changed and no longer match the regions that existed when you generated the HFiles. Once the bulkload process is over, the files should be moved into HBase; I think it is better to delete all HFiles and dirs after the bulkload finishes.

At 2015-12-23 16:35:10, "qihuang.zheng" <qihuang.zheng@fraudmetrix.cn> wrote:

I have HFiles generated by importtsv, and the files are really large, from 100 MB to 10 GB. I have changed hbase.hregion.max.filesize to 50 GB (53687091200), and also made sure the source CanonicalServiceName is the same as HBase's.

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles hdfs://tdhdfs/user/tongdun/id_hbase/1 data.md5_id2
HADOOP_CLASSPATH=`hbase classpath` hadoop jar hbase-1.0.2/lib/hbase-server-1.0.2.jar completebulkload /user/tongdun/id_hbase/1 data.md5_id2

But neither completebulkload nor LoadIncrementalHFiles just mv/renames the HFiles as expected; instead, copying and splitting happens, which takes a long time. The log message "Split occured while grouping HFiles, retry attempt XXX" repeats and creates child _tmp dirs one level at a time:

2015-12-23 15:52:04,909 INFO [LoadIncrementalHFiles-0] hfile.CacheConfig: CacheConfig:disabled
2015-12-23 15:52:05,006 INFO [LoadIncrementalHFiles-0] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/01114a58782b4c369819673e4b3678ae first=f6eb30074a52ebb8c5f52ed1c85c2f0d last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:52:05,007 INFO [LoadIncrementalHFiles-0] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/01114a58782b4c369819673e4b3678ae no longer fits inside a single region. Splitting...
2015-12-23 15:53:38,639 INFO [LoadIncrementalHFiles-0] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/9f6fe2d28ddc4f209be62757ace8611b.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/9f6fe2d28ddc4f209be62757ace8611b.top
2015-12-23 15:53:39,173 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 1 with 2 files remaining to group or split
2015-12-23 15:53:39,186 INFO [LoadIncrementalHFiles-1] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/9f6fe2d28ddc4f209be62757ace8611b.bottom first=f6eb30074a52ebb8c5f52ed1c85c2f0d last=f733d2c504f22f71b191014d72e4d124
2015-12-23 15:53:39,188 INFO [LoadIncrementalHFiles-2] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/9f6fe2d28ddc4f209be62757ace8611b.top first=f733d2c6407f5758e860195b6d2c10c1 last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:53:39,189 INFO [LoadIncrementalHFiles-2] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/9f6fe2d28ddc4f209be62757ace8611b.top no longer fits inside a single region. Splitting...
2015-12-23 15:54:27,722 INFO [LoadIncrementalHFiles-2] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/17ba0f42c4934f4c96218c784d3c3bb0.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/17ba0f42c4934f4c96218c784d3c3bb0.top
2015-12-23 15:54:28,557 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 2 with 2 files remaining to group or split
2015-12-23 15:54:28,568 INFO [LoadIncrementalHFiles-4] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/17ba0f42c4934f4c96218c784d3c3bb0.bottom first=f733d2c6407f5758e860195b6d2c10c1 last=f77c7d357a76ff92bb16ec1ef79f31fb
2015-12-23 15:54:28,568 INFO [LoadIncrementalHFiles-5] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/17ba0f42c4934f4c96218c784d3c3bb0.top first=f77c7d3915c9a8b71c83c414aabd587d last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:54:28,568 INFO [LoadIncrementalHFiles-5] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/17ba0f42c4934f4c96218c784d3c3bb0.top no longer fits inside a single region. Splitting...
2015-12-23 15:55:08,992 INFO [LoadIncrementalHFiles-5] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/f7162cec4e404eabbea479b2a5446294.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/f7162cec4e404eabbea479b2a5446294.top
2015-12-23 15:55:09,424 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 3 with 2 files remaining to group or split
2015-12-23 15:55:09,431 INFO [LoadIncrementalHFiles-7] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/f7162cec4e404eabbea479b2a5446294.bottom first=f77c7d3915c9a8b71c83c414aabd587d last=f7c525a83ee19ea166414e972c5d5541
2015-12-23 15:55:09,433 INFO [LoadIncrementalHFiles-8] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/f7162cec4e404eabbea479b2a5446294.top first=f7c525aa2ec661c1c0707b02d1c4b4b3 last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:55:09,433 INFO [LoadIncrementalHFiles-8] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/f7162cec4e404eabbea479b2a5446294.top no longer fits inside a single region. Splitting...
2015-12-23 15:55:42,165 INFO [LoadIncrementalHFiles-8] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/6610bd5d178e423fbe02db1865f834f0.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/6610bd5d178e423fbe02db1865f834f0.top
2015-12-23 15:55:42,490 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 4 with 2 files remaining to group or split
2015-12-23 15:55:42,498 INFO [LoadIncrementalHFiles-10] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/6610bd5d178e423fbe02db1865f834f0.bottom first=f7c525aa2ec661c1c0707b02d1c4b4b3 last=f80dcce8a4a14be406ddd1bdebc2eda2
2015-12-23 15:55:42,502 INFO [LoadIncrementalHFiles-11] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/6610bd5d178e423fbe02db1865f834f0.top first=f80dccecf159d4999cb8e17446103d72 last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:55:42,502 INFO [LoadIncrementalHFiles-11] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/6610bd5d178e423fbe02db1865f834f0.top no longer fits inside a single region. Splitting...
2015-12-23 15:56:09,560 INFO [LoadIncrementalHFiles-11] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/8f07441d8b7c4d3ba37b6b0917860f68.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/8f07441d8b7c4d3ba37b6b0917860f68.top
2015-12-23 15:56:09,933 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 5 with 2 files remaining to group or split
2015-12-23 15:56:09,942 INFO [LoadIncrementalHFiles-13] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/8f07441d8b7c4d3ba37b6b0917860f68.bottom first=f80dccecf159d4999cb8e17446103d72 last=f85673f473ead63c89e96c83b2058ca7
2015-12-23 15:56:09,943 INFO [LoadIncrementalHFiles-14] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/8f07441d8b7c4d3ba37b6b0917860f68.top first=f85673fde3138dac07ce08881c9d0ccc last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:56:09,944 INFO [LoadIncrementalHFiles-14] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/8f07441d8b7c4d3ba37b6b0917860f68.top no longer fits inside a single region. Splitting...
2015-12-23 15:56:30,890 INFO [LoadIncrementalHFiles-14] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/feaa0a6428f24a5294c87dd87c6bc5a6.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/feaa0a6428f24a5294c87dd87c6bc5a6.top
2015-12-23 15:56:31,145 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 6 with 2 files remaining to group or split
2015-12-23 15:56:31,151 INFO [LoadIncrementalHFiles-16] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/feaa0a6428f24a5294c87dd87c6bc5a6.bottom first=f85673fde3138dac07ce08881c9d0ccc last=f89f12a56b5af206188639f736877563
2015-12-23 15:56:31,151 INFO [LoadIncrementalHFiles-17] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/feaa0a6428f24a5294c87dd87c6bc5a6.top first=f89f12a59e4a9c9bcbb42d0504318e25 last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:56:31,151 INFO [LoadIncrementalHFiles-17] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/feaa0a6428f24a5294c87dd87c6bc5a6.top no longer fits inside a single region. Splitting...
2015-12-23 15:56:44,959 INFO [LoadIncrementalHFiles-17] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/3886569dba4041deb4487f49d0417ca6.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/3886569dba4041deb4487f49d0417ca6.top
2015-12-23 15:56:46,826 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 7 with 2 files remaining to group or split
2015-12-23 15:56:46,832 INFO [LoadIncrementalHFiles-19] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/3886569dba4041deb4487f49d0417ca6.bottom first=f89f12a59e4a9c9bcbb42d0504318e25 last=f8e7bc423ca4799459898439bf0f68b2
2015-12-23 15:56:46,833 INFO [LoadIncrementalHFiles-20] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/3886569dba4041deb4487f49d0417ca6.top first=f8e7bc4bc8c2e7eac7f7e31bc116f8e0 last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:56:46,930 INFO [main] client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
2015-12-23 15:56:46,931 INFO [main] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x3515d529acedbaa
2015-12-23 15:56:46,960 INFO [main] zookeeper.ZooKeeper: Session: 0x3515d529acedbaa closed
2015-12-23 15:56:46,960 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down

Even though the process finished, the original HFiles were not deleted. I was wondering why a mv/rename did not happen.

[qihuang.zheng@spark047213 ~]$ hadoop fs -du -h /user/tongdun/id_hbase/1/id/
3.3 G /user/tongdun/id_hbase/1/id/01114a58782b4c369819673e4b3678ae
6.0 G /user/tongdun/id_hbase/1/id/_tmp

tks, qihuang.zheng