Posted to user@hbase.apache.org by Austin Heyne <ah...@ccri.com> on 2018/07/17 18:12:00 UTC
Compactions after bulk load
Hi all,
I'm trying to bulk load a large amount of data into HBase. The bulk load
succeeds but then HBase starts running compactions. My input files are
typically ~5-6GB and there are over 3k files. I've used the same table
splits for the bulk ingest and the bulk load so there should be no
reason for HBase to run any compactions. However, I'm seeing it first
compact the HFiles into 25+GB files and then into 200+GB files; I didn't
let it run any longer than that. Additionally, I've talked with another
coworker who tried this process in the past; he experienced the same
thing and eventually gave up on the feature. My attempts have been
on HBase 1.4.2. Does anyone have information on why HBase is insisting
on running these compactions or how I can stop them? They are
essentially breaking the feature for us.
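One quick way to see what kind of compaction the table is running is
Admin#getCompactionState; below is a minimal sketch against the HBase 1.x
Admin API ('mytable' is a placeholder table name):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class ShowCompactionState {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      // Prints NONE, MINOR, MAJOR, or MAJOR_AND_MINOR for the table.
      System.out.println(admin.getCompactionState(TableName.valueOf("mytable")));
    }
  }
}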
Thanks,
--
Austin L. Heyne
Re: Compactions after bulk load
Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Austin,
Can you share your table description? Also, was the table empty? Last, what
does your bulk data look like? I mean, how many files? One per region? Are
you 100% sure? Have you used the HFile tool to validate the splits and keys
of your files?
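A minimal sketch of that check (assuming the HBase 1.x client APIs; the
table name and the HFile path argument are placeholders) prints the
table's region start keys and the first/last row keys of one HFile, which
should fall inside a single region. The HFile tool prints the same key
information: hbase org.apache.hadoop.hbase.io.hfile.HFile -m -f <path>

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.io.hfile.CacheConfig;
import org.apache.hadoop.hbase.io.hfile.HFile;
import org.apache.hadoop.hbase.util.Bytes;

public class ValidateSplits {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Print the table's region start keys ('mytable' is a placeholder).
    try (Connection conn = ConnectionFactory.createConnection(conf);
         RegionLocator locator =
             conn.getRegionLocator(TableName.valueOf("mytable"))) {
      for (byte[] startKey : locator.getStartKeys()) {
        System.out.println("region start: " + Bytes.toStringBinary(startKey));
      }
    }
    // Print the first and last row keys of one bulk-load HFile; both should
    // land inside a single region's key range.
    HFile.Reader reader = HFile.createReader(FileSystem.get(conf),
        new Path(args[0]), new CacheConfig(conf), conf);
    try {
      reader.loadFileInfo();
      System.out.println("first key: " + Bytes.toStringBinary(reader.getFirstRowKey()));
      System.out.println("last key:  " + Bytes.toStringBinary(reader.getLastRowKey()));
    } finally {
      reader.close();
    }
  }
}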
JMS
2018-07-17 14:12 GMT-04:00 Austin Heyne <ah...@ccri.com>:
> Hi all,
>
> I'm trying to bulk load a large amount of data into HBase. The bulk load
> succeeds but then HBase starts running compactions. My input files are
> typically ~5-6GB and there are over 3k files. I've used the same table
> splits for the bulk ingest and the bulk load so there should be no reason
> for HBase to run any compactions. However, I'm seeing it first compact
> the HFiles into 25+GB files and then into 200+GB files; I didn't let it
> run any longer than that. Additionally, I've talked with another coworker
> who tried this process in the past; he experienced the same thing and
> eventually gave up on the feature. My attempts have been on HBase
> 1.4.2. Does anyone have information on why HBase is insisting on running
> these compactions or how I can stop them? They are essentially breaking the
> feature for us.
>
> Thanks,
>
> --
> Austin L. Heyne
>
>
Re: Compactions after bulk load
Posted by Austin Heyne <ah...@ccri.com>.
Thanks for the feedback. I've been slammed with other tasks but will get
to this as soon as we get other things stable.
-Austin
On 07/20/2018 02:59 PM, Ted Yu wrote:
> Have you checked the output from the bulk load to see if there were lines
> in the following form (from LoadIncrementalHFiles#splitStoreFile)?
>
> LOG.info("HFile at " + hfilePath + " no longer fits inside a
> single " + "region.
> Splitting...");
>
> In the server log, you should see entries produced by the following code:
>
> if (LOG.isDebugEnabled()) {
>   LOG.debug("Compacting " + file +
>       ", keycount=" + keyCount +
>       ", bloomtype=" + r.getBloomFilterType().toString() +
>       ", size=" + TraditionalBinaryPrefix.long2String(r.length(), "", 1) +
>       ", encoding=" + r.getHFileReader().getDataBlockEncoding() +
>       ", seqNum=" + seqNum +
>       (allFiles ? ", earliestPutTs=" + earliestPutTs : ""));
> }
>
> where allFiles being true indicates a major compaction.
>
> The above should give you some idea of the cause for the compaction
> activity.
>
> Thanks
>
> On Tue, Jul 17, 2018 at 11:12 AM Austin Heyne <ah...@ccri.com> wrote:
>
>> Hi all,
>>
>> I'm trying to bulk load a large amount of data into HBase. The bulk load
>> succeeds but then HBase starts running compactions. My input files are
>> typically ~5-6GB and there are over 3k files. I've used the same table
>> splits for the bulk ingest and the bulk load so there should be no
>> reason for HBase to run any compactions. However, I'm seeing it first
>> compact the HFiles into 25+GB files and then into 200+GB files; I didn't
>> let it run any longer than that. Additionally, I've talked with another
>> coworker who tried this process in the past; he experienced the same
>> thing and eventually gave up on the feature. My attempts have been
>> on HBase 1.4.2. Does anyone have information on why HBase is insisting
>> on running these compactions or how I can stop them? They are
>> essentially breaking the feature for us.
>>
>> Thanks,
>>
>> --
>> Austin L. Heyne
>>
>>
--
Austin L. Heyne
Re: Compactions after bulk load
Posted by Ted Yu <yu...@gmail.com>.
Have you checked the output from the bulk load to see if there were lines
in the following form (from LoadIncrementalHFiles#splitStoreFile)?
LOG.info("HFile at " + hfilePath + " no longer fits inside a
single " + "region.
Splitting...");
In the server log, you should see entries produced by the following code:
if (LOG.isDebugEnabled()) {
  LOG.debug("Compacting " + file +
      ", keycount=" + keyCount +
      ", bloomtype=" + r.getBloomFilterType().toString() +
      ", size=" + TraditionalBinaryPrefix.long2String(r.length(), "", 1) +
      ", encoding=" + r.getHFileReader().getDataBlockEncoding() +
      ", seqNum=" + seqNum +
      (allFiles ? ", earliestPutTs=" + earliestPutTs : ""));
}
where allFiles being true indicates a major compaction.
The above should give you some idea of the cause for the compaction
activity.
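If the logs show it is just the automatic compaction machinery kicking
in, one option while you investigate is to disable compactions at the
table level and re-enable them afterwards. A minimal sketch against the
HBase 1.x Admin API ('mytable' is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class DisableCompactions {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      TableName table = TableName.valueOf("mytable");
      HTableDescriptor desc = admin.getTableDescriptor(table);
      desc.setCompactionEnabled(false); // set back to true when done
      admin.modifyTable(table, desc);
    }
  }
}

Raising hbase.hstore.compaction.min for the table is another lever, but
the table-level switch is simpler for a one-off investigation.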
Thanks
On Tue, Jul 17, 2018 at 11:12 AM Austin Heyne <ah...@ccri.com> wrote:
> Hi all,
>
> I'm trying to bulk load a large amount of data into HBase. The bulk load
> succeeds but then HBase starts running compactions. My input files are
> typically ~5-6GB and there are over 3k files. I've used the same table
> splits for the bulk ingest and the bulk load so there should be no
> reason for HBase to run any compactions. However, I'm seeing it first
> compact the HFiles into 25+GB files and then into 200+GB files; I didn't
> let it run any longer than that. Additionally, I've talked with another
> coworker who tried this process in the past; he experienced the same
> thing and eventually gave up on the feature. My attempts have been
> on HBase 1.4.2. Does anyone have information on why HBase is insisting
> on running these compactions or how I can stop them? They are
> essentially breaking the feature for us.
>
> Thanks,
>
> --
> Austin L. Heyne
>
>