Posted to user@hbase.apache.org by Austin Heyne <ah...@ccri.com> on 2018/07/17 18:12:00 UTC

Compactions after bulk load

Hi all,

I'm trying to bulk load a large amount of data into HBase. The bulk load 
succeeds, but then HBase starts running compactions. My input files are 
typically ~5-6GB and there are over 3k of them. I've used the same table 
splits for the bulk ingest and the bulk load, so there should be no 
reason for HBase to run any compactions. However, I'm seeing it first 
compact the HFiles into 25+GB files and then into 200+GB files, at which 
point I stopped letting it run. Additionally, I've talked with another 
coworker who tried this process in the past and experienced the same 
thing, eventually giving up on the feature. My attempts have been on 
HBase 1.4.2. Does anyone have information on why HBase insists on 
running these compactions, or how I can stop them? They are essentially 
breaking the feature for us.

Thanks,

-- 
Austin L. Heyne


Re: Compactions after bulk load

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Austin,

Can you share your table description? Also, was the table empty? Last, what
does your bulk data look like? I mean, how many files? One per region? Are
you 100% sure? Have you used the HFile tool to validate the splits and keys
of your files?
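
For what it's worth, the bundled HFile tool can print each file's first and
last keys for that kind of validation. The path below is only illustrative;
substitute the actual location of one of your HFiles:

    # -m prints metadata (firstKey/lastKey, entry count, etc.)
    # without dumping the cell data itself
    hbase hfile -m -f hdfs:///hbase/data/default/mytable/region/cf/hfilename

Comparing each file's firstKey/lastKey against the table's region boundaries
should show whether every file really fits inside a single region.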

JMS

2018-07-17 14:12 GMT-04:00 Austin Heyne <ah...@ccri.com>:


Re: Compactions after bulk load

Posted by Austin Heyne <ah...@ccri.com>.
Thanks for the feedback. I've been slammed with other tasks but will get 
to this as soon as we get other things stable.

-Austin


On 07/20/2018 02:59 PM, Ted Yu wrote:

-- 
Austin L. Heyne


Re: Compactions after bulk load

Posted by Ted Yu <yu...@gmail.com>.
Have you checked the output from the bulk load to see if there were lines in
the following form (from LoadIncrementalHFiles#splitStoreFile)?

    LOG.info("HFile at " + hfilePath + " no longer fits inside a single " +
        "region. Splitting...");

In the server log, you should see log lines in the following form:

      if (LOG.isDebugEnabled()) {
        LOG.debug("Compacting " + file +
          ", keycount=" + keyCount +
          ", bloomtype=" + r.getBloomFilterType().toString() +
          ", size=" + TraditionalBinaryPrefix.long2String(r.length(), "", 1) +
          ", encoding=" + r.getHFileReader().getDataBlockEncoding() +
          ", seqNum=" + seqNum +
          (allFiles ? ", earliestPutTs=" + earliestPutTs : ""));
      }

where allFiles being true indicates a major compaction.
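
A quick way to look for both is to grep. The paths below are illustrative
(bulkload.out stands in for wherever you captured the client output, and log
locations vary by install); note the "no longer fits" message comes from the
client-side loader, while "Compacting" comes from the region servers:

    # Client-side bulk load output: did any HFiles have to be split?
    grep "no longer fits inside a single" bulkload.out

    # Region server logs: what is being compacted, and is it major?
    # (earliestPutTs only appears when all files are included)
    grep "Compacting " /var/log/hbase/hbase-*-regionserver-*.log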

The above should give you some idea of the cause for the compaction
activity.
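
As for stopping them: if these turn out to be time-based major compactions,
one common knob on HBase 1.x (verify against your version; this is a general
mitigation, not necessarily the fix for your case) is to disable periodic
major compactions in hbase-site.xml:

    <property>
      <!-- 0 disables time-based automatic major compactions;
           minor compactions still run per the usual thresholds -->
      <name>hbase.hregion.majorcompaction</name>
      <value>0</value>
    </property>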

Thanks

On Tue, Jul 17, 2018 at 11:12 AM Austin Heyne <ah...@ccri.com> wrote:
