Posted to user@hbase.apache.org by Graeme Wallace <gr...@farecompare.com> on 2013/10/01 02:10:18 UTC

HTableUtil and equivalent

Hi,

I've got a scenario where I'm pulling a stream of messages off a Kafka
topic, reformatting them, and then writing them into HBase.

I've seen various suggestions on how to improve write performance, but
wondered if there is a way to do something equivalent to HTableUtil
without having to maintain my own lists of Puts.

Is the underlying output buffer (assuming autoFlush=false) associated with
a single HTable instance, or do all HTable instances share the same output
buffer? I.e., if there is one buffer per HTable, I could make sure that
only keys in the same region get written through a given HTable.
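For reference, the HTableUtil pattern I'm trying to avoid, i.e. building
and handing over my own list of Puts, looks roughly like this (a sketch
assuming the 0.94-era client API; the table name, column family, and row
contents are placeholders):

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.HTableUtil;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BucketedWrite {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(HBaseConfiguration.create(), "mytable");
        try {
            // The caller-maintained list of Puts, which is the part
            // I'd like to drop.
            List<Put> puts = new ArrayList<Put>();
            for (int i = 0; i < 1000; i++) {
                Put p = new Put(Bytes.toBytes("row-" + i));
                p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"),
                      Bytes.toBytes("v-" + i));
                puts.add(p);
            }
            // bucketRsPut() groups the Puts by region server and writes
            // each bucket through the table in one call.
            HTableUtil.bucketRsPut(table, puts);
        } finally {
            table.close();
        }
    }
}
```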




-- 
Graeme Wallace
CTO
FareCompare.com
O: 972 588 1414
M: 214 681 9018

Re: HTableUtil and equivalent

Posted by takeshi <ta...@gmail.com>.
Hi,

I think your assumption is right. Each HTable instance has its own
'writeAsyncBuffer' that stores the Put objects until its 'writeBufferSize'
is reached or its 'flushCommits()' method is called.

Here are the docs:
http://hbase.apache.org/book/perf.writing.html#perf.hbase.client.autoflush
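So in practice you disable autoFlush on the instance and let its private
buffer do the batching. A rough sketch, assuming the 0.94/0.96-era client
API (the table name, buffer size, and row contents below are just example
values):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");  // one buffer per instance
        table.setAutoFlush(false);                   // buffer puts client-side
        table.setWriteBufferSize(4 * 1024 * 1024);   // e.g. 4 MB
        try {
            for (int i = 0; i < 10000; i++) {
                Put put = new Put(Bytes.toBytes("row-" + i));
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"),
                        Bytes.toBytes(i));
                table.put(put);  // only flushed when the buffer fills
            }
        } finally {
            table.flushCommits();  // push whatever is still buffered
            table.close();
        }
    }
}
```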

Here is the code snippet from o.a.h.hbase.client.HTable:
{code:java}
public class HTable implements HTableInterface {
  ...
  @Override
  public void put(final Put put)
      throws InterruptedIOException, RetriesExhaustedWithDetailsException {
    doPut(put);
    if (autoFlush) {
      flushCommits();
    }
  }
  ...
  /**
   * Add the put to the buffer. If the buffer is already too large, sends
   * the buffer to the cluster.
   * @throws RetriesExhaustedWithDetailsException if there is an error on the cluster.
   * @throws InterruptedIOException if we were interrupted.
   */
  private void doPut(Put put)
      throws InterruptedIOException, RetriesExhaustedWithDetailsException {
    if (ap.hasError()){
      backgroundFlushCommits(true);
    }

    validatePut(put);

    currentWriteBufferSize += put.heapSize();
    writeAsyncBuffer.add(put);

    while (currentWriteBufferSize > writeBufferSize) {
      backgroundFlushCommits(false);
    }
  }
}
{code}
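The buffering policy above can be modeled with a small self-contained
sketch. The class below is hypothetical, purely to illustrate the
flush-on-threshold shape of doPut(): puts accumulate until their total
heap size exceeds the configured buffer size, then the buffer is flushed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in mirroring HTable's client-side buffering.
class WriteBufferModel {
    private final long writeBufferSize;  // like HTable.writeBufferSize
    private final List<Long> buffer = new ArrayList<Long>();  // like writeAsyncBuffer
    private long currentWriteBufferSize = 0;
    private int flushes = 0;

    WriteBufferModel(long writeBufferSize) {
        this.writeBufferSize = writeBufferSize;
    }

    // Analogue of doPut(): buffer first, then flush while over the threshold.
    void put(long heapSize) {
        currentWriteBufferSize += heapSize;
        buffer.add(heapSize);
        while (currentWriteBufferSize > writeBufferSize) {
            flush();
        }
    }

    // Analogue of backgroundFlushCommits(): ship and clear the buffer.
    void flush() {
        buffer.clear();
        currentWriteBufferSize = 0;
        flushes++;
    }

    int flushCount() { return flushes; }
    int buffered() { return buffer.size(); }
}
```

Note that the flush is triggered inside put() itself, which is why a
single large put can cause an immediate flush even on an empty buffer.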



Best regards

takeshi


2013/10/1 Graeme Wallace <gr...@farecompare.com>

> Hi,
>
> I've got a scenario where I'm pulling a stream of messages off a Kafka
> topic, reformatting them, and then writing them into HBase.
>
> I've seen various suggestions on how to improve write performance, but
> wondered if there is a way to do something equivalent to HTableUtil
> without having to maintain my own lists of Puts.
>
> Is the underlying output buffer (assuming autoFlush=false) associated with
> a single HTable instance, or do all HTable instances share the same output
> buffer? I.e., if there is one buffer per HTable, I could make sure that
> only keys in the same region get written through a given HTable.
>
>
>
>
> --
> Graeme Wallace
> CTO
> FareCompare.com
> O: 972 588 1414
> M: 214 681 9018
>