Posted to user@hbase.apache.org by Koert Kuipers <ko...@tresata.com> on 2013/12/07 00:06:28 UTC

HTable writeAsyncBuffer

hello all,
i was just taking a look at HTable source code to get a bit more
understanding about hbase from a client perspective.
i noticed that puts are put into a buffer (writeAsyncBuffer) that gets
flushed if it gets to a certain size.
writeAsyncBuffer can take objects of type Row, which includes besides the
Put also Deletes, Appends, and RowMutations.

but when i look at the code for the delete method it does not use
writeAsyncBuffer. same for append and mutateRow methods. why do Puts get
buffered but other mutations do not? or did i misunderstand?

thanks! koert

Re: HTable writeAsyncBuffer

Posted by Nicolas Liochon <nk...@gmail.com>.
It was written to be generic, but limited to 'put' to maintain backward
compatibility.
Some 'Row' types do not implement 'heapSize', so we have a limitation for some
types for the moment (we need the objects to implement heapSize as we need
to know when it's time to flush the buffer). This can be solved type by
type for the ones that don't implement 'heapSize'.
Server side, I'm not sure that all types are supported. It's not possible
to batch "Increment" for example. This can be solved type by type as well.
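
To make the heapSize point concrete, here is a minimal, self-contained sketch of
a size-bounded buffer. It is not the HTable code: SizedRow, ToyPut and
SizeBoundedBuffer are hypothetical names invented for this illustration, and the
"send" step is only simulated. It shows why every buffered object has to report
a size: without it the buffer cannot tell when the limit has been crossed.

```java
import java.util.ArrayList;
import java.util.List;

class WriteBufferSketch {

    /** Hypothetical stand-in for a mutation that can report its own size. */
    interface SizedRow {
        long heapSize();
    }

    /** Toy mutation; its size is approximated by the payload length. */
    static final class ToyPut implements SizedRow {
        private final byte[] payload;

        ToyPut(byte[] payload) {
            this.payload = payload;
        }

        @Override
        public long heapSize() {
            return payload.length;
        }
    }

    /** Buffers rows and flushes once the summed heapSize exceeds the limit. */
    static final class SizeBoundedBuffer {
        private final long limitBytes;
        private final List<SizedRow> pending = new ArrayList<>();
        private long currentBytes = 0;
        private int flushCount = 0;

        SizeBoundedBuffer(long limitBytes) {
            this.limitBytes = limitBytes;
        }

        void add(SizedRow row) {
            pending.add(row);
            currentBytes += row.heapSize(); // only possible if the row reports a size
            if (currentBytes > limitBytes) {
                flush();
            }
        }

        void flush() {
            if (pending.isEmpty()) {
                return;
            }
            // A real client would send the batch to the server here (assumption).
            pending.clear();
            currentBytes = 0;
            flushCount++;
        }

        int flushCount() {
            return flushCount;
        }

        int pendingCount() {
            return pending.size();
        }
    }

    public static void main(String[] args) {
        SizeBoundedBuffer buffer = new SizeBoundedBuffer(10);
        buffer.add(new ToyPut(new byte[6])); // 6 bytes buffered, below the limit
        buffer.add(new ToyPut(new byte[6])); // 12 bytes exceeds 10, triggers a flush
        System.out.println("flushes=" + buffer.flushCount()
                + " pending=" + buffer.pendingCount());
    }
}
```

A type that cannot implement the size method simply cannot be passed to add() in
this sketch, which mirrors the "solve it type by type" remark above.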

For the deletes case, I would expect that adding only deletes to the buffer
would work out of the box. If you mix puts and deletes, I don't know. It's
worth trying I would say (and add a unit test in the code if it does).
Adding delete would mean adding an option such as "autoFlushDelete", and
then using the existing buffer. It would make sense to add this
functionality to HBase imho. And this would be backward compatible as well.
You're likely to know it already, but just in case: you can already batch
deletes, and the batch mode uses the AsyncProcess code.

So:
> are there downsides if we were to add all these operations to the
> writeAsyncBuffer?
> are there any usages of HTable.put that rely on it being sent off instead
> of being put in a buffer?
It's a user decision. If it can be controlled by a flag per HTable instance
such as 'autoFlush' today, it's fine. We can't extend autoFlush to all
operations as it would break backward compatibility, but adding a flag per
operation type is fine imho.
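
As a rough illustration of a flag per operation type, here is a small
self-contained sketch. Nothing in it is existing HBase API: the class, the
OpType enum and the setAutoFlush(OpType, boolean) method are hypothetical, and
operations are modelled as plain strings. Every type defaults to being sent
immediately, so callers that never touch the flags see no change in behaviour.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

class PerTypeAutoFlushSketch {

    /** Hypothetical operation types that could share one buffer. */
    enum OpType { PUT, DELETE }

    private final Map<OpType, Boolean> autoFlush = new EnumMap<>(OpType.class);
    private final List<String> buffer = new ArrayList<>();
    private final List<String> sent = new ArrayList<>();

    PerTypeAutoFlushSketch() {
        // Assumed default for this sketch: every type is sent immediately.
        for (OpType type : OpType.values()) {
            autoFlush.put(type, true);
        }
    }

    /** Turning the flag off for a type makes that type accumulate in the buffer. */
    void setAutoFlush(OpType type, boolean on) {
        autoFlush.put(type, on);
    }

    void submit(OpType type, String rowKey) {
        buffer.add(type + ":" + rowKey);
        if (autoFlush.get(type)) {
            flush(); // also drains anything buffered earlier, preserving order
        }
    }

    void flush() {
        sent.addAll(buffer); // stands in for the call to the server
        buffer.clear();
    }

    List<String> sent() {
        return sent;
    }

    int pendingCount() {
        return buffer.size();
    }

    public static void main(String[] args) {
        PerTypeAutoFlushSketch table = new PerTypeAutoFlushSketch();
        table.setAutoFlush(OpType.DELETE, false);
        table.submit(OpType.DELETE, "row1"); // stays in the buffer
        table.submit(OpType.PUT, "row2");    // auto-flushed, drains row1 first
        System.out.println(table.sent());
    }
}
```

One design choice worth noting: an auto-flushed operation drains the whole
buffer rather than jumping ahead of it, so operations reach the server in the
order they were submitted. Whether a real implementation should behave that way
is an open question, not something settled here.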

Cheers,

Nicolas


On Tue, Dec 10, 2013 at 12:16 AM, Stack <st...@duboce.net> wrote:

> On Mon, Dec 9, 2013 at 2:57 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
> > are there downsides if we were to add all these operations to the
> > writeAsyncBuffer? are there any usages of HTable.put that rely on it
> > being sent off instead of being put in a buffer?
> >
> >
> You can ask for a flush after adding a put if you have such a case (and
> there is autoflush flag).
> St.Ack
>

Re: HTable writeAsyncBuffer

Posted by Stack <st...@duboce.net>.
On Mon, Dec 9, 2013 at 2:57 PM, Koert Kuipers <ko...@tresata.com> wrote:

> are there downsides if we were to add all these operations to the
> writeAsyncBuffer? are there any usages of HTable.put that rely on it being
> sent off instead of being put in a buffer?
>
>
You can ask for a flush after adding a put if you have such a case (and
there is autoflush flag).
St.Ack

Re: HTable writeAsyncBuffer

Posted by Koert Kuipers <ko...@tresata.com>.
are there downsides if we were to add all these operations to the
writeAsyncBuffer? are there any usages of HTable.put that rely on it being
sent off instead of being put in a buffer?


On Mon, Dec 9, 2013 at 5:38 PM, Stack <st...@duboce.net> wrote:

> On Sat, Dec 7, 2013 at 8:52 AM, Koert Kuipers <ko...@tresata.com> wrote:
>
> > hey st.ack
> >
> > well i am considering creating lots of deletes from a map-reduce job
> > instead of puts, and was looking at the code to see how efficient that
> > would be...
> >
> >
>
> You have been writing code for a while (smile)?
>
>
>
> > but now i am more generally wondering if there is any downside to making
> > all these operations go into the buffer instead of treating puts special.
> >
> >
> I'm not sure I understand the question.  If you are asking if doing mass
> individual deletes of cells is to be avoided, the answer is yes.  But maybe
> I have you wrong?
>
> St.Ack
>

Re: HTable writeAsyncBuffer

Posted by Stack <st...@duboce.net>.
On Sat, Dec 7, 2013 at 8:52 AM, Koert Kuipers <ko...@tresata.com> wrote:

> hey st.ack
>
> well i am considering creating lots of deletes from a map-reduce job
> instead of puts, and was looking at the code to see how efficient that
> would be...
>
>

You have been writing code for a while (smile)?



> but now i am more generally wondering if there is any downside to making
> all these operations go into the buffer instead of treating puts special.
>
>
I'm not sure I understand the question.  If you are asking if doing mass
individual deletes of cells is to be avoided, the answer is yes.  But maybe
I have you wrong?

St.Ack

Re: HTable writeAsyncBuffer

Posted by Koert Kuipers <ko...@tresata.com>.
hey st.ack

well i am considering creating lots of deletes from a map-reduce job
instead of puts, and was looking at the code to see how efficient that
would be...

but now i am more generally wondering if there is any downside to making
all these operations go into the buffer instead of treating puts special.


On Sat, Dec 7, 2013 at 8:40 AM, Stack <st...@duboce.net> wrote:

> On Fri, Dec 6, 2013 at 3:06 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>
> > i noticed that puts are put into a buffer (writeAsyncBuffer) that gets
> > flushed if it gets to a certain size.
> > writeAsyncBuffer can take objects of type Row, which includes besides the
> > Put also Deletes, Appends, and RowMutations.
> >
> > but when i look at the code for the delete method it does not use
> > writeAsyncBuffer. same for append and mutateRow methods. why do Puts get
> > buffered but other mutations do not? or did i misunderstand?
> >
>
>
> This is how it 'evolved'.  What are you thinking Koert?  We should probably
> be clearer in javadoc about the sequence in which these ops can go over to
> the server.
>
> Serverside, it doesn't care what is in the batch.  It will just work its
> way through the 'Rows' as they come in.
>
> St.Ack
>

Re: HTable writeAsyncBuffer

Posted by Stack <st...@duboce.net>.
On Fri, Dec 6, 2013 at 3:06 PM, Koert Kuipers <ko...@tresata.com> wrote:


> i noticed that puts are put into a buffer (writeAsyncBuffer) that gets
> flushed if it gets to a certain size.
> writeAsyncBuffer can take objects of type Row, which includes besides the
> Put also Deletes, Appends, and RowMutations.
>
> but when i look at the code for the delete method it does not use
> writeAsyncBuffer. same for append and mutateRow methods. why do Puts get
> buffered but other mutations do not? or did i misunderstand?
>


This is how it 'evolved'.  What are you thinking Koert?  We should probably
be clearer in javadoc about the sequence in which these ops can go over to
the server.

Serverside, it doesn't care what is in the batch.  It will just work its
way through the 'Rows' as they come in.

St.Ack