Posted to user@hbase.apache.org by "Hiller, Dean (Contractor)" <de...@broadridge.com> on 2011/02/23 22:25:56 UTC

when does put return to the caller?

I was wondering whether put returns after writing the data into memory
on two out of the three nodes, letting my client continue so we don't
have to wait for the data to go from memory to disk.  After all, if it
is replicated, we probably don't need to wait for it to be written to
disk (i.e., kind of like the in-memory data grids that exist out there).

Also, is there an asynchronous request/response version of PutAll, so I
can slam the grid with batches of 1000 entries?  Something like

PutAll(List<PutOps> puts, AsynchCallback cb);

such that cb would be called with the failure or success response after
the put?

Thanks,

Dean


This message and any attachments are intended only for the use of the addressee and
may contain information that is privileged and confidential. If the reader of the 
message is not the intended recipient or an authorized representative of the
intended recipient, you are hereby notified that any dissemination of this
communication is strictly prohibited. If you have received this communication in
error, please notify us immediately by e-mail and delete the message and any
attachments from your system.

Re: when does put return to the caller?

Posted by Ryan Rawson <ry...@gmail.com>.

There is a batch put call; it should be trivial to use some kind of
background thread to invoke callbacks when it returns.

Check out the HTable API, javadoc, etc.  All available via http://hbase.org !
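
A minimal sketch of that suggestion, using only the JDK so it runs
without an HBase cluster.  BatchWriter is a hypothetical stand-in for a
blocking batch put call, and AsynchCallback is a hypothetical callback
type named after the one in the question; neither is an HBase class.  A
single worker thread is used on the assumption that the underlying
client object should only be touched by one thread at a time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a blocking batch write (not an HBase type).
interface BatchWriter {
    void putAll(List<String> rows) throws Exception;
}

// Hypothetical callback type, named after the one in the question.
interface AsynchCallback {
    void onSuccess(int count);
    void onFailure(Exception cause);
}

class AsyncBatchPut {
    // One worker thread, so the wrapped writer is never used concurrently.
    private final ExecutorService pool = Executors.newSingleThreadExecutor();
    private final BatchWriter writer;

    AsyncBatchPut(BatchWriter writer) {
        this.writer = writer;
    }

    // Returns immediately; the blocking call and the callback run on the worker thread.
    Future<?> putAll(List<String> rows, AsynchCallback cb) {
        List<String> copy = new ArrayList<>(rows); // caller may reuse its list
        return pool.submit(() -> {
            try {
                writer.putAll(copy);
                cb.onSuccess(copy.size());
            } catch (Exception e) {
                cb.onFailure(e);
            }
        });
    }

    void shutdown() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        List<String> stored = new ArrayList<>();
        AsyncBatchPut client = new AsyncBatchPut(stored::addAll);
        client.putAll(Arrays.asList("row1", "row2"), new AsynchCallback() {
            public void onSuccess(int count) {
                System.out.println("stored " + count + " rows");
            }
            public void onFailure(Exception cause) {
                System.out.println("batch failed: " + cause);
            }
        }).get(); // waiting here only so the demo exits cleanly
        client.shutdown();
    }
}
```

In a real client the lambda body would call the batch put on a table
object owned by the worker thread; the callback then reports success or
failure for the whole batch, not for individual rows.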

-ryan


Re: when does put return to the caller?

Posted by Ryan Rawson <ry...@gmail.com>.

I think we have a category error here; perhaps you should read the
Bigtable paper, which succinctly describes the overall architecture.
It is here: http://labs.google.com/papers/bigtable.html

Your statements don't really make sense in the context of the
Bigtable/HBase architecture.  When you write, the write goes to a
single server, which then writes to the WAL.  There is also the
deferred_log_flush option for tables, which causes edits for that table
to be flushed to the log in the background.



On Fri, Feb 25, 2011 at 1:58 PM, Hiller, Dean  (Contractor)
<de...@broadridge.com> wrote:
> Is there any work being done so that I could have my puts write to 3 nodes in memory (which then asynchronously write to the WAL)?  Is that a possibility?  Then if 1 or 2 nodes go down, I can still recover.  Obviously, if all 3 go down, I would have a problem.
>
> Thanks,
> Dean

RE: when does put return to the caller?

Posted by "Hiller, Dean (Contractor)" <de...@broadridge.com>.

Is there any work being done so that I could have my puts write to 3 nodes in memory (which then asynchronously write to the WAL)?  Is that a possibility?  Then if 1 or 2 nodes go down, I can still recover.  Obviously, if all 3 go down, I would have a problem.

Thanks,
Dean


Re: when does put return to the caller?

Posted by tsuna <ts...@gmail.com>.

On Wed, Feb 23, 2011 at 1:25 PM, Hiller, Dean  (Contractor)
<de...@broadridge.com> wrote:
> I was wondering whether put returns after writing the data into memory
> on two out of the three nodes, letting my client continue so we don't
> have to wait for the data to go from memory to disk.  After all, if it
> is replicated, we probably don't need to wait for it to be written to
> disk (i.e., kind of like the in-memory data grids that exist out there).

If you use the WAL (Write Ahead Log), which is enabled by default,
your write has to be persisted on 3 disks before it returns
successfully to you.  Without the WAL, the write is only written to
memory of one node, so if that node crashes, you'll lose your edit
(much faster but unsafe).
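
The trade-off can be illustrated with a toy model.  This is only a
sketch: ToyRegionServer and its Durability enum are hypothetical names
invented for the example, not HBase classes, and the "durable log" is
just an in-process list standing in for a replicated log.  It also
models the deferred_log_flush table option mentioned elsewhere in this
thread as a log that is only made durable by a later background flush.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: none of these types are HBase classes.
class ToyRegionServer {
    enum Durability { SYNC_WAL, DEFERRED_WAL, SKIP_WAL }

    private final Map<String, String> memstore = new HashMap<>(); // lost on crash
    private final List<String[]> durableLog = new ArrayList<>();  // survives a crash
    private final List<String[]> pendingLog = new ArrayList<>();  // lost unless flushed first

    void put(String row, String value, Durability durability) {
        switch (durability) {
            case SYNC_WAL:
                durableLog.add(new String[] {row, value}); // logged before put returns
                break;
            case DEFERRED_WAL:
                pendingLog.add(new String[] {row, value}); // logged later, in the background
                break;
            case SKIP_WAL:
                break;                                     // memory only
        }
        memstore.put(row, value);
    }

    // Stands in for the periodic background flush of deferred edits.
    void backgroundFlush() {
        durableLog.addAll(pendingLog);
        pendingLog.clear();
    }

    // Simulates a crash: everything in memory is dropped, then the durable log is replayed.
    Map<String, String> crashAndRecover() {
        memstore.clear();
        pendingLog.clear();
        for (String[] edit : durableLog) {
            memstore.put(edit[0], edit[1]);
        }
        return new TreeMap<>(memstore);
    }

    public static void main(String[] args) {
        ToyRegionServer server = new ToyRegionServer();
        server.put("a", "1", Durability.SYNC_WAL);
        server.put("b", "2", Durability.DEFERRED_WAL);
        server.put("c", "3", Durability.SKIP_WAL);
        // The crash happens before any background flush, so only the synced edit survives.
        System.out.println(server.crashAndRecover().keySet());
    }
}
```

In this model the caller of put never waits in the deferred and skip
cases, which is where the speed comes from; the cost is the window in
which a crash silently drops acknowledged edits.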

> Also, is there an asynchronous request/response for PutAll so I can slam
> the grid with batches of 1000 entries kind of like
>
> PutAll(List<PutOps> puts, AsynchCallback cb); such that cb would be
> called with the failure or success response after the put?

HBase doesn't offer any asynchronous API out of the box, and in
addition HTable isn't thread-safe.  If you want an asynchronous HBase
API, I recommend you take a look at asynchbase
(https://github.com/stumbleupon/asynchbase) as it's an alternative
HBase client that is entirely asynchronous and non-blocking.  Javadoc
is at http://su.pr/1PJCSY

-- 
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com