Posted to user@hbase.apache.org by Qing Yan <qi...@gmail.com> on 2011/05/27 04:43:07 UTC

data loss after killing RS

Hello,
    I found something strange; here is the test case:
1) Process A inserts data into a particular HBase region, with WAL off and
AutoFlush off.
2) Process A issues htable.flushCommits(); no exception is thrown. Write down
the row key.
4) Kill the region server manually.
5) Process B queries the row key but can't find it, no matter how many times
it retries. (In the meantime, via the HBase UI, the region gets reassigned.)
Is this expected? I am using the latest Cloudera build.
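[A sketch of the write path in steps 1-2, assuming the 0.90-era client API; the table, family, and qualifier names below are invented for illustration and require a running cluster:]

```java
// Sketch of the write path described above (HBase 0.90-era client API).
// Table/family/qualifier names are illustrative, not from the original report.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class UnsafeWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");
        table.setAutoFlush(false);            // buffer puts client-side

        Put put = new Put(Bytes.toBytes("row1"));
        put.setWriteToWAL(false);             // skip the WAL: fast but unsafe
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
        table.put(put);

        // Drains the client-side write buffer to the region server's memstore
        // only; nothing is on HDFS yet, so a kill -9 of the RS loses the edit.
        table.flushCommits();
        table.close();
    }
}
```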

Thank you.

Re: data loss after killing RS

Posted by Qing Yan <qi...@gmail.com>.
Ok, but our app has online/realtime processing requirements. My
understanding is that bulk importing requires an M/R job and is only good
for batch processing?

The Javadoc says HBaseAdmin flush is an async operation. How do I get
confirmation of whether it succeeded or not?

On 5/29/11, Todd Lipcon <to...@cloudera.com> wrote:
> Or actually flush the table rather than just flushing commits:
> http://archive.cloudera.com/cdh/3/hbase-0.90.1-cdh3u0/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#flush(byte[])
>
> -Todd
>
> On Sat, May 28, 2011 at 12:29 PM, Joey Echeverria <jo...@cloudera.com> wrote:
>> You might want to look into bulk loading.
>>
>> -Joey
>> On May 28, 2011 9:47 AM, "Qing Yan" <qi...@gmail.com> wrote:
>>> Well, I realized myself that the RS flush to HDFS is not designed to do
>>> incremental changes. So there is no way around the WAL? Man, I just wish
>>> it could run a bit faster :-P
>>>
>>> On Sat, May 28, 2011 at 9:36 PM, Qing Yan <qi...@gmail.com> wrote:
>>>
>>>> Ok, thanks for the explanation. So data loss is normal in this case.
>>>> Yeah, I did a "kill -9". I did wait till the RS got reassigned and
>>>> actually let process B keep retrying over the night..
>>>>
>>>> Is the WAL the only way to guarantee data safety in HBase? We want a
>>>> high insert rate, though.
>>>> Is there a middle ground? E.g. a sync operation to flush the RS to HDFS
>>>> would be perfect!
>>>>
>>>>
>>>>>
>>>>
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

Re: data loss after killing RS

Posted by Todd Lipcon <to...@cloudera.com>.
Or actually flush the table rather than just flushing commits:
http://archive.cloudera.com/cdh/3/hbase-0.90.1-cdh3u0/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#flush(byte[])
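[A minimal sketch of that call, assuming the 0.90-era admin API; the table name is invented for illustration, and note that the flush itself is asynchronous:]

```java
// Sketch: ask the region servers to flush a table's memstores to HDFS.
// Table name is illustrative. Note: HBaseAdmin.flush is asynchronous, so
// the call returning does not by itself prove the data is on disk.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class FlushTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        admin.flush(Bytes.toBytes("mytable"));
    }
}
```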

-Todd

On Sat, May 28, 2011 at 12:29 PM, Joey Echeverria <jo...@cloudera.com> wrote:
> You might want to look into bulk loading.
>
> -Joey
> On May 28, 2011 9:47 AM, "Qing Yan" <qi...@gmail.com> wrote:
>> Well, I realized myself that the RS flush to HDFS is not designed to do
>> incremental changes. So there is no way around the WAL? Man, I just wish
>> it could run a bit faster :-P
>>
>> On Sat, May 28, 2011 at 9:36 PM, Qing Yan <qi...@gmail.com> wrote:
>>
>>> Ok, thanks for the explanation. So data loss is normal in this case.
>>> Yeah, I did a "kill -9". I did wait till the RS got reassigned and
>>> actually let process B keep retrying over the night..
>>>
>>> Is the WAL the only way to guarantee data safety in HBase? We want a
>>> high insert rate, though.
>>> Is there a middle ground? E.g. a sync operation to flush the RS to HDFS
>>> would be perfect!
>>>
>>>
>>>>
>>>
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: data loss after killing RS

Posted by Joey Echeverria <jo...@cloudera.com>.
You might want to look into bulk loading.
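[A sketch of the 0.90-era bulk load flow; the jar name, paths, table, and column names below are illustrative and assume a running cluster:]

```shell
# Sketch of the HBase 0.90-era bulk load flow; jar name, paths, table and
# column names are illustrative.
# 1) Generate HFiles with a MapReduce job (here importtsv with bulk output).
hadoop jar hbase-0.90.1-cdh3u0.jar importtsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:q \
  -Dimporttsv.bulk.output=/tmp/hfiles mytable /tmp/input

# 2) Atomically move the generated HFiles into the live table.
hadoop jar hbase-0.90.1-cdh3u0.jar completebulkload /tmp/hfiles mytable
```

Bulk loading bypasses the write path entirely (no WAL, no memstore), which is why it is fast, but as noted above it fits batch rather than realtime ingestion.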

-Joey
On May 28, 2011 9:47 AM, "Qing Yan" <qi...@gmail.com> wrote:
> Well, I realized myself that the RS flush to HDFS is not designed to do
> incremental changes. So there is no way around the WAL? Man, I just wish
> it could run a bit faster :-P
>
> On Sat, May 28, 2011 at 9:36 PM, Qing Yan <qi...@gmail.com> wrote:
>
>> Ok, thanks for the explanation. So data loss is normal in this case.
>> Yeah, I did a "kill -9". I did wait till the RS got reassigned and
>> actually let process B keep retrying over the night..
>>
>> Is the WAL the only way to guarantee data safety in HBase? We want a
>> high insert rate, though.
>> Is there a middle ground? E.g. a sync operation to flush the RS to HDFS
>> would be perfect!
>>
>>
>>>
>>

Re: data loss after killing RS

Posted by Stack <st...@duboce.net>.
On Wed, Jun 1, 2011 at 7:00 AM, Qing Yan <qi...@gmail.com> wrote:
> Hello Stack,
>
> That parameter is for the WAL, right? I am trying to find a way to achieve
> reliable persistence in HBase without the WAL slowness, but it looks
> impossible..
>

You specify it when you set the schema for your table.  Try it.

Yes, syncing each write is expensive.  There is a grouping effect
going on in the dfsclient which means we are not sync'ing each
write, but that could happen if, for example, your edits are large.
What size are they?

St.Ack

Re: data loss after killing RS

Posted by Qing Yan <qi...@gmail.com>.
Hello Stack,

That parameter is for the WAL, right? I am trying to find a way to achieve
reliable persistence in HBase without the WAL slowness, but it looks
impossible..

On Tue, May 31, 2011 at 11:07 AM, Stack <st...@duboce.net> wrote:

> Have you looked at deferred flushing?  It's an attribute you set on
> your table.  You then say how often to run the sync using
> 'hbase.regionserver.optionallogflushinterval'.  Default is to sync every
> second.
>
> St.Ack
>
> On Sat, May 28, 2011 at 6:47 AM, Qing Yan <qi...@gmail.com> wrote:
> > Well, I realized myself that the RS flush to HDFS is not designed to do
> > incremental changes. So there is no way around the WAL? Man, I just wish
> > it could run a bit faster :-P
> >
> > On Sat, May 28, 2011 at 9:36 PM, Qing Yan <qi...@gmail.com> wrote:
> >
> >> Ok, thanks for the explanation. So data loss is normal in this case.
> >> Yeah, I did a "kill -9". I did wait till the RS got reassigned and
> >> actually let process B keep retrying over the night..
> >>
> >> Is the WAL the only way to guarantee data safety in HBase? We want a
> >> high insert rate, though.
> >> Is there a middle ground? E.g. a sync operation to flush the RS to HDFS
> >> would be perfect!
> >>
> >>
> >>>
> >>
> >
>

Re: data loss after killing RS

Posted by Stack <st...@duboce.net>.
Have you looked at deferred flushing?  It's an attribute you set on
your table.  You then say how often to run the sync using
'hbase.regionserver.optionallogflushinterval'.  Default is to sync every
second.
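[A sketch of setting that attribute on an existing table, assuming the 0.90-era admin API; the table name is invented for illustration:]

```java
// Sketch: enable deferred log flush on an existing table (0.90-era API).
// Table name is illustrative. Edits are then synced to the WAL periodically
// (every 'hbase.regionserver.optionallogflushinterval' ms, default 1000)
// instead of per write, so up to roughly one interval of edits can be lost
// if the region server crashes.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class DeferredFlush {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        byte[] name = Bytes.toBytes("mytable");

        admin.disableTable(name);
        HTableDescriptor desc = admin.getTableDescriptor(name);
        desc.setDeferredLogFlush(true);   // sync WAL on a timer, not per edit
        admin.modifyTable(name, desc);
        admin.enableTable(name);
    }
}
```

This is the middle ground asked about above: writes still go through the WAL, but the expensive sync is amortized across a time window.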

St.Ack

On Sat, May 28, 2011 at 6:47 AM, Qing Yan <qi...@gmail.com> wrote:
> Well, I realized myself that the RS flush to HDFS is not designed to do
> incremental changes. So there is no way around the WAL? Man, I just wish
> it could run a bit faster :-P
>
> On Sat, May 28, 2011 at 9:36 PM, Qing Yan <qi...@gmail.com> wrote:
>
>> Ok, thanks for the explanation. So data loss is normal in this case.
>> Yeah, I did a "kill -9". I did wait till the RS got reassigned and
>> actually let process B keep retrying over the night..
>>
>> Is the WAL the only way to guarantee data safety in HBase? We want a
>> high insert rate, though.
>> Is there a middle ground? E.g. a sync operation to flush the RS to HDFS
>> would be perfect!
>>
>>
>>>
>>
>

Re: data loss after killing RS

Posted by Qing Yan <qi...@gmail.com>.
Well, I realized myself that the RS flush to HDFS is not designed to do
incremental changes. So there is no way around the WAL? Man, I just wish
it could run a bit faster :-P

On Sat, May 28, 2011 at 9:36 PM, Qing Yan <qi...@gmail.com> wrote:

> Ok, thanks for the explanation. So data loss is normal in this case.
> Yeah, I did a "kill -9". I did wait till the RS got reassigned and
> actually let process B keep retrying over the night..
>
> Is the WAL the only way to guarantee data safety in HBase? We want a
> high insert rate, though.
> Is there a middle ground? E.g. a sync operation to flush the RS to HDFS
> would be perfect!
>
>
>>
>

Re: data loss after killing RS

Posted by Qing Yan <qi...@gmail.com>.
Ok, thanks for the explanation. So data loss is normal in this case.
Yeah, I did a "kill -9". I did wait till the RS got reassigned and
actually let process B keep retrying over the night..

Is the WAL the only way to guarantee data safety in HBase? We want a
high insert rate, though.
Is there a middle ground? E.g. a sync operation to flush the RS to HDFS
would be perfect!


>

Re: data loss after killing RS

Posted by Jean-Daniel Cryans <jd...@apache.org>.
2 things:

 - If you kill -9 before the flush happened, the data isn't persisted.
It's an async operation BTW; even if you call flush and the call
returns, it doesn't mean the data is already on disk.
 - About step 5, what happens when you wait for the region to be
reassigned? The default ZK timeout is pretty long, so that people who
didn't tune their JVM settings don't run into region servers dying all
the time.
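[That timeout is 'zookeeper.session.timeout' in hbase-site.xml; a sketch of lowering it, with an illustrative value. A shorter timeout makes a dead RS get detected, and its regions reassigned, sooner, at the cost of false positives during long GC pauses:]

```xml
<!-- hbase-site.xml: how long before a silent region server is declared dead.
     The default in this era is 180000 ms (3 minutes); 60000 is illustrative. -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>60000</value>
</property>
```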

J-D

On Thu, May 26, 2011 at 7:43 PM, Qing Yan <qi...@gmail.com> wrote:
> Hello,
>    I found something strange; here is the test case:
> 1) Process A inserts data into a particular HBase region, with WAL off and
> AutoFlush off.
> 2) Process A issues htable.flushCommits(); no exception is thrown. Write
> down the row key.
> 4) Kill the region server manually.
> 5) Process B queries the row key but can't find it, no matter how many
> times it retries. (In the meantime, via the HBase UI, the region gets
> reassigned.)
> Is this expected? I am using the latest Cloudera build.
>
> Thank you.
>

RE: data loss after killing RS

Posted by Friso van Vollenhoven <fv...@xebia.com>.
What do you mean by "kill the region server"? kill -9, unplugging the power, cutting the network?

When you flush commits, it means the data made it to the RS. Without the WAL, it doesn't mean it made it to HDFS. It can be only in memory (the memstore). When you then kill the process abruptly, it will lose the data.

A normal shutdown flushes the memstore to HDFS, I think.


Friso

________________________________________
From: Qing Yan [qingyan@gmail.com]
Sent: Friday, May 27, 2011 4:43
To: user@hbase.apache.org
Subject: data loss after killing RS

Hello,
    I found something strange; here is the test case:
1) Process A inserts data into a particular HBase region, with WAL off and
AutoFlush off.
2) Process A issues htable.flushCommits(); no exception is thrown. Write down
the row key.
4) Kill the region server manually.
5) Process B queries the row key but can't find it, no matter how many times
it retries. (In the meantime, via the HBase UI, the region gets reassigned.)
Is this expected? I am using the latest Cloudera build.

Thank you.