Posted to user@hbase.apache.org by Ted Yu <yu...@gmail.com> on 2010/09/25 01:49:38 UTC

flushing memstore when cluster shuts down WAS: empty table after flush

When the HBase cluster shuts down, how is memstore flushing triggered?

Thanks

On Fri, Sep 24, 2010 at 4:01 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:

> > StripedHBaseTable is just a wrapper around HTable.
>
> Ok.
>
> > QA installed another application which queries HBase. When QA didn't see
> > meaningful data from the application.
> > Why didn't HBase serve data from Memstore ?
>
> I cannot answer this question, as far as I know 0.20.6 doesn't have a
> bug that prevents you from writing data (and many people use it in
> production environments). My guess is, the data never got into HBase
> or it was written with wrong timestamps and is hidden by delete
> markers or something "time-travelly" like that. If they do play with
> timestamps, ask them not to or verify they do the right thing.
>
> > QA used stop-hbase.sh to stop HBase. Would Memstore contents be flushed
> to
> > Storefile at that moment ?
>
> Yep, if you don't see any store file then the data never made it into
> hbase.
>
> >
> > QA servers were running JVM version 1.6.0_18
>
> From
> http://hbase.apache.org/docs/r0.20.6/api/overview-summary.html#requirements
>
> "Java 1.6.x, preferably from Sun. Use the latest version available
> except u18 (u19 is fine)."
>
> J-D
>

Re: flushing memstore when cluster shuts down WAS: empty table after flush

Posted by Ted Yu <yu...@gmail.com>.
In HLog.splitLog(), the path of logfiles[i] should be displayed, i.e. add .getPath():
          } catch (IOException e) {
            if (length <= 0) {
              LOG.warn("Empty hlog, continuing: " +
                logfiles[i].getPath() + " count=" + count, e);

I added some more of the master log: http://pastebin.com/yGbQ2Cv5

I searched the datanode log around the time the EOFException was reported but
didn't see any abnormality.

On Sat, Sep 25, 2010 at 12:10 AM, Ted Yu <yu...@gmail.com> wrote:

> Stack:
> We use hadoop from cdh3b2.
>
> w.r.t. WAL log, I found the following from name node log:
> http://pastebin.com/35QYp21f
>
> Here is snippet from HBase master (on 10.32.56.155) log:
> http://pastebin.com/v5cFAKqt
>
> Looks like this was related to HLog splitting.
>
> w.r.t. the RC J-D put up today, I am considering two factors:
> 1. Our QA (and myself) are used to 0.20.6 - I need to convince them 0.89 is
> much better.
> 2. As you indicated in another thread, you would make a choice on whether
> to roll out a new release with master rewrite. I am waiting for the
> decision. If that release comes sooner, we may skip 0.89
>
> Cheers
>
>
> On Fri, Sep 24, 2010 at 11:03 PM, Stack <st...@duboce.net> wrote:
>
>> On Fri, Sep 24, 2010 at 9:06 PM, Ted Yu <yu...@gmail.com> wrote:
>> > I see this log following the previous snippet:
>> >
>> > 2010-09-24 11:21:43,799 WARN org.apache.hadoop.hdfs.DFSClient: Error
>> > Recovery for block null bad datanode[0] nodes == null
>> > 2010-09-24 11:21:43,799 WARN org.apache.hadoop.hdfs.DFSClient: Could not
>> get
>> > block locations. Source file
>> > "/hbase/.logs/sjc9-flash-grid02.carrieriq.com
>> ,60020,1285347585107/hlog.dat.1285351187512"
>> > - Aborting...
>> > 2010-09-24 11:21:45,417 ERROR
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to close log
>> in
>> > abort
>>
>> So we were aborting and the one thing we'll try to do on our way out
>> when aborting is close the WAL log.  Seems like that failed in the
>> above.  (This stuff is odd -- 'Recovery for block null bad datanode[0]
>> nodes == null'... anything in your datanode logs to explain this?
>> What if you grep the WAL log name in namenode log, do you see anything
>> interesting?).
>>
>> > org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException:
>> > org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease
>> on
>> > /hbase/.logs/sjc9-flash-grid02.carrieriq.com
>> ,60020,1285347585107/hlog.dat.1285351187512
>> > File does not exist. Holder DFSClient_302121899 does not have any open
>>
>>
>> Hmm... says it does not exist.
>>
>> So, yeah, for sure, check out the namenode logs.
>>
>> Hey Ted, are you fellas running 0.20.x still?  If so, what would it
>> take to get you fellas up on 0.89, say the RC J-D put up today?
>>
>>
>> > Would failure from hlog.close() lead to data loss ?
>> >
>>
>> Are you not on 0.20 hbase still?  If so, yes.  If on 0.89 with an
>> hadoop 0.20 with append support (Apache -append branch or CDH3b2),
>> then some small amount may have been lost.
>> St.Ack
>>
>
>

Re: flushing memstore when cluster shuts down WAS: empty table after flush

Posted by Ted Yu <yu...@gmail.com>.
Stack:
We use hadoop from cdh3b2.

w.r.t. the WAL, I found the following in the namenode log:
http://pastebin.com/35QYp21f

Here is a snippet from the HBase master (on 10.32.56.155) log:
http://pastebin.com/v5cFAKqt

Looks like this was related to HLog splitting.

w.r.t. the RC J-D put up today, I am considering two factors:
1. Our QA (and I) are used to 0.20.6 - I need to convince them that 0.89 is
much better.
2. As you indicated in another thread, you would make a choice on whether to
roll out a new release with the master rewrite. I am waiting for that
decision. If that release comes sooner, we may skip 0.89.

Cheers

On Fri, Sep 24, 2010 at 11:03 PM, Stack <st...@duboce.net> wrote:

> On Fri, Sep 24, 2010 at 9:06 PM, Ted Yu <yu...@gmail.com> wrote:
> > I see this log following the previous snippet:
> >
> > 2010-09-24 11:21:43,799 WARN org.apache.hadoop.hdfs.DFSClient: Error
> > Recovery for block null bad datanode[0] nodes == null
> > 2010-09-24 11:21:43,799 WARN org.apache.hadoop.hdfs.DFSClient: Could not
> get
> > block locations. Source file
> > "/hbase/.logs/sjc9-flash-grid02.carrieriq.com
> ,60020,1285347585107/hlog.dat.1285351187512"
> > - Aborting...
> > 2010-09-24 11:21:45,417 ERROR
> > org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to close log
> in
> > abort
>
> So we were aborting and the one thing we'll try to do on our way out
> when aborting is close the WAL log.  Seems like that failed in the
> above.  (This stuff is odd -- 'Recovery for block null bad datanode[0]
> nodes == null'... anything in your datanode logs to explain this?
> What if you grep the WAL log name in namenode log, do you see anything
> interesting?).
>
> > org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException:
> > org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
> > /hbase/.logs/sjc9-flash-grid02.carrieriq.com
> ,60020,1285347585107/hlog.dat.1285351187512
> > File does not exist. Holder DFSClient_302121899 does not have any open
>
>
> Hmm... says it does not exist.
>
> So, yeah, for sure, check out the namenode logs.
>
> Hey Ted, are you fellas running 0.20.x still?  If so, what would it
> take to get you fellas up on 0.89, say the RC J-D put up today?
>
>
> > Would failure from hlog.close() lead to data loss ?
> >
>
> Are you not on 0.20 hbase still?  If so, yes.  If on 0.89 with an
> hadoop 0.20 with append support (Apache -append branch or CDH3b2),
> then some small amount may have been lost.
> St.Ack
>

Re: flushing memstore when cluster shuts down WAS: empty table after flush

Posted by Stack <st...@duboce.net>.
On Fri, Sep 24, 2010 at 9:06 PM, Ted Yu <yu...@gmail.com> wrote:
> I see this log following the previous snippet:
>
> 2010-09-24 11:21:43,799 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block null bad datanode[0] nodes == null
> 2010-09-24 11:21:43,799 WARN org.apache.hadoop.hdfs.DFSClient: Could not get
> block locations. Source file
> "/hbase/.logs/sjc9-flash-grid02.carrieriq.com,60020,1285347585107/hlog.dat.1285351187512"
> - Aborting...
> 2010-09-24 11:21:45,417 ERROR
> org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to close log in
> abort

So we were aborting, and the one thing we'll try to do on our way out
when aborting is close the WAL.  Seems like that failed in the
above.  (This stuff is odd -- 'Recovery for block null bad datanode[0]
nodes == null'... anything in your datanode logs to explain this?
If you grep for the WAL file name in the namenode log, do you see
anything interesting?)

> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException:
> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
> /hbase/.logs/sjc9-flash-grid02.carrieriq.com,60020,1285347585107/hlog.dat.1285351187512
> File does not exist. Holder DFSClient_302121899 does not have any open


Hmm... says it does not exist.

So, yeah, for sure, check out the namenode logs.

Hey Ted, are you fellas running 0.20.x still?  If so, what would it
take to get you fellas up on 0.89, say the RC J-D put up today?


> Would failure from hlog.close() lead to data loss ?
>

Are you not on 0.20 HBase still?  If so, yes.  If on 0.89 with a
Hadoop 0.20 that has append support (Apache -append branch or CDH3b2),
then some small amount may have been lost.
St.Ack

Re: flushing memstore when cluster shuts down WAS: empty table after flush

Posted by Ted Yu <yu...@gmail.com>.
I see this log following the previous snippet:

2010-09-24 11:21:43,799 WARN org.apache.hadoop.hdfs.DFSClient: Error
Recovery for block null bad datanode[0] nodes == null
2010-09-24 11:21:43,799 WARN org.apache.hadoop.hdfs.DFSClient: Could not get
block locations. Source file
"/hbase/.logs/sjc9-flash-grid02.carrieriq.com,60020,1285347585107/hlog.dat.1285351187512"
- Aborting...
2010-09-24 11:21:45,417 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to close log in
abort
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException:
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
/hbase/.logs/sjc9-flash-grid02.carrieriq.com,60020,1285347585107/hlog.dat.1285351187512
File does not exist. Holder DFSClient_302121899 does not have any open
files.
    at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1378)
    at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1369)
    at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1290)
    at
org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:469)
    at
org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:451)
    at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at
org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94)
    at
org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48)
    at
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:613)
    at java.lang.Thread.run(Thread.java:619)

Would a failure from hlog.close() lead to data loss?

Thanks

On Fri, Sep 24, 2010 at 8:48 PM, Stack <st...@duboce.net> wrote:

> When aborting, it skips flushing the memstore.
>
> We're aborting because we lost our session with zookeeper.  When
> aborting we try to exit swiftly and without doing anything that could
> interfere w/ the recovery process that has probably already started.
> Since recovery process will recover our memstore edits from WAL log,
> there isn't much point completing a flush.
>
> St.Ack
>
> On Fri, Sep 24, 2010 at 5:13 PM, Ted Yu <yu...@gmail.com> wrote:
> > I think that explains why memstore wasn't flushed.
> > See snippet from region server log:
> >
> > 2010-09-24 11:21:18,603 DEBUG
> > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
> > Total=6.5512085MB (6869440), Free=791.7863MB (830248128), Max=798.3375MB
> > (837117568), Counts: Blocks=1, Access=384, Hit=383, Miss=1, Evictions=0,
> > Evicted=0, Ratios: Hit Ratio=99.73958134651184%, Miss
> > Ratio=0.2604166744276881%, Evicted/Run=NaN
> > 2010-09-24 11:21:20,459 WARN org.apache.zookeeper.ClientCnxn: Exception
> > closing session 0x12b44afa1f50001 to sun.nio.ch.SelectionKeyImpl@3d3cdaa
> > java.io.IOException: TIMED OUT
> >    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
> > 2010-09-24 11:21:21,512 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
> > state: Disconnected, type: None, path: null
> > 2010-09-24 11:21:22,849 INFO org.apache.zookeeper.ClientCnxn: Attempting
> > connection to server sjc9-flash-grid01.carrieriq.com/10.32.56.155:2181
> > 2010-09-24 11:21:22,849 INFO org.apache.zookeeper.ClientCnxn: Priming
> > connection to java.nio.channels.SocketChannel[connected local=/
> > 10.32.56.156:45455 remote=
> sjc9-flash-grid01.carrieriq.com/10.32.56.155:2181]
> > 2010-09-24 11:21:22,873 INFO org.apache.zookeeper.ClientCnxn: Server
> > connection successful
> > 2010-09-24 11:21:23,065 WARN org.apache.zookeeper.ClientCnxn: Exception
> > closing session 0x12b44afa1f50001 to sun.nio.ch.SelectionKeyImpl@383de71c
> > java.io.IOException: Session Expired
> >    at
> >
> org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589)
> >    at
> org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709)
> >    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
> > 2010-09-24 11:21:23,065 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
> > state: Expired, type: None, path: null
> > 2010-09-24 11:21:23,065 ERROR
> > org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session
> > expired
> > 2010-09-24 11:21:23,101 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
> > request=0.0, regions=45, stores=119, storefiles=5, storefileIndexSize=0,
> > memstoreSize=7, compactionQueueSize=0, usedHeap=61, maxHeap=3991,
> > blockCacheSize=6869440, blockCacheFree=830248128, blockCacheCount=1,
> > blockCacheHitRatio=99, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
> > 2010-09-24 11:21:24,457 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
> > server on 60020
> >
> > Obviously this method in HRegionServer triggered abort():
> >  public void process(WatchedEvent event) {
> >
> > In this situation, why should internalFlushcache() be governed by abort ?
> >
> > Thanks
> >
> > On Fri, Sep 24, 2010 at 4:54 PM, Jean-Daniel Cryans <jdcryans@apache.org
> >wrote:
> >
> >> Like that:
> >>
> >>
> http://hbase.apache.org/docs/r0.89.20100726/xref/org/apache/hadoop/hbase/regionserver/HRegion.html#519
> >>
> >> (this is a link to the 0.89 code, but 0.20 works exactly the same way)
> >>
> >> J-D
> >>
> >> On Fri, Sep 24, 2010 at 4:49 PM, Ted Yu <yu...@gmail.com> wrote:
> >> > When HBase cluster shuts down, how is memstore flushing triggered ?
> >> >
> >> > Thanks
> >> >
> >> > On Fri, Sep 24, 2010 at 4:01 PM, Jean-Daniel Cryans <
> jdcryans@apache.org
> >> >wrote:
> >> >
> >> >> > StripedHBaseTable is just a wrapper around HTable.
> >> >>
> >> >> Ok.
> >> >>
> >> >> > QA installed another application which queries HBase. When QA
> didn't
> >> see
> >> >> > meaningful data from the application.
> >> >> > Why didn't HBase serve data from Memstore ?
> >> >>
> >> >> I cannot answer this question, as far as I know 0.20.6 doesn't have a
> >> >> bug that prevents you from writing data (and many people use it in
> >> >> production environments). My guess is, the data never got into HBase
> >> >> or it was written with wrong timestamps and is hidden by delete
> >> >> markers or something "time-travelly" like that. If they do play with
> >> >> timestamps, ask them not to or verify they do the right thing.
> >> >>
> >> >> > QA used stop-hbase.sh to stop HBase. Would Memstore contents be
> >> flushed
> >> >> to
> >> >> > Storefile at that moment ?
> >> >>
> >> >> Yep, if you don't see any store file then the data never made it into
> >> >> hbase.
> >> >>
> >> >> >
> >> >> > QA servers were running JVM version 1.6.0_18
> >> >>
> >> >> From
> >> >>
> >>
> http://hbase.apache.org/docs/r0.20.6/api/overview-summary.html#requirements
> >> >>
> >> >> "Java 1.6.x, preferably from Sun. Use the latest version available
> >> >> except u18 (u19 is fine)."
> >> >>
> >> >> J-D
> >> >>
> >> >
> >>
> >
>

Re: flushing memstore when cluster shuts down WAS: empty table after flush

Posted by Stack <st...@duboce.net>.
When aborting, it skips flushing the memstore.

We're aborting because we lost our session with ZooKeeper.  When
aborting we try to exit swiftly and without doing anything that could
interfere w/ the recovery process that has probably already started.
Since the recovery process will recover our memstore edits from the WAL,
there isn't much point completing a flush.
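
To make the reasoning concrete, here is a minimal sketch with made-up names
(not the actual HBase recovery code): every edit is appended to the WAL
before it lands in the memstore, so log splitting/replay can rebuild
whatever the memstore held at the time of the abort.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: why skipping the flush on abort is safe.
public class WalReplaySketch {

  static class Edit {
    final String row, value;
    Edit(String row, String value) { this.row = row; this.value = value; }
  }

  private final List<Edit> wal = new ArrayList<Edit>();            // write-ahead log
  private Map<String, String> memstore = new TreeMap<String, String>();

  public void put(String row, String value) {
    wal.add(new Edit(row, value));   // 1. append to the WAL (durable)
    memstore.put(row, value);        // 2. then apply to the memstore
  }

  /** Simulate an abort: the memstore is dropped without being flushed. */
  public void abort() {
    memstore = new TreeMap<String, String>();
  }

  /** Simulate recovery: replay the WAL to rebuild the lost edits. */
  public Map<String, String> replayWal() {
    Map<String, String> recovered = new TreeMap<String, String>();
    for (Edit e : wal) {
      recovered.put(e.row, e.value);
    }
    return recovered;
  }

  public static void main(String[] args) {
    WalReplaySketch rs = new WalReplaySketch();
    rs.put("row1", "v1");
    rs.abort();                           // nothing was flushed
    System.out.println(rs.replayWal());   // {row1=v1} -- the edit survives
  }
}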

St.Ack

On Fri, Sep 24, 2010 at 5:13 PM, Ted Yu <yu...@gmail.com> wrote:
> I think that explains why memstore wasn't flushed.
> See snippet from region server log:
>
> 2010-09-24 11:21:18,603 DEBUG
> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
> Total=6.5512085MB (6869440), Free=791.7863MB (830248128), Max=798.3375MB
> (837117568), Counts: Blocks=1, Access=384, Hit=383, Miss=1, Evictions=0,
> Evicted=0, Ratios: Hit Ratio=99.73958134651184%, Miss
> Ratio=0.2604166744276881%, Evicted/Run=NaN
> 2010-09-24 11:21:20,459 WARN org.apache.zookeeper.ClientCnxn: Exception
> closing session 0x12b44afa1f50001 to sun.nio.ch.SelectionKeyImpl@3d3cdaa
> java.io.IOException: TIMED OUT
>    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
> 2010-09-24 11:21:21,512 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
> state: Disconnected, type: None, path: null
> 2010-09-24 11:21:22,849 INFO org.apache.zookeeper.ClientCnxn: Attempting
> connection to server sjc9-flash-grid01.carrieriq.com/10.32.56.155:2181
> 2010-09-24 11:21:22,849 INFO org.apache.zookeeper.ClientCnxn: Priming
> connection to java.nio.channels.SocketChannel[connected local=/
> 10.32.56.156:45455 remote=sjc9-flash-grid01.carrieriq.com/10.32.56.155:2181]
> 2010-09-24 11:21:22,873 INFO org.apache.zookeeper.ClientCnxn: Server
> connection successful
> 2010-09-24 11:21:23,065 WARN org.apache.zookeeper.ClientCnxn: Exception
> closing session 0x12b44afa1f50001 to sun.nio.ch.SelectionKeyImpl@383de71c
> java.io.IOException: Session Expired
>    at
> org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589)
>    at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709)
>    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
> 2010-09-24 11:21:23,065 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
> state: Expired, type: None, path: null
> 2010-09-24 11:21:23,065 ERROR
> org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session
> expired
> 2010-09-24 11:21:23,101 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
> request=0.0, regions=45, stores=119, storefiles=5, storefileIndexSize=0,
> memstoreSize=7, compactionQueueSize=0, usedHeap=61, maxHeap=3991,
> blockCacheSize=6869440, blockCacheFree=830248128, blockCacheCount=1,
> blockCacheHitRatio=99, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
> 2010-09-24 11:21:24,457 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
> server on 60020
>
> Obviously this method in HRegionServer triggered abort():
>  public void process(WatchedEvent event) {
>
> In this situation, why should internalFlushcache() be governed by abort ?
>
> Thanks
>
> On Fri, Sep 24, 2010 at 4:54 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:
>
>> Like that:
>>
>> http://hbase.apache.org/docs/r0.89.20100726/xref/org/apache/hadoop/hbase/regionserver/HRegion.html#519
>>
>> (this is a link to the 0.89 code, but 0.20 works exactly the same way)
>>
>> J-D
>>
>> On Fri, Sep 24, 2010 at 4:49 PM, Ted Yu <yu...@gmail.com> wrote:
>> > When HBase cluster shuts down, how is memstore flushing triggered ?
>> >
>> > Thanks
>> >
>> > On Fri, Sep 24, 2010 at 4:01 PM, Jean-Daniel Cryans <jdcryans@apache.org
>> >wrote:
>> >
>> >> > StripedHBaseTable is just a wrapper around HTable.
>> >>
>> >> Ok.
>> >>
>> >> > QA installed another application which queries HBase. When QA didn't
>> see
>> >> > meaningful data from the application.
>> >> > Why didn't HBase serve data from Memstore ?
>> >>
>> >> I cannot answer this question, as far as I know 0.20.6 doesn't have a
>> >> bug that prevents you from writing data (and many people use it in
>> >> production environments). My guess is, the data never got into HBase
>> >> or it was written with wrong timestamps and is hidden by delete
>> >> markers or something "time-travelly" like that. If they do play with
>> >> timestamps, ask them not to or verify they do the right thing.
>> >>
>> >> > QA used stop-hbase.sh to stop HBase. Would Memstore contents be
>> flushed
>> >> to
>> >> > Storefile at that moment ?
>> >>
>> >> Yep, if you don't see any store file then the data never made it into
>> >> hbase.
>> >>
>> >> >
>> >> > QA servers were running JVM version 1.6.0_18
>> >>
>> >> From
>> >>
>> http://hbase.apache.org/docs/r0.20.6/api/overview-summary.html#requirements
>> >>
>> >> "Java 1.6.x, preferably from Sun. Use the latest version available
>> >> except u18 (u19 is fine)."
>> >>
>> >> J-D
>> >>
>> >
>>
>

Re: flushing memstore when cluster shuts down WAS: empty table after flush

Posted by Ted Yu <yu...@gmail.com>.
I think that explains why the memstore wasn't flushed.
See this snippet from the region server log:

2010-09-24 11:21:18,603 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
Total=6.5512085MB (6869440), Free=791.7863MB (830248128), Max=798.3375MB
(837117568), Counts: Blocks=1, Access=384, Hit=383, Miss=1, Evictions=0,
Evicted=0, Ratios: Hit Ratio=99.73958134651184%, Miss
Ratio=0.2604166744276881%, Evicted/Run=NaN
2010-09-24 11:21:20,459 WARN org.apache.zookeeper.ClientCnxn: Exception
closing session 0x12b44afa1f50001 to sun.nio.ch.SelectionKeyImpl@3d3cdaa
java.io.IOException: TIMED OUT
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
2010-09-24 11:21:21,512 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
state: Disconnected, type: None, path: null
2010-09-24 11:21:22,849 INFO org.apache.zookeeper.ClientCnxn: Attempting
connection to server sjc9-flash-grid01.carrieriq.com/10.32.56.155:2181
2010-09-24 11:21:22,849 INFO org.apache.zookeeper.ClientCnxn: Priming
connection to java.nio.channels.SocketChannel[connected local=/
10.32.56.156:45455 remote=sjc9-flash-grid01.carrieriq.com/10.32.56.155:2181]
2010-09-24 11:21:22,873 INFO org.apache.zookeeper.ClientCnxn: Server
connection successful
2010-09-24 11:21:23,065 WARN org.apache.zookeeper.ClientCnxn: Exception
closing session 0x12b44afa1f50001 to sun.nio.ch.SelectionKeyImpl@383de71c
java.io.IOException: Session Expired
    at
org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589)
    at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
2010-09-24 11:21:23,065 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
state: Expired, type: None, path: null
2010-09-24 11:21:23,065 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session
expired
2010-09-24 11:21:23,101 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
request=0.0, regions=45, stores=119, storefiles=5, storefileIndexSize=0,
memstoreSize=7, compactionQueueSize=0, usedHeap=61, maxHeap=3991,
blockCacheSize=6869440, blockCacheFree=830248128, blockCacheCount=1,
blockCacheHitRatio=99, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
2010-09-24 11:21:24,457 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
server on 60020

Obviously this method in HRegionServer triggered abort():
  public void process(WatchedEvent event) {
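
For reference, the shape of such a watcher callback, reacting to session
expiry by aborting - a minimal sketch with a hypothetical abort() hook,
not the actual HRegionServer code:

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;

// Sketch only: a ZooKeeper watcher that aborts when the session expires.
public class SessionWatcherSketch implements Watcher {

  @Override
  public void process(WatchedEvent event) {
    switch (event.getState()) {
      case Disconnected:
        // Transient: the client will try to reconnect on its own.
        System.out.println("disconnected, waiting for reconnect");
        break;
      case Expired:
        // The session is gone for good; the master is likely already
        // reassigning our regions, so exit without flushing.
        abort("ZooKeeper session expired");
        break;
      default:
        break;
    }
  }

  private void abort(String reason) {
    System.out.println("aborting: " + reason);
    // The real code would stop services, try to close the WAL, and exit.
  }
}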

In this situation, why should internalFlushcache() be governed by abort?

Thanks

On Fri, Sep 24, 2010 at 4:54 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:

> Like that:
>
> http://hbase.apache.org/docs/r0.89.20100726/xref/org/apache/hadoop/hbase/regionserver/HRegion.html#519
>
> (this is a link to the 0.89 code, but 0.20 works exactly the same way)
>
> J-D
>
> On Fri, Sep 24, 2010 at 4:49 PM, Ted Yu <yu...@gmail.com> wrote:
> > When HBase cluster shuts down, how is memstore flushing triggered ?
> >
> > Thanks
> >
> > On Fri, Sep 24, 2010 at 4:01 PM, Jean-Daniel Cryans <jdcryans@apache.org
> >wrote:
> >
> >> > StripedHBaseTable is just a wrapper around HTable.
> >>
> >> Ok.
> >>
> >> > QA installed another application which queries HBase. When QA didn't
> see
> >> > meaningful data from the application.
> >> > Why didn't HBase serve data from Memstore ?
> >>
> >> I cannot answer this question, as far as I know 0.20.6 doesn't have a
> >> bug that prevents you from writing data (and many people use it in
> >> production environments). My guess is, the data never got into HBase
> >> or it was written with wrong timestamps and is hidden by delete
> >> markers or something "time-travelly" like that. If they do play with
> >> timestamps, ask them not to or verify they do the right thing.
> >>
> >> > QA used stop-hbase.sh to stop HBase. Would Memstore contents be
> flushed
> >> to
> >> > Storefile at that moment ?
> >>
> >> Yep, if you don't see any store file then the data never made it into
> >> hbase.
> >>
> >> >
> >> > QA servers were running JVM version 1.6.0_18
> >>
> >> From
> >>
> http://hbase.apache.org/docs/r0.20.6/api/overview-summary.html#requirements
> >>
> >> "Java 1.6.x, preferably from Sun. Use the latest version available
> >> except u18 (u19 is fine)."
> >>
> >> J-D
> >>
> >
>

Re: flushing memstore when cluster shuts down WAS: empty table after flush

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Like that:
http://hbase.apache.org/docs/r0.89.20100726/xref/org/apache/hadoop/hbase/regionserver/HRegion.html#519

(this is a link to the 0.89 code, but 0.20 works exactly the same way)
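
In rough terms, the idea behind that close path is the following - a
minimal sketch with made-up names, not the actual HRegion code: on an
orderly close the memstore is flushed to a store file, on abort it is
simply dropped.

// Sketch only: the close-time flush decision.
public class RegionCloseSketch {

  static class MemStore {
    long sizeInBytes;

    void flushToStoreFile() {
      // The real code writes the sorted edits out as a new store file on HDFS.
      System.out.println("flushed " + sizeInBytes + " bytes to a store file");
      sizeInBytes = 0;
    }
  }

  final MemStore memstore = new MemStore();

  /** Close the region; flush only when this is not an abort. */
  void close(boolean abort) {
    if (!abort && memstore.sizeInBytes > 0) {
      memstore.flushToStoreFile();   // stop-hbase.sh style shutdown
    }
    // On abort the edits are left to be recovered from the WAL.
  }

  public static void main(String[] args) {
    RegionCloseSketch region = new RegionCloseSketch();
    region.memstore.sizeInBytes = 42;
    region.close(false);   // orderly shutdown -> flush happens
    region.close(true);    // abort -> no flush
  }
}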

J-D

On Fri, Sep 24, 2010 at 4:49 PM, Ted Yu <yu...@gmail.com> wrote:
> When HBase cluster shuts down, how is memstore flushing triggered ?
>
> Thanks
>
> On Fri, Sep 24, 2010 at 4:01 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:
>
>> > StripedHBaseTable is just a wrapper around HTable.
>>
>> Ok.
>>
>> > QA installed another application which queries HBase. When QA didn't see
>> > meaningful data from the application.
>> > Why didn't HBase serve data from Memstore ?
>>
>> I cannot answer this question, as far as I know 0.20.6 doesn't have a
>> bug that prevents you from writing data (and many people use it in
>> production environments). My guess is, the data never got into HBase
>> or it was written with wrong timestamps and is hidden by delete
>> markers or something "time-travelly" like that. If they do play with
>> timestamps, ask them not to or verify they do the right thing.
>>
>> > QA used stop-hbase.sh to stop HBase. Would Memstore contents be flushed
>> to
>> > Storefile at that moment ?
>>
>> Yep, if you don't see any store file then the data never made it into
>> hbase.
>>
>> >
>> > QA servers were running JVM version 1.6.0_18
>>
>> From
>> http://hbase.apache.org/docs/r0.20.6/api/overview-summary.html#requirements
>>
>> "Java 1.6.x, preferably from Sun. Use the latest version available
>> except u18 (u19 is fine)."
>>
>> J-D
>>
>