Posted to user@hbase.apache.org by charan kumar <ch...@gmail.com> on 2011/01/12 21:17:53 UTC

Region Server shutdown (Replay HLOg required)

Hello,

  Region servers are dying, printing the following exception under heavy
write load. Let me know if you need any more details. Your help is greatly
appreciated.

 Environment:
   HBase 0.20.6 setup (30 nodes/region servers), LZO compression enabled
in HBase

Region Server log entry:

2011-01-11 16:00:27,489 FATAL
org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Replay of hlog
required. Forcing server shutdown

org.apache.hadoop.hbase.DroppedSnapshotException: region:
webtable,XXXXXXXX1294790230320

        at
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1041)

        at
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:896)

        at
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:258)

        at
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:231)

        at
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:154)

Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Could
not complete write to file /hbase/webtable/1138778035/c/4254248379246402147
by DFSClient_535678138

        at
org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:449)

        at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)

        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

        at java.lang.reflect.Method.invoke(Method.java:597)

        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)

        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)

        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)

        at java.security.AccessController.doPrivileged(Native Method)

        at javax.security.auth.Subject.doAs(Subject.java:396)

        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)



        at org.apache.hadoop.ipc.Client.call(Client.java:740)

        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)

        at $Proxy1.complete(Unknown Source)

        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)

        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

        at java.lang.reflect.Method.invoke(Method.java:597)

        at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)

        at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)

        at $Proxy1.complete(Unknown Source)

        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3264)

        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3188)

        at
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)

        at
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)

        at
org.apache.hadoop.hbase.io.hfile.HFile$Writer.close(HFile.java:620)

        at
org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:554)

        at
org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:516)

        at
org.apache.hadoop.hbase.regionserver.Store.access$100(Store.java:88)

        at
org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:1597)

        at
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1000)

        ... 4 more


Namenode log exception around the same time:



11/01/12 07:01:36 WARN hdfs.StateChange: DIR* NameSystem.completeFile:
failed to complete /hbase/webtable/689554504/c/3522097247441132047 because
dir.getFileBlocks() is null  and pendingFile is null
11/01/12 07:01:36 INFO ipc.Server: IPC Server handler 30 on 8020, call
complete(/hbase/webtable/689554504/c/3522097247441132047,
DFSClient_1190023263) from XXXXXXX:52636: error: java.io.IOException: Could
not complete write to file /
hbase/webtable/689554504/c/3522097247441132047 by DFSClient_1190023263
java.io.IOException: Could not complete write to file
/hbase/webtable/689554504/c/3522097247441132047 by DFSClient_1190023263
        at
org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:449)
        at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)


Thanks,

Charan

Re: Region Server shutdown (Replay HLOg required)

Posted by charan kumar <ch...@gmail.com>.
Any suggestions on this?

On Wed, Jan 12, 2011 at 12:17 PM, charan kumar <ch...@gmail.com> wrote:

> Hello,
>
>   Region servers are dying printing the following exception, under heavy
> write load. Let me know, if you need any more details. Your help is greatly
> appreciated.
>
>  Environment:
>    HBase (0.20.6) setup. (30 nodes/region servers)   LZO Compression
> enabled in Hbase
>
> Region Server log entry:
>
> 2011-01-11 16:00:27,489 FATAL
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Replay of hlog
> required. Forcing server shutdown
>
> org.apache.hadoop.hbase.DroppedSnapshotException: region:
> webtable,XXXXXXXX1294790230320
>
>         at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1041)
>
>         at
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:896)
>
>         at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:258)
>
>         at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:231)
>
>         at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:154)
>
> Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException:
> Could not complete write to file
> /hbase/webtable/1138778035/c/4254248379246402147 by DFSClient_535678138
>
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:449)
>
>         at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
>
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>
>         at java.lang.reflect.Method.invoke(Method.java:597)
>
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>
>         at java.security.AccessController.doPrivileged(Native Method)
>
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
>
>
>         at org.apache.hadoop.ipc.Client.call(Client.java:740)
>
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>
>         at $Proxy1.complete(Unknown Source)
>
>         at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
>
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>
>         at java.lang.reflect.Method.invoke(Method.java:597)
>
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>
>         at $Proxy1.complete(Unknown Source)
>
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3264)
>
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3188)
>
>         at
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
>
>         at
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
>
>         at
> org.apache.hadoop.hbase.io.hfile.HFile$Writer.close(HFile.java:620)
>
>         at
> org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:554)
>
>         at
> org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:516)
>
>         at
> org.apache.hadoop.hbase.regionserver.Store.access$100(Store.java:88)
>
>         at
> org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:1597)
>
>         at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1000)
>
>         ... 4 more
>
>
> Namenode log exception around the same time:
>
>
>
> 11/01/12 07:01:36 WARN hdfs.StateChange: DIR* NameSystem.completeFile:
> failed to complete /hbase/webtable/689554504/c/3522097247441132047 because
> dir.getFileBlocks() is null  and pendingFile is null
> 11/01/12 07:01:36 INFO ipc.Server: IPC Server handler 30 on 8020, call
> complete(/hbase/webtable/689554504/c/3522097247441132047,
> DFSClient_1190023263) from XXXXXXX:52636: error: java.io.IOException: Could
> not complete write to file /
> hbase/webtable/689554504/c/3522097247441132047 by DFSClient_1190023263
> java.io.IOException: Could not complete write to file
> /hbase/webtable/689554504/c/3522097247441132047 by DFSClient_1190023263
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:449)
>         at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
>
> Thanks,
>
> Charan
>
>
>

Re: Region Server shutdown (Replay HLOg required)

Posted by charan kumar <ch...@gmail.com>.
Stack,

   After upgrading to hbase-0.90.0, I was able to load around 56 million
rows without any crashes. Looks like this has something to do with
hbase-0.20.6 and hadoop-0.20.2, though I am not sure what exactly.

Thanks,
Charan

On Thu, Jan 20, 2011 at 11:06 AM, charan kumar <ch...@gmail.com> wrote:

> Hi StAck,
>
>  Thanks for your reply.
>
>  ulimit is set to 65535 and xcievers to 4096. I dont see related exceptions
> in the log.
>
>  We require the collation from MR, so I cannot put the persist logic in
> Map. How ever I did do hashing on the key, so that hadoop sorts in a
> different way than hbase, which should distribute the load...
>
>  Even then the key ranges are so narrow, I have noticed most of the
> requests ending up on one region server,and eventually that one crashes.
>
> I couldn't prove that there was double allocation for the region.
>
> Latest import I tried with hadoop-0.20-append branch, It sustained little
> longer than with previous branch. This time it is again the region server
> which is taking the load is crashing. The stack trace is as below.
>
> 2011-01-19 17:41:39,405 FATAL
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Replay of hlog
> required. Forcing server shutdown
>
> org.apache.hadoop.hbase.DroppedSnapshotException: region: webtable,
> http://www.ask.com/local?qsrc=&o=0&l=dir&what=recreation+%26+sporting+goods+in+Moraga%2C+CA+in+Oakland+CA+in+Oakland+CA+in+Amsterdam+NH+in+Oakland+CA+in+Oakland+CA+in+Oakland+CA+in+Oakland+CA&where=Oakland+CA,1295487470890
>
>         at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1041)
>
>         at
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:896)
>
>         at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:258)
>
>         at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:231)
>
>         at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:154)
>
> Caused by: org.apache.hadoop.ipc.RemoteException:
> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
> /hbase/webtable/685194390/c/4071023759080531376 File does not exist.
> [Lease.  Holder: DFSClient_-354478469, pendingcreates: 2]
>
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1431)
>
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1422)
>
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1477)
>
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1465)
>
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:471)
>
>         at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>
>         at java.lang.reflect.Method.invoke(Method.java:597)
>
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>
>         at java.security.AccessController.doPrivileged(Native Method)
>
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)
>
>
>
>         at org.apache.hadoop.ipc.Client.call(Client.java:740)
>
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>
>         at $Proxy1.complete(Unknown Source)
>
>         at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>
>         at java.lang.reflect.Method.invoke(Method.java:597)
>
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>
>         at $Proxy1.complete(Unknown Source)
>
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3458)
>
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3382)
>
>         at
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
>
>         at
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
>
>         at
> org.apache.hadoop.hbase.io.hfile.HFile$Writer.close(HFile.java:620)
>
>         at
> org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:554)
>
>         at
> org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:516)
>
>         at
> org.apache.hadoop.hbase.regionserver.Store.access$100(Store.java:88)
>
>         at
> org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:1597)
>
>         at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1000)
>
>         ... 4 more
>
> Please advise.
>
> I am not sure, if it matters, but I am testing with hbase-0.90 to see, if
> it makes any difference.
>
>
> Thanks,
> Charan
>
>
> On Sat, Jan 15, 2011 at 5:36 PM, Stack <st...@duboce.net> wrote:
>
>> On Fri, Jan 14, 2011 at 11:36 AM, charan kumar <ch...@gmail.com>
>> wrote:
>> > Could this be an effect of hotspotting? Since I am persisting millions
>> of
>> > keys through MR (which are URLS) and they are already sorted.
>> >
>>
>> I don't think so.  HBase usually buckles first.  For sure you've upped
>> ulimit and xceivers as per HBase requirements?  Confirm HBase is
>> seeing your upped ulimit setting.  Its first thing logged when HBase
>> starts up.  See these two sections:
>>
>> http://people.apache.org/~stack/hbase-0.90.0-candidate-3/docs/notsoquick.html#ulimit
>>
>>
>> > Should I consider some normalization on the Key (hashing, prefix,
>> reverse
>> > the url chars) ? I
>> >
>>
>> If your inserts are pre-sorted, then your load is probably not well
>> distributed over the cluster (you can check your Master UI -- see
>> where the requests are going... I'd guess all are being fielded by one
>> server, then another, and so on rather than all taking even load).
>> Are your keys sorted because they came out of a MR reduce?  If so,
>> could you back up and in your Map insert into HBase?  Then the keys
>> would not be sorted and load better likely better distributed (There
>> is a little on this topic in this section on HBase+MapReduce if you've
>> not seen it already:
>>
>> http://people.apache.org/~stack/hbase-0.90.0-candidate-3/docs/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description
>> ).
>>
>> St.Ack
>>
>>
>> St.Ack
>>
>
>

Re: Region Server shutdown (Replay HLOg required)

Posted by charan kumar <ch...@gmail.com>.
Hi StAck,

 Thanks for your reply.

 ulimit is set to 65535 and xcievers to 4096. I don't see related exceptions
in the log.

 We require the collation from MR, so I cannot put the persist logic in the
Map. However, I did do hashing on the key so that Hadoop sorts in a different
way than HBase, which should distribute the load...

 Even then, the key ranges are so narrow that I have noticed most of the
requests ending up on one region server, and eventually that one crashes.

I couldn't prove that there was double allocation for the region.

For the latest import I tried the hadoop-0.20-append branch; it sustained a
little longer than the previous branch. This time it is again the region server
that is taking the load which is crashing. The stack trace is below.

2011-01-19 17:41:39,405 FATAL
org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Replay of hlog
required. Forcing server shutdown

org.apache.hadoop.hbase.DroppedSnapshotException: region: webtable,
http://www.ask.com/local?qsrc=&o=0&l=dir&what=recreation+%26+sporting+goods+in+Moraga%2C+CA+in+Oakland+CA+in+Oakland+CA+in+Amsterdam+NH+in+Oakland+CA+in+Oakland+CA+in+Oakland+CA+in+Oakland+CA&where=Oakland+CA,1295487470890

        at
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1041)

        at
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:896)

        at
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:258)

        at
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:231)

        at
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:154)

Caused by: org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
/hbase/webtable/685194390/c/4071023759080531376 File does not exist.
[Lease.  Holder: DFSClient_-354478469, pendingcreates: 2]

        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1431)

        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1422)

        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1477)

        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1465)

        at
org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:471)

        at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)

        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

        at java.lang.reflect.Method.invoke(Method.java:597)

        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)

        at java.security.AccessController.doPrivileged(Native Method)

        at javax.security.auth.Subject.doAs(Subject.java:396)

        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)



        at org.apache.hadoop.ipc.Client.call(Client.java:740)

        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)

        at $Proxy1.complete(Unknown Source)

        at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)

        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

        at java.lang.reflect.Method.invoke(Method.java:597)

        at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)

        at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)

        at $Proxy1.complete(Unknown Source)

        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3458)

        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3382)

        at
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)

        at
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)

        at
org.apache.hadoop.hbase.io.hfile.HFile$Writer.close(HFile.java:620)

        at
org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:554)

        at
org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:516)

        at
org.apache.hadoop.hbase.regionserver.Store.access$100(Store.java:88)

        at
org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:1597)

        at
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1000)

        ... 4 more

Please advise.

I am not sure if it matters, but I am testing with hbase-0.90 to see if it
makes any difference.


Thanks,
Charan

On Sat, Jan 15, 2011 at 5:36 PM, Stack <st...@duboce.net> wrote:

> On Fri, Jan 14, 2011 at 11:36 AM, charan kumar <ch...@gmail.com>
> wrote:
> > Could this be an effect of hotspotting? Since I am persisting millions of
> > keys through MR (which are URLS) and they are already sorted.
> >
>
> I don't think so.  HBase usually buckles first.  For sure you've upped
> ulimit and xceivers as per HBase requirements?  Confirm HBase is
> seeing your upped ulimit setting.  Its first thing logged when HBase
> starts up.  See these two sections:
>
> http://people.apache.org/~stack/hbase-0.90.0-candidate-3/docs/notsoquick.html#ulimit
>
>
> > Should I consider some normalization on the Key (hashing, prefix, reverse
> > the url chars) ? I
> >
>
> If your inserts are pre-sorted, then your load is probably not well
> distributed over the cluster (you can check your Master UI -- see
> where the requests are going... I'd guess all are being fielded by one
> server, then another, and so on rather than all taking even load).
> Are your keys sorted because they came out of a MR reduce?  If so,
> could you back up and in your Map insert into HBase?  Then the keys
> would not be sorted and load better likely better distributed (There
> is a little on this topic in this section on HBase+MapReduce if you've
> not seen it already:
>
> http://people.apache.org/~stack/hbase-0.90.0-candidate-3/docs/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description
> ).
>
> St.Ack
>
>
> St.Ack
>

Re: Region Server shutdown (Replay HLOg required)

Posted by Stack <st...@duboce.net>.
On Fri, Jan 14, 2011 at 11:36 AM, charan kumar <ch...@gmail.com> wrote:
> Could this be an effect of hotspotting? Since I am persisting millions of
> keys through MR (which are URLS) and they are already sorted.
>

I don't think so.  HBase usually buckles first.  For sure you've upped
ulimit and xceivers as per HBase requirements?  Confirm HBase is
seeing your upped ulimit setting.  It's the first thing logged when HBase
starts up.  See these two sections:
http://people.apache.org/~stack/hbase-0.90.0-candidate-3/docs/notsoquick.html#ulimit
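
For reference, here is roughly what those two settings look like on a
0.20-era cluster (the values below are illustrative only, not a
prescription).  The xceiver ceiling goes into hdfs-site.xml on every
datanode (note the property name really is spelled "xcievers" in this
Hadoop version), and the open-file limit is raised for the user that runs
HDFS/HBase, for example in /etc/security/limits.conf, then confirmed with
"ulimit -n" in the environment the regionserver starts from:

    <!-- hdfs-site.xml on each datanode; restart the datanodes afterwards -->
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>

    # /etc/security/limits.conf (assuming HDFS/HBase run as user "hadoop")
    hadoop  -  nofile  32768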


> Should I consider some normalization on the Key (hashing, prefix, reverse
> the url chars) ? I
>

If your inserts are pre-sorted, then your load is probably not well
distributed over the cluster (you can check your Master UI -- see
where the requests are going... I'd guess all are being fielded by one
server, then another, and so on rather than all taking even load).
Are your keys sorted because they came out of an MR reduce?  If so,
could you back up and insert into HBase from your Map?  Then the keys
would not be sorted and the load likely better distributed (There
is a little on this topic in this section on HBase+MapReduce if you've
not seen it already:
http://people.apache.org/~stack/hbase-0.90.0-candidate-3/docs/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description).
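
To make that concrete, here is a rough sketch of a map-side loader (client
API as in 0.90.x; 0.20.6 differs slightly, e.g. new HBaseConfiguration(conf)
instead of HBaseConfiguration.create).  The "webtable" name and the 'c'
family are taken from this thread; the "raw" qualifier is just a placeholder:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Writes each record straight from the map phase, so rows reach HBase
    // in input order rather than reduce-sorted order.
    public class DirectPutMapper extends Mapper<LongWritable, Text, Text, Text> {
      private HTable table;  // one table handle per map task

      @Override
      protected void setup(Context context) throws IOException {
        Configuration conf = HBaseConfiguration.create(context.getConfiguration());
        table = new HTable(conf, "webtable");
        table.setAutoFlush(false);  // buffer puts client-side
      }

      @Override
      protected void map(LongWritable key, Text line, Context context)
          throws IOException {
        String url = line.toString();
        Put put = new Put(Bytes.toBytes(url));  // row key = the URL itself
        put.add(Bytes.toBytes("c"), Bytes.toBytes("raw"), Bytes.toBytes(url));
        table.put(put);
      }

      @Override
      protected void cleanup(Context context) throws IOException {
        table.flushCommits();  // push any buffered puts
        table.close();
      }
    }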

St.Ack


St.Ack

Re: Region Server shutdown (Replay HLOg required)

Posted by charan kumar <ch...@gmail.com>.
Could this be an effect of hotspotting, since I am persisting millions of
keys (which are URLs) through MR and they are already sorted?

Should I consider some normalization on the key (hashing, a prefix, reversing
the URL chars)?
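
For instance, one way such a normalization could look (just a sketch, not
tied to any particular schema): prefix the row key with a few bytes of the
URL's MD5 so consecutive URLs land in different regions.  The 4-byte prefix
length is arbitrary, and reads must build the same prefix, so plain range
scans over raw URLs are no longer possible.

    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import org.apache.hadoop.hbase.util.Bytes;

    public final class SaltedKeys {
      private SaltedKeys() {}

      // Row key = first 4 bytes of MD5(url) + the original URL bytes.
      public static byte[] saltedRowKey(String url) {
        try {
          byte[] digest = MessageDigest.getInstance("MD5").digest(Bytes.toBytes(url));
          byte[] prefix = new byte[4];
          System.arraycopy(digest, 0, prefix, 0, 4);
          return Bytes.add(prefix, Bytes.toBytes(url));
        } catch (NoSuchAlgorithmException e) {
          throw new RuntimeException("MD5 not available", e);
        }
      }
    }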

On Thu, Jan 13, 2011 at 11:50 AM, charan kumar <ch...@gmail.com> wrote:

> Hi Stack,
>
> Looks like there is some issue with the block creation.
>
> NameNode received block creation/allocation request.
>
>
> *11/01/12 19:45:43 INFO hdfs.StateChange: BLOCK* NameSystem.allocateBlock:
> /hbase/webtable/compaction.dir/1326275218/2628290592728420519.
> blk_-7100767084720175376_361638*
>
> After some time, It receives a addStoredBlock, where it says the same block
> doesnt belong to a file.
>
> *
> 11/01/12 19:45:48 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock:
> addStoredBlock request received for blk_-7100767084720175376_361638 on
> 10.76.99.68:50010 size 23874613 But it does not belong to any file.
> 11/01/12 19:45:48 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock:
> addStoredBlock request received for blk_-7100767084720175376_361638 on
> 10.76.99.57:50010 size 23874613 But it does not belong to any file.
> 11/01/12 19:45:48 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock:
> addStoredBlock request received for blk_-7100767084720175376_361638 on
> 10.76.99.114:50010 size 23874613 But it does not belong to any file.*
>
> *Eventually, the Error pops out saying Cannot write file. It says
> getFileBlocks is null*
>
>
> *11/01/12 19:45:48 WARN hdfs.StateChange: DIR* NameSystem.completeFile:
> failed to complete
> /hbase/webtable/compaction.dir/1326275218/2628290592728420519 because
> dir.getFileBlocks() is null  and pendingFile is null
>
> 11/01/12 19:45:48 INFO ipc.Server: IPC Server handler 54 on 8020, call
> complete(/hbase/webtable/compaction.dir/1326275218/2628290592728420519,
> DFSClient_880037874) from 1XX.XX.XX.XXXX :55925: error: java.io.IOException:
> Could not complete write to file
> /hbase/webtable/compaction.dir/1326275218/2628290592728420519 by
> DFSClient_880037874
> java.io.IOException: Could not complete write to file
> /hbase/webtable/compaction.dir/1326275218/2628290592728420519 by
> DFSClient_880037874
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:471)
>         at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> *
> Thanks,
> Charan
> *
> *
> On Wed, Jan 12, 2011 at 10:54 PM, charan kumar <ch...@gmail.com> wrote:
>
>> Hi Stak,
>>
>> We user hadoop-0.20.2 . I applied the patch HDFS-630 this morning, didnt
>> help. This is what I see before the line.. The file names might be little
>> different , I am collecting from different failures.
>>
>> 2011-01-12 07:01:32,336 WARN org.apache.hadoop.hdfs.DFSClient: DFS Read:
>> java.io.IOException: Cannot open filename
>> /hbase/webtable/689554504/qa/8995572027002
>> 725486
>>         at
>> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1497)
>>         at
>> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1824)
>>         at
>> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1638)
>>         at
>> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1767)
>>         at java.io.DataInputStream.read(DataInputStream.java:132)
>>         at
>> org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:105)
>>         at
>> org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:88)
>>         at
>> org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:81)
>>         at
>> org.apache.hadoop.io.compress.BlockDecompressorStream.rawReadInt(BlockDecompressorStream.java:120)
>>         at
>> org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:66)
>>         at
>> org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:74)
>>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
>>
>> Also I compiled the following log entries from Name Node , when the region
>> server shutsdown because of this exception. Hope this helps too.
>>
>> 11/01/12 19:45:43 INFO hdfs.StateChange: BLOCK* NameSystem.allocateBlock:
>> /hbase/webtable/compaction.dir/1326275218/2628290592728420519.
>> blk_-7100767084720175376_361638
>>
>> 11/01/12 19:45:48 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock:
>> addStoredBlock request received for blk_-7100767084720175376_361638 on
>> 10.76.99.68:50010 size 23874613 But it does not belong to any file.
>> 11/01/12 19:45:48 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock:
>> addStoredBlock request received for blk_-7100767084720175376_361638 on
>> 10.76.99.57:50010 size 23874613 But it does not belong to any file.
>> 11/01/12 19:45:48 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock:
>> addStoredBlock request received for blk_-7100767084720175376_361638 on
>> 10.76.99.114:50010 size 23874613 But it does not belong to any file.
>>
>>
>> 11/01/12 19:45:48 WARN hdfs.StateChange: DIR* NameSystem.completeFile:
>> failed to complete
>> /hbase/webtable/compaction.dir/1326275218/2628290592728420519 because
>> dir.getFileBlocks() is null  and pendingFile is null
>>
>> 11/01/12 19:45:48 INFO ipc.Server: IPC Server handler 54 on 8020, call
>> complete(/hbase/webtable/compaction.dir/1326275218/2628290592728420519,
>> DFSClient_880037874) from 1XX.XX.XX.XXXX :55925: error: java.io.IOException:
>> Could not complete write to file
>> /hbase/webtable/compaction.dir/1326275218/2628290592728420519 by
>> DFSClient_880037874
>> java.io.IOException: Could not complete write to file
>> /hbase/webtable/compaction.dir/1326275218/2628290592728420519 by
>> DFSClient_880037874
>>         at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:471)
>>         at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
>>
>>         at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:396)
>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>>
>>
>>
>>
>> On Wed, Jan 12, 2011 at 10:40 PM, Stack <st...@duboce.net> wrote:
>>
>>> On Wed, Jan 12, 2011 at 12:17 PM, charan kumar <ch...@gmail.com>
>>> wrote:
>>> > 2011-01-11 16:00:27,489 FATAL
>>> > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Replay of hlog
>>> > required. Forcing server shutdown
>>> >
>>>
>>>
>>> What was in the log before this line?  It would say why the replay
>>> required.
>>>
>>> One thing you might do is grep 1138778035, the encoded name of the
>>> region that failed the write below in your master log.  Where was the
>>> region assigned?  Was it assigned two places concurrently?
>>>
>>> St.Ack
>>>
>>
>>
>

Re: Region Server shutdown (Replay HLOg required)

Posted by charan kumar <ch...@gmail.com>.
Hi Stack,

Looks like there is some issue with the block creation.

NameNode received block creation/allocation request.

11/01/12 19:45:43 INFO hdfs.StateChange: BLOCK* NameSystem.allocateBlock:
/hbase/webtable/compaction.dir/1326275218/2628290592728420519.
blk_-7100767084720175376_361638

After some time, it receives an addStoredBlock request, where it says the same
block does not belong to any file.

11/01/12 19:45:48 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock:
addStoredBlock request received for blk_-7100767084720175376_361638 on
10.76.99.68:50010 size 23874613 But it does not belong to any file.
11/01/12 19:45:48 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock:
addStoredBlock request received for blk_-7100767084720175376_361638 on
10.76.99.57:50010 size 23874613 But it does not belong to any file.
11/01/12 19:45:48 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock:
addStoredBlock request received for blk_-7100767084720175376_361638 on
10.76.99.114:50010 size 23874613 But it does not belong to any file.

Eventually, the error pops out saying it cannot write the file; it says
getFileBlocks() is null.

11/01/12 19:45:48 WARN hdfs.StateChange: DIR* NameSystem.completeFile:
failed to complete
/hbase/webtable/compaction.dir/1326275218/2628290592728420519 because
dir.getFileBlocks() is null  and pendingFile is null

11/01/12 19:45:48 INFO ipc.Server: IPC Server handler 54 on 8020, call
complete(/hbase/webtable/compaction.dir/1326275218/2628290592728420519,
DFSClient_880037874) from 1XX.XX.XX.XXXX :55925: error: java.io.IOException:
Could not complete write to file
/hbase/webtable/compaction.dir/1326275218/2628290592728420519 by
DFSClient_880037874
java.io.IOException: Could not complete write to file
/hbase/webtable/compaction.dir/1326275218/2628290592728420519 by
DFSClient_880037874
        at
org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:471)
        at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)

Thanks,
Charan

On Wed, Jan 12, 2011 at 10:54 PM, charan kumar <ch...@gmail.com> wrote:

> Hi Stak,
>
> We user hadoop-0.20.2 . I applied the patch HDFS-630 this morning, didnt
> help. This is what I see before the line.. The file names might be little
> different , I am collecting from different failures.
>
> 2011-01-12 07:01:32,336 WARN org.apache.hadoop.hdfs.DFSClient: DFS Read:
> java.io.IOException: Cannot open filename
> /hbase/webtable/689554504/qa/8995572027002
> 725486
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1497)
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1824)
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1638)
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1767)
>         at java.io.DataInputStream.read(DataInputStream.java:132)
>         at
> org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:105)
>         at
> org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:88)
>         at
> org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:81)
>         at
> org.apache.hadoop.io.compress.BlockDecompressorStream.rawReadInt(BlockDecompressorStream.java:120)
>         at
> org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:66)
>         at
> org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:74)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
>
> Also I compiled the following log entries from Name Node , when the region
> server shutsdown because of this exception. Hope this helps too.
>
> 11/01/12 19:45:43 INFO hdfs.StateChange: BLOCK* NameSystem.allocateBlock:
> /hbase/webtable/compaction.dir/1326275218/2628290592728420519.
> blk_-7100767084720175376_361638
>
> 11/01/12 19:45:48 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock:
> addStoredBlock request received for blk_-7100767084720175376_361638 on
> 10.76.99.68:50010 size 23874613 But it does not belong to any file.
> 11/01/12 19:45:48 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock:
> addStoredBlock request received for blk_-7100767084720175376_361638 on
> 10.76.99.57:50010 size 23874613 But it does not belong to any file.
> 11/01/12 19:45:48 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock:
> addStoredBlock request received for blk_-7100767084720175376_361638 on
> 10.76.99.114:50010 size 23874613 But it does not belong to any file.
>
>
> 11/01/12 19:45:48 WARN hdfs.StateChange: DIR* NameSystem.completeFile:
> failed to complete
> /hbase/webtable/compaction.dir/1326275218/2628290592728420519 because
> dir.getFileBlocks() is null  and pendingFile is null
>
> 11/01/12 19:45:48 INFO ipc.Server: IPC Server handler 54 on 8020, call
> complete(/hbase/webtable/compaction.dir/1326275218/2628290592728420519,
> DFSClient_880037874) from 1XX.XX.XX.XXXX :55925: error: java.io.IOException:
> Could not complete write to file
> /hbase/webtable/compaction.dir/1326275218/2628290592728420519 by
> DFSClient_880037874
> java.io.IOException: Could not complete write to file
> /hbase/webtable/compaction.dir/1326275218/2628290592728420519 by
> DFSClient_880037874
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:471)
>         at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
>
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
>
>
>
> On Wed, Jan 12, 2011 at 10:40 PM, Stack <st...@duboce.net> wrote:
>
>> On Wed, Jan 12, 2011 at 12:17 PM, charan kumar <ch...@gmail.com>
>> wrote:
>> > 2011-01-11 16:00:27,489 FATAL
>> > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Replay of hlog
>> > required. Forcing server shutdown
>> >
>>
>>
>> What was in the log before this line?  It would say why the replay
>> required.
>>
>> One thing you might do is grep 1138778035, the encoded name of the
>> region that failed the write below in your master log.  Where was the
>> region assigned?  Was it assigned two places concurrently?
>>
>> St.Ack
>>
>
>

Re: Region Server shutdown (Replay HLOg required)

Posted by Stack <st...@duboce.net>.
On Wed, Jan 12, 2011 at 10:54 PM, charan kumar <ch...@gmail.com> wrote:
> Hi Stak,
>
> We user hadoop-0.20.2 . I applied the patch HDFS-630 this morning, didnt
> help.

Have you ever considered running the Apache 0.20-append branch or CDH? Both
have a bunch of fixes to HDFS beyond 0.20.2 Hadoop.

> This is what I see before the line.. The file names might be little
> different , I am collecting from different failures.
>
> 2011-01-12 07:01:32,336 WARN org.apache.hadoop.hdfs.DFSClient: DFS Read:
> java.io.IOException: Cannot open filename
> /hbase/webtable/689554504/qa/8995572027002
> 725486
>        at


Does it say WHY it can't open the file?  In 0.20.6-era HBase, this
could have been double assignment (in 0.20.x there were little holes
where we could end up giving the same region out to two different
regionservers).  Assignment A starts up a compaction and removes old
files, replacing them with new ones.  Assignment B references those
files -- perhaps to start its own compaction -- but they are removed.
Where do you see the 'Cannot open filename'?  Is that on open/deploy
of the region, or later when we go to compact?  To find a double
assignment, grep the region's encoded name in the master logs.  You
might have to take the encoded name, find the actual region name (in
0.20.x HBase IIRC), and then grep the actual region name to find the
double assignment (grep with the encoded name and you'll find a line
which has the original and encoded name on the same line, then go with
the found region name).
....

> Also I compiled the following log entries from Name Node , when the region
> server shutsdown because of this exception. Hope this helps too.
>
> 11/01/12 19:45:43 INFO hdfs.StateChange: BLOCK* NameSystem.allocateBlock:
> /hbase/webtable/compaction.dir/1326275218/2628290592728420519.
> blk_-7100767084720175376_361638
>
> 11/01/12 19:45:48 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock:
> addStoredBlock request received for blk_-7100767084720175376_361638 on
> 10.76.99.68:50010 size 23874613 But it does not belong to any file.
> 11/01/12 19:45:48 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock:
> addStoredBlock request received for blk_-7100767084720175376_361638 on
> 10.76.99.57:50010 size 23874613 But it does not belong to any file.
> 11/01/12 19:45:48 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock:
> addStoredBlock request received for blk_-7100767084720175376_361638 on
> 10.76.99.114:50010 size 23874613 But it does not belong to any file.
>
>

If you grep this message in HDFS source, you see that it seems to be
part of 'normal' processing.  The block is just added to the set of
invalid blocks (the message is logged at 'INFO' level).


> 11/01/12 19:45:48 WARN hdfs.StateChange: DIR* NameSystem.completeFile:
> failed to complete
> /hbase/webtable/compaction.dir/1326275218/2628290592728420519 because
> dir.getFileBlocks() is null  and pendingFile is null
>
> 11/01/12 19:45:48 INFO ipc.Server: IPC Server handler 54 on 8020, call
> complete(/hbase/webtable/compaction.dir/1326275218/2628290592728420519,
> DFSClient_880037874) from 1XX.XX.XX.XXXX :55925: error: java.io.IOException:
> Could not complete write to file
> /hbase/webtable/compaction.dir/1326275218/2628290592728420519 by
> DFSClient_880037874
> java.io.IOException: Could not complete write to file
> /hbase/webtable/compaction.dir/1326275218/2628290592728420519 by
> DFSClient_880037874


Is this the only history on this file in NameNode log, or rather, on
the compaction.dir for this region in NameNode?

St.Ack

Re: Region Server shutdown (Replay HLOg required)

Posted by charan kumar <ch...@gmail.com>.
Hi Stack,

We use hadoop-0.20.2. I applied the patch HDFS-630 this morning; it didn't
help. This is what I see before the line. The file names might be a little
different, as I am collecting from different failures.

2011-01-12 07:01:32,336 WARN org.apache.hadoop.hdfs.DFSClient: DFS Read:
java.io.IOException: Cannot open filename
/hbase/webtable/689554504/qa/8995572027002
725486
        at
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1497)
        at
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1824)
        at
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1638)
        at
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1767)
        at java.io.DataInputStream.read(DataInputStream.java:132)
        at
org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:105)
        at
org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:88)
        at
org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:81)
        at
org.apache.hadoop.io.compress.BlockDecompressorStream.rawReadInt(BlockDecompressorStream.java:120)
        at
org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:66)
        at
org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:74)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)

Also, I compiled the following log entries from the NameNode from when the
region server shuts down because of this exception. Hope this helps too.

11/01/12 19:45:43 INFO hdfs.StateChange: BLOCK* NameSystem.allocateBlock:
/hbase/webtable/compaction.dir/1326275218/2628290592728420519.
blk_-7100767084720175376_361638

11/01/12 19:45:48 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock:
addStoredBlock request received for blk_-7100767084720175376_361638 on
10.76.99.68:50010 size 23874613 But it does not belong to any file.
11/01/12 19:45:48 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock:
addStoredBlock request received for blk_-7100767084720175376_361638 on
10.76.99.57:50010 size 23874613 But it does not belong to any file.
11/01/12 19:45:48 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock:
addStoredBlock request received for blk_-7100767084720175376_361638 on
10.76.99.114:50010 size 23874613 But it does not belong to any file.


11/01/12 19:45:48 WARN hdfs.StateChange: DIR* NameSystem.completeFile:
failed to complete
/hbase/webtable/compaction.dir/1326275218/2628290592728420519 because
dir.getFileBlocks() is null  and pendingFile is null

11/01/12 19:45:48 INFO ipc.Server: IPC Server handler 54 on 8020, call
complete(/hbase/webtable/compaction.dir/1326275218/2628290592728420519,
DFSClient_880037874) from 1XX.XX.XX.XXXX :55925: error: java.io.IOException:
Could not complete write to file
/hbase/webtable/compaction.dir/1326275218/2628290592728420519 by
DFSClient_880037874
java.io.IOException: Could not complete write to file
/hbase/webtable/compaction.dir/1326275218/2628290592728420519 by
DFSClient_880037874
        at
org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:471)
        at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)




On Wed, Jan 12, 2011 at 10:40 PM, Stack <st...@duboce.net> wrote:

> On Wed, Jan 12, 2011 at 12:17 PM, charan kumar <ch...@gmail.com>
> wrote:
> > 2011-01-11 16:00:27,489 FATAL
> > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Replay of hlog
> > required. Forcing server shutdown
> >
>
>
> What was in the log before this line?  It would say why the replay
> required.
>
> One thing you might do is grep 1138778035, the encoded name of the
> region that failed the write below in your master log.  Where was the
> region assigned?  Was it assigned two places concurrently?
>
> St.Ack
>

Re: Region Server shutdown (Replay HLOg required)

Posted by Stack <st...@duboce.net>.
On Wed, Jan 12, 2011 at 12:17 PM, charan kumar <ch...@gmail.com> wrote:
> 2011-01-11 16:00:27,489 FATAL
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Replay of hlog
> required. Forcing server shutdown
>


What was in the log before this line?  It would say why the replay was required.

One thing you might do is grep 1138778035, the encoded name of the
region that failed the write below in your master log.  Where was the
region assigned?  Was it assigned two places concurrently?

St.Ack

Re: Region Server shutdown (Replay HLOg required)

Posted by Stack <st...@duboce.net>.
You don't say what version of hadoop you are running.  That could make
a difference.
St.Ack

On Wed, Jan 12, 2011 at 3:16 PM, charan kumar <ch...@gmail.com> wrote:
> The only explanation I found related to the exact exception I am seeing was
> posted in the URL.
>
>
> http://sudhirvn.blogspot.com/2010/07/hadoop-hdfs-error-javaioioexception.html
>
>   "In the process of creating the merging compressed file, it creates a
> temp file with merged content, verifies the file and then compresses it to a
> final file after which it deletes the temp file. Errors were being thrown
> when the temp file was getting deleted. Certain times delete was happening
> (meta data on namenode) before the file was replicated to all nodes and that
> was throwing this error. Currently the HDFS API does not provide a way to
> synchronouos method to wait till file is replicated"
>
>  Does this seem related to HBase?
>
> On Wed, Jan 12, 2011 at 12:17 PM, charan kumar <ch...@gmail.com> wrote:
>
>> Hello,
>>
>>   Region servers are dying printing the following exception, under heavy
>> write load. Let me know, if you need any more details. Your help is greatly
>> appreciated.
>>
>>  Environment:
>>    HBase (0.20.6) setup. (30 nodes/region servers)   LZO Compression
>> enabled in Hbase
>>
>> Region Server log entry:
>>
>> 2011-01-11 16:00:27,489 FATAL
>> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Replay of hlog
>> required. Forcing server shutdown
>>
>> org.apache.hadoop.hbase.DroppedSnapshotException: region:
>> webtable,XXXXXXXX1294790230320
>>
>>         at
>> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1041)
>>
>>         at
>> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:896)
>>
>>         at
>> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:258)
>>
>>         at
>> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:231)
>>
>>         at
>> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:154)
>>
>> Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException:
>> Could not complete write to file
>> /hbase/webtable/1138778035/c/4254248379246402147 by DFSClient_535678138
>>
>>         at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:449)
>>
>>         at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
>>
>>         at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>
>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>>
>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>>
>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>>
>>         at java.security.AccessController.doPrivileged(Native Method)
>>
>>         at javax.security.auth.Subject.doAs(Subject.java:396)
>>
>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>>
>>
>>
>>         at org.apache.hadoop.ipc.Client.call(Client.java:740)
>>
>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>>
>>         at $Proxy1.complete(Unknown Source)
>>
>>         at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
>>
>>         at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>
>>         at
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>>
>>         at
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>>
>>         at $Proxy1.complete(Unknown Source)
>>
>>         at
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3264)
>>
>>         at
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3188)
>>
>>         at
>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
>>
>>         at
>> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
>>
>>         at
>> org.apache.hadoop.hbase.io.hfile.HFile$Writer.close(HFile.java:620)
>>
>>         at
>> org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:554)
>>
>>         at
>> org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:516)
>>
>>         at
>> org.apache.hadoop.hbase.regionserver.Store.access$100(Store.java:88)
>>
>>         at
>> org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:1597)
>>
>>         at
>> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1000)
>>
>>         ... 4 more
>>
>>
>> Namenode log exception around the same time:
>>
>>
>>
>> 11/01/12 07:01:36 WARN hdfs.StateChange: DIR* NameSystem.completeFile:
>> failed to complete /hbase/webtable/689554504/c/3522097247441132047 because
>> dir.getFileBlocks() is null  and pendingFile is null
>> 11/01/12 07:01:36 INFO ipc.Server: IPC Server handler 30 on 8020, call
>> complete(/hbase/webtable/689554504/c/3522097247441132047,
>> DFSClient_1190023263) from XXXXXXX:52636: error: java.io.IOException: Could
>> not complete write to file /
>> hbase/webtable/689554504/c/3522097247441132047 by DFSClient_1190023263
>> java.io.IOException: Could not complete write to file
>> /hbase/webtable/689554504/c/3522097247441132047 by DFSClient_1190023263
>>         at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:449)
>>         at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
>>         at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:396)
>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>>
>>
>> Thanks,
>>
>> Charan
>>
>>
>>
>

Re: Region Server shutdown (Replay HLOg required)

Posted by charan kumar <ch...@gmail.com>.
The only explanation I found related to the exact exception I am seeing is
the one posted at this URL:


http://sudhirvn.blogspot.com/2010/07/hadoop-hdfs-error-javaioioexception.html

   "In the process of creating the merging compressed file, it creates a
temp file with merged content, verifies the file and then compresses it to a
final file after which it deletes the temp file. Errors were being thrown
when the temp file was getting deleted. At times the delete was happening
(metadata on the namenode) before the file was replicated to all nodes, and that
was throwing this error. Currently the HDFS API does not provide a
synchronous method to wait till the file is replicated"

 Does this seem related to HBase?

On Wed, Jan 12, 2011 at 12:17 PM, charan kumar <ch...@gmail.com> wrote:

> Hello,
>
>   Region servers are dying printing the following exception, under heavy
> write load. Let me know, if you need any more details. Your help is greatly
> appreciated.
>
>  Environment:
>    HBase (0.20.6) setup. (30 nodes/region servers)   LZO Compression
> enabled in Hbase
>
> Region Server log entry:
>
> 2011-01-11 16:00:27,489 FATAL
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Replay of hlog
> required. Forcing server shutdown
>
> org.apache.hadoop.hbase.DroppedSnapshotException: region:
> webtable,XXXXXXXX1294790230320
>
>         at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1041)
>
>         at
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:896)
>
>         at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:258)
>
>         at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:231)
>
>         at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:154)
>
> Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException:
> Could not complete write to file
> /hbase/webtable/1138778035/c/4254248379246402147 by DFSClient_535678138
>
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:449)
>
>         at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
>
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>
>         at java.lang.reflect.Method.invoke(Method.java:597)
>
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>
>         at java.security.AccessController.doPrivileged(Native Method)
>
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
>
>
>         at org.apache.hadoop.ipc.Client.call(Client.java:740)
>
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>
>         at $Proxy1.complete(Unknown Source)
>
>         at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
>
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>
>         at java.lang.reflect.Method.invoke(Method.java:597)
>
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>
>         at $Proxy1.complete(Unknown Source)
>
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3264)
>
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3188)
>
>         at
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
>
>         at
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
>
>         at
> org.apache.hadoop.hbase.io.hfile.HFile$Writer.close(HFile.java:620)
>
>         at
> org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:554)
>
>         at
> org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:516)
>
>         at
> org.apache.hadoop.hbase.regionserver.Store.access$100(Store.java:88)
>
>         at
> org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:1597)
>
>         at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1000)
>
>         ... 4 more
>
>
> Namenode log exception around the same time:
>
>
>
> 11/01/12 07:01:36 WARN hdfs.StateChange: DIR* NameSystem.completeFile:
> failed to complete /hbase/webtable/689554504/c/3522097247441132047 because
> dir.getFileBlocks() is null  and pendingFile is null
> 11/01/12 07:01:36 INFO ipc.Server: IPC Server handler 30 on 8020, call
> complete(/hbase/webtable/689554504/c/3522097247441132047,
> DFSClient_1190023263) from XXXXXXX:52636: error: java.io.IOException: Could
> not complete write to file /
> hbase/webtable/689554504/c/3522097247441132047 by DFSClient_1190023263
> java.io.IOException: Could not complete write to file
> /hbase/webtable/689554504/c/3522097247441132047 by DFSClient_1190023263
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:449)
>         at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
>
> Thanks,
>
> Charan
>
>
>