Posted to user@accumulo.apache.org by Jayesh Patel <jp...@keywcorp.com> on 2016/06/01 15:54:39 UTC

RE: walog consumes all the disk space on power failure

All 3 nodes have 16GB of disk space, which was 98% consumed when we looked at
them a few hours after the power failed and was restored.  Normally it's
only 33%, or about 5GB.
Once it got into this state, Zookeeper couldn't even start because it
couldn't create some logfiles that it needs.  So the disk space
usage was real, not sure if you meant that or not.  We ended up wiping the
hdfs data folder and reformatting it to reclaim the space.

Definitely didn't see complaints about writing to WALs.  The only exception
is the following, which showed up because the namenode wasn't in the right
state due to constrained resources:

2016-05-23 07:06:17,599 [recovery.HadoopLogCloser] WARN : Error recovering lease on hdfs://instance-accumulo:8020/accumulo/wal/instance-accumulo-3+9997/530f663b-2d6b-42a5-92d6-e8fbb9b55c2e
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot recover the lease of /accumulo/wal/instance-accumulo-3+9997/530f663b-2d6b-42a5-92d6-e8fbb9b55c2e. Name node is in safe mode.
Resources are low on NN. Please add or free up more resources then turn off safe mode manually. NOTE:  If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1327)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLease(FSNamesystem.java:2828)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.recoverLease(NameNodeRpcServer.java:667)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.recoverLease(ClientNamenodeProtocolServerSideTranslatorPB.java:663)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Unknown Source)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

        at org.apache.hadoop.ipc.Client.call(Client.java:1476)
        at org.apache.hadoop.ipc.Client.call(Client.java:1407)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
        at com.sun.proxy.$Proxy15.recoverLease(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.recoverLease(ClientNamenode
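For reference, the safe-mode state that trace complains about can be inspected and, once disk space has actually been freed, cleared by hand with the standard HDFS admin commands (run on the namenode host; a sketch, not specific to this cluster):

```shell
# See whether the NameNode is still in safe mode
hdfs dfsadmin -safemode get

# Free up space first, then leave safe mode manually; as the log itself
# warns, the NN re-enters safe mode immediately if resources are still low
hdfs dfsadmin -safemode leave
```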

-----Original Message-----
From: Josh Elser [mailto:josh.elser@gmail.com] 
Sent: Tuesday, May 31, 2016 6:54 PM
To: user@accumulo.apache.org
Subject: Re: walog consumes all the disk space on power failure

Hi Jayesh,

Can you quantify some rough size numbers for us? Are you seeing exceptions
in the Accumulo tserver/master logs?

One thought is that when Accumulo creates new WAL files, it sets the
blocksize to be 1G (as a trick to force HDFS into making some "non-standard"
guarantees for us). As a result, it will appear that there are a number of
very large WAL files (but they're essentially empty).

If your instance is in some situation where Accumulo is repeatedly failing
to write to a WAL, it might think the WAL is bad, abandon it, and try to
create a new one. If this is happening each time, I could see it explain the
situation you described. However, you should see the TabletServers
complaining loudly that they cannot write to the WALs.
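One way to tell the apparent size of those WAL files from the space they really consume is the standard HDFS usage commands; a sketch, assuming the default /accumulo/wal layout seen in the trace above:

```shell
# Real bytes consumed by the write-ahead logs (and with replication)
hdfs dfs -du -s -h /accumulo/wal

# Per-file listing; a growing file count here matters more than the
# 1GB blocksize each file is created with
hdfs dfs -ls -R /accumulo/wal
```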

Jayesh Patel wrote:
> We have a 3-node Accumulo 1.7 cluster running as VMware VMs with a
> minute amount of data by Accumulo standards.
>
> We have run into a situation multiple times now where all the nodes 
> have a power failure and when they are trying to recover from it 
> simultaneously, walog grows exponentially and fills up all the 
> available disk space. We have confirmed that the walog folder under 
> /accumulo in hdfs is consuming 99% of the disk space.
>
> We have tried freeing enough space to be able to run Accumulo 
> processes in the hopes of it burning through walog without success. 
> Walog just grew to take up the freed space.
>
> Given that we need to better manage the power situation, we're trying 
> to understand what could be causing this and if there's anything we 
> can do to avoid this situation.
>
> In case you're wondering: we have some heartbeat data being written to
> a table at a small, constant rate, which is not sufficient to cause
> such a large write-ahead log even if HDFS was pulled out from under
> Accumulo's feet, so to speak, during the power failure.
>
> Thank you,
>
> Jayesh
>

Re: walog consumes all the disk space on power failure

Posted by Josh Elser <jo...@gmail.com>.
It depends on how much data you're writing. I can't answer that for ya.

Generally for hadoop, you want to avoid that 80-90% utilization (HDFS 
will limit you to 90 or 95% capacity usage by default, IIRC).

If you're running things like MapReduce, you'll need more headroom to 
account for temporary output, jars being copied, etc. Accumulo has some 
lag in freeing disk space (e.g. during compaction, you'll have double 
space usage for the files you're re-writing), as does HDFS in actually 
deleting the blocks for files that were deleted.
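As a rough back-of-the-envelope with the numbers from this thread (5GB steady state on a 16GB disk; these are just this cluster's figures, not a recommendation):

```shell
# Worst-case transient usage: a compaction can briefly double the space
# taken by the files being rewritten, before HDFS reclaims the old blocks.
STEADY_GB=5
PEAK_GB=$((STEADY_GB * 2))
DISK_GB=16
HEADROOM_GB=$((DISK_GB - PEAK_GB))
echo "peak: ${PEAK_GB}GB, headroom left: ${HEADROOM_GB}GB"
```

That leaves little margin before the HDFS reserved-capacity limits kick in, which is why a 16GB disk is tight here.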

Jayesh Patel wrote:
> So what would you consider a safe minimum amount of disk space in this case?
>
> Thank you,
> Jayesh
>

RE: walog consumes all the disk space on power failure

Posted by Jayesh Patel <jp...@keywcorp.com>.
So what would you consider a safe minimum amount of disk space in this case?

Thank you,
Jayesh

-----Original Message-----
From: Josh Elser [mailto:josh.elser@gmail.com] 
Sent: Thursday, June 02, 2016 1:08 AM
To: user@accumulo.apache.org
Subject: Re: walog consumes all the disk space on power failure

Oh. Why do you only have 16GB of space...

You might be able to tweak some of the configuration properties so that
Accumulo is more aggressive in removing files, but I think you'd just kick
> the can down the road for another ~30 minutes.


Re: walog consumes all the disk space on power failure

Posted by Josh Elser <jo...@gmail.com>.
Oh. Why do you only have 16GB of space...

You might be able to tweak some of the configuration properties so that 
Accumulo is more aggressive in removing files, but I think you'd just 
kick the can down the road for another ~30 minutes.
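One knob in that direction (my assumption, not something Josh named; verify the property name against the 1.7 documentation) is how often the Accumulo garbage collector runs, which controls how quickly unreferenced WALs and RFiles are actually deleted:

```shell
# Hypothetical tuning sketch: shorten the delay between Accumulo GC
# cycles so unused WAL files are reclaimed sooner (default is longer).
# Replace the credentials with your own.
accumulo shell -u root -p secret -e "config -s gc.cycle.delay=1m"
```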
