Posted to hdfs-user@hadoop.apache.org by Marko Dinic <ma...@nissatech.com> on 2015/05/27 15:02:54 UTC

Files in distributed cache

Hello,

I'm new to Hadoop and a bit confused by one thing about the distributed
cache: when do files added to the distributed cache get deleted?

I'm specifically interested in Hadoop 0.20.2.

I read the following in "Hadoop: The Definitive Guide": "Files are deleted
to make room for a new file when the cache exceeds a certain size", which
is 10 GB by default. But I'm still confused: do files in the distributed
cache get deleted only when that 10 GB threshold is exceeded, or are they
deleted upon job termination?
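
For context, here is roughly the pattern I use to put files into the cache
(a simplified sketch, not my actual job; the path and class name are
placeholders):

import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapred.JobConf;

public class CacheSetup {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CacheSetup.class);
        // Register an HDFS file; each TaskTracker copies it to local disk
        // before the tasks run. The path below is a placeholder.
        DistributedCache.addCacheFile(
                new URI("/user/fitman.whirlpool/lookup.dat"), conf);
        // The eviction threshold the book mentions is the TaskTracker-side
        // property local.cache.size (10737418240 bytes = 10 GB by default),
        // read from the TaskTracker's own configuration, not per job.
        // ... set mapper/reducer and submit as usual ...
    }
}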

The thing is, I'm working on a multi-tenant cluster, and only 5 GB of
HDFS space is currently allotted to me.
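
To see how much of the quota is actually used, I check it like this (a
sketch against the 0.20 API; it prints the same numbers as
"hadoop fs -count -q" does for the directory):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class QuotaCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Summary of the quota'd directory (my user directory).
        ContentSummary cs =
                fs.getContentSummary(new Path("/user/fitman.whirlpool"));
        System.out.println("space quota    = " + cs.getSpaceQuota());
        System.out.println("space consumed = " + cs.getSpaceConsumed());
        System.out.println("raw file bytes = " + cs.getLength());
    }
}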

I'm getting the following exception:

15/05/27 11:16:57 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota of /user/fitman.whirlpool is exceeded: quota=5368709120 diskspace consumed=5.2g
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3778)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3640)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2400(DFSClient.java:2846)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:3041)
Caused by: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota of /user/fitman.whirlpool is exceeded: quota=5368709120 diskspace consumed=5.2g
    at org.apache.hadoop.hdfs.server.namenode.INodeDirectoryWithQuota.verifyQuota(INodeDirectoryWithQuota.java:149)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyQuota(FSDirectory.java:1085)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:903)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addBlock(FSDirectory.java:288)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.allocateBlock(FSNamesystem.java:1752)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1597)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:771)
    at sun.reflect.GeneratedMethodAccessor2199.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1439)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1435)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1433)

    at org.apache.hadoop.ipc.Client.call(Client.java:1150)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
    at $Proxy0.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy0.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3773)
    ... 3 more

Since there isn't much data on HDFS (as far as I can see with the HDFS
client that was assigned to me), I'm guessing that files in the distributed
cache accumulate and overflow the 5 GB quota, which is why I'm getting the
exception.
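
To try to find where the space is actually going, I'm running something
like the following per-directory breakdown (again just a sketch, roughly
equivalent to "hadoop fs -du /user/fitman.whirlpool"):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SpaceBreakdown {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Print the size of each entry under the quota'd directory,
        // to spot accumulated job or cache artifacts.
        for (FileStatus st : fs.listStatus(new Path("/user/fitman.whirlpool"))) {
            long bytes = st.isDir()
                    ? fs.getContentSummary(st.getPath()).getLength()
                    : st.getLen();
            System.out.println(st.getPath() + "\t" + bytes);
        }
    }
}

If I understand correctly, the space quota counts bytes after replication,
so the consumed figure the NameNode reports can be roughly the replication
factor times what the raw file lengths add up to.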

Can someone please explain what happens to these files? And if that is not
the problem, does anyone have an idea why I'm getting this exception?

Best regards,
Marko