Posted to user@accumulo.apache.org by Mike Hugo <mi...@piragua.com> on 2013/03/07 18:37:12 UTC

Cleanup of distcache after running map reduce jobs

We noticed that after running several thousand map reduce jobs, our
file system was filling up.  The culprit is the libjars that get
uploaded to the distributed cache for each job; it doesn't look like
they're ever being deleted.

Is there a mechanism to clear the distributed cache, or should this happen
automatically?

This is probably a straight-up Hadoop question, but I'm asking here first
in case you've seen this sort of thing with Accumulo before.

Thanks!

Mike

Re: Cleanup of distcache after running map reduce jobs

Posted by Mike Hugo <mi...@piragua.com>.
Thanks John!

I ended up playing with some settings in mapred-site.xml,
namely mapreduce.tasktracker.local.cache.numberdirectories
and local.cache.size, and that seems to have resolved our issue for the
moment.
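
For reference, those entries in mapred-site.xml look something like the
below. The values are illustrative guesses on my part rather than what we
settled on (if I remember right, local.cache.size defaults to about 10 GB),
so tune them to your own disk budget:

<!-- mapred-site.xml on each tasktracker -->
<property>
  <!-- Total size, in bytes, the local distributed cache may grow to
       before the tasktracker starts deleting old entries. -->
  <name>local.cache.size</name>
  <value>2147483648</value>
</property>
<property>
  <!-- Cap on the number of subdirectories kept in the local cache. -->
  <name>mapreduce.tasktracker.local.cache.numberdirectories</name>
  <value>5000</value>
</property>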


Mike


On Thu, Mar 7, 2013 at 12:26 PM, John Vines <vi...@apache.org> wrote:

> The cache will clear itself after 24 hours if I remember correctly. I have
> hit this issue before and, provided you're hitting the same issue I've seen
> before, your options are either to:
> 1. up the number of inodes for your system, or
> 2. add Accumulo to the child opts classpath via mapred-site.xml and then
> use the normal hadoop command to kick off your job instead of the
> accumulo/tool.sh script.

Re: Cleanup of distcache after running map reduce jobs

Posted by John Vines <vi...@apache.org>.
The cache will clear itself after 24 hours if I remember correctly. I have
hit this issue before and, provided you're hitting the same issue I've seen
before, your options are either to:
1. up the number of inodes for your system, or
2. add Accumulo to the child opts classpath via mapred-site.xml and then
use the normal hadoop command to kick off your job instead of the
accumulo/tool.sh script. A rough sketch of both options follows below.
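
For option 1, df -i will show per-filesystem inode usage, which is worth
checking first to confirm inode exhaustion is actually what you're hitting.
For option 2, here is a rough sketch; it assumes the Accumulo jars live
under $ACCUMULO_HOME/lib on every node, and the exact property and variable
names differ across Hadoop versions, so treat it as a starting point rather
than a recipe:

# hadoop-env.sh on each node: put the Accumulo jars on the Hadoop
# classpath up front, so jobs don't have to ship them as libjars
# through the distributed cache on every submission.
export HADOOP_CLASSPATH="$ACCUMULO_HOME/lib/*:$HADOOP_CLASSPATH"

# Then kick off the job with the plain hadoop command instead of
# accumulo/tool.sh. myjob.jar and com.example.MyJob are placeholders
# for your own job jar and main class.
hadoop jar myjob.jar com.example.MyJob <input> <output>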

