Posted to user@hive.apache.org by Ruben de Vries <ru...@hyves.nl> on 2012/06/01 10:28:06 UTC

Hive scratch dir not cleaning up

Hey Hivers,

I'm almost ready to replace our old Hadoop implementation with an implementation using Hive.

Now I've run into (hopefully) my last problem: my /tmp/hive-hduser dir is getting kinda big!
It doesn't seem to clean up these tmp files. Googling for it, I ran into some tickets about a cleanup setting; should I enable it with the setting below?
Why doesn't it do that by default? Am I the only one somehow racking up a lot of space with tmp files?

<property>
  <name>hive.start.cleanup.scratchdir</name>
  <value>true</value>
</property>

Re: Hive scratch dir not cleaning up

Posted by Edward Capriolo <ed...@gmail.com>.
Forgot the link.

github.com/edwardcapriolo/filecrush


Re: Hive scratch dir not cleaning up

Posted by Edward Capriolo <ed...@gmail.com>.
The filecrush tool has a small utility called Clean that accepts an
age argument and deletes all files in a directory older than that
age.

We use Clean to clean up the tmp HDFS directories that applications
leave remnants in.

Edward
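
The age-based selection described above can be sketched in Python. This is a minimal sketch, not the filecrush Clean implementation: it assumes the listing text comes from `hadoop fs -ls` on the scratch directory in the usual eight-column layout (permissions, replication, owner, group, size, date, time, path), and the function name `stale_paths` is hypothetical.

```python
from datetime import datetime, timedelta


def stale_paths(listing, now, max_age_days=2):
    """Return the paths in `hadoop fs -ls` output that are older than the cutoff.

    Assumption: each entry line has eight whitespace-separated columns:
    permissions, replication, owner, group, size, date, time, path.
    """
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for line in listing.splitlines():
        fields = line.split(None, 7)
        # Skip anything that is not an entry line, e.g. "Found 3 items".
        if len(fields) != 8:
            continue
        try:
            modified = datetime.strptime(
                fields[5] + " " + fields[6], "%Y-%m-%d %H:%M"
            )
        except ValueError:
            # Columns 6 and 7 were not a date and time; ignore the line.
            continue
        if modified < cutoff:
            stale.append(fields[7])
    return stale
```

The function only selects candidates; it deletes nothing. Passing `now` in explicitly keeps the cutoff reproducible, which makes it easy to check the selection against a saved listing before removing anything.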


Re: Hive scratch dir not cleaning up

Posted by Vinod Singh <vi...@vinodsingh.com>.
Yes, that is how I do it. Though 1 month is too long; I keep just 2 days.

Thanks,
Vinod

http://blog.vinodsingh.com/
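
A short retention window such as the 2 days mentioned above leaves little margin for mistakes, so a cleanup job may want a guard that refuses to touch anything outside the scratch directory. The sketch below only builds the delete commands and does not run them. The function name `build_delete_commands` is hypothetical, the default root is the directory from the original question, and `hadoop fs -rm -r` is assumed to be available (older Hadoop releases spell it `hadoop fs -rmr`).

```python
def build_delete_commands(paths, scratch_root="/tmp/hive-hduser"):
    """Build one `hadoop fs -rm -r` command per path strictly below scratch_root.

    Paths outside the root, the root itself, and paths containing ".."
    are skipped rather than deleted.
    """
    root = scratch_root.rstrip("/") + "/"
    commands = []
    for path in paths:
        # Guard: the path must be a child of the scratch root.
        if not path.startswith(root):
            continue
        # Guard: reject parent-directory components that could escape the root.
        if ".." in path.split("/"):
            continue
        commands.append(["hadoop", "fs", "-rm", "-r", path])
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in build_delete_commands(["/tmp/hive-hduser/hive_a"]):
        print(" ".join(cmd))
```

Once the printed list looks right, each command can be handed to `subprocess.run(cmd, check=True)` from a scheduled job.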


RE: Hive scratch dir not cleaning up

Posted by Ruben de Vries <ru...@hyves.nl>.
So I should write a job which cleans up 1 month old results or something like that?

Re: Hive scratch dir not cleaning up

Posted by Vinod Singh <vi...@vinodsingh.com>.
Hive deletes a job's contents from the scratch directory when the job
completes. Failed or killed jobs, however, leave their data there, and
it needs to be removed manually.

Thanks,
Vinod

http://blog.vinodsingh.com/
