Posted to common-user@hadoop.apache.org by Xi...@Dell.com on 2013/04/08 19:39:15 UTC

How to configure mapreduce archive size?

Hi,

I am using the Hadoop that is packaged within HBase 0.94.1 (Hadoop 1.0.3). Some MapReduce jobs run on my server, and after a while I found that my folder /tmp/hadoop-root/mapred/local/archive had grown to 14G.

How can I configure this and limit the size? I do not want to waste my disk space on the archive.

Thanks,

Xia


RE: How to configure mapreduce archive size?

Posted by Xi...@Dell.com.
Hi Hemanth,

Could you specify steps of how to get the debug level of tasktracker?

What I did: in the hbase_home/conf folder, I updated log4j.properties, set the Hadoop log level to DEBUG, and also added:

log4j.logger.org.apache.hadoop.mapred.JobTracker=DEBUG
log4j.logger.org.apache.hadoop.mapred.TaskTracker=DEBUG

and then restarted HBase (which includes Hadoop) and ran the MapReduce job. I can find Hadoop debug-level logging, but I cannot find the log for the TaskTracker. Attached are my log4j.properties and my logs. Could you help?

Thanks,

Jane

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Thursday, April 18, 2013 8:55 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Well, since the DistributedCache is used by the tasktracker, you need to update the log4j configuration file used by the tasktracker daemon. And you need to get the tasktracker log file - from the machine where you see the distributed cache problem.
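For example (just a sketch - the file to edit is the log4j.properties that the tasktracker daemon actually reads, which may live in a different place in your HBase-bundled setup), adding lines like these and restarting the tasktracker should surface the relevant messages:

log4j.logger.org.apache.hadoop.mapred.TaskTracker=DEBUG
# The cache manager logger below is my assumption for Hadoop 1.x - verify the class name against your hadoop-core jar.
log4j.logger.org.apache.hadoop.filecache.TrackerDistributedCacheManager=DEBUG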

On Fri, Apr 19, 2013 at 6:27 AM, <Xi...@dell.com> wrote:
Hi Hemanth,

I tried http://machine:50030. It did not work for me.

In the hbase_home/conf folder, I updated the log4j configuration properties and captured the attached log. Can you tell what is happening with the MapReduce job?

Thanks,

Jane

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Wednesday, April 17, 2013 9:11 PM

To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

The check for cache file cleanup is controlled by the property mapreduce.tasktracker.distributedcache.checkperiod. It defaults to 1 minute (which should be sufficient for your requirement).

I am not sure why the JobTracker UI is inaccessible. If you know where JT is running, try hitting http://machine:50030. If that doesn't work, maybe check if ports have been changed in mapred-site.xml for a property similar to mapred.job.tracker.http.address.
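For reference, both of those are ordinary entries in mapred-site.xml. A sketch of what I mean (the values are only illustrative, and I am assuming the check period is expressed in milliseconds):

  <property>
    <name>mapreduce.tasktracker.distributedcache.checkperiod</name>
    <value>60000</value> <!-- assumed to be milliseconds, i.e. the one-minute default -->
  </property>
  <property>
    <name>mapred.job.tracker.http.address</name>
    <value>0.0.0.0:50030</value> <!-- JobTracker web UI bind address and port -->
  </property>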

There is logging in the code of the tasktracker component that can help debug the distributed cache behaviour. In order to get those logs you need to enable debug logging in the log4j configuration properties and restart the daemons. Hopefully that will help you get some hints on what is happening.

Thanks
hemanth

On Wed, Apr 17, 2013 at 11:49 PM, <Xi...@dell.com> wrote:
Hi Hemanth and Bejoy KS,

I have tried both mapred-site.xml and core-site.xml. They do not work. I set the value to 50K just for testing purposes; however, the folder size has already grown to 900M. As in your email, "After they are done, the property will help clean up the files due to the limit set." How frequently is the cleanup task triggered?

Regarding job.xml, I cannot use the JT web UI to find it. It seems that when Hadoop is packaged within HBase, this is disabled. I am only running HBase jobs. The HBase people suggested I get help from the Hadoop mailing list. I will contact them again.

Thanks,

Jane

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Tuesday, April 16, 2013 9:35 PM

To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

You can limit the size by setting local.cache.size in mapred-site.xml (or core-site.xml if that works for you). I mistakenly mentioned mapred-default.xml in my last mail - apologies for that. However, please note that this does not prevent whatever is writing into the distributed cache from creating those files when they are required. After they are done, the property will help clean up the files due to the limit set.
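For example, an entry along these lines in mapred-site.xml should do it (local.cache.size is specified in bytes; the 2 GB value here is only an illustration):

  <property>
    <name>local.cache.size</name>
    <value>2147483648</value> <!-- upper bound for the local distributed cache, in bytes (2 GB shown as an example) -->
  </property>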

That's why I am more keen on finding what is using the files in the distributed cache. It may be useful to ask on the HBase list as well whether the APIs you are using create the files you mention (assuming you are only running HBase jobs on the cluster and nothing else).

Thanks
Hemanth

On Tue, Apr 16, 2013 at 11:15 PM, <Xi...@dell.com> wrote:
Hi Hemanth,

I am not explicitly using DistributedCache in my code, and I did not use any command line arguments like -libjars either.

Where can I find job.xml? I am using the HBase MapReduce API and not setting any job.xml myself.

The key point is I want to limit the size of /tmp/hadoop-root/mapred/local/archive. Could you help?

Thanks.

Xia

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Thursday, April 11, 2013 9:09 PM

To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

TableMapReduceUtil has APIs like addDependencyJars which will use DistributedCache. I don't think you are explicitly using that. Are you using any command line arguments like -libjars etc when you are launching the MapReduce job ? Alternatively you can check job.xml of the launched MR job to see if it has set properties having prefixes like mapred.cache. If nothing's set there, it would seem like some other process or user is adding jars to DistributedCache when using the cluster.
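For instance, if something were adding jars to the cache, job.xml would typically show entries with names like the ones below (the paths here are just placeholders):

  <property>
    <name>mapred.cache.files</name>
    <value>hdfs://namenode:9000/some/path/lib.jar</value>
  </property>
  <property>
    <name>mapred.cache.archives</name>
    <value>hdfs://namenode:9000/some/path/archive.zip</value>
  </property>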

Thanks
hemanth



On Thu, Apr 11, 2013 at 11:40 PM, <Xi...@dell.com> wrote:
Hi Hemanth,

Attached are some sample folders from my /tmp/hadoop-root/mapred/local/archive. There are some jar and class files inside.

My application uses a MapReduce job to purge old HBase data. I am using the basic HBase MapReduce API to delete rows from an HBase table. I do not specify the use of the distributed cache. Maybe HBase uses it?

Some code here:

       Scan scan = new Scan();
       scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
       scan.setCacheBlocks(false);  // don't set to true for MR jobs
       scan.setTimeRange(Long.MIN_VALUE, timestamp);
       // set other scan attrs
       // the purge start time
       Date date=new Date();
       TableMapReduceUtil.initTableMapperJob(
             tableName,        // input table
             scan,               // Scan instance to control CF and attribute selection
             MapperDelete.class,     // mapper class
             null,         // mapper output key
             null,  // mapper output value
             job);

       job.setOutputFormatClass(TableOutputFormat.class);
       job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
       job.setNumReduceTasks(0);

       boolean b = job.waitForCompletion(true);
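
If it is HBase itself that adds jars via addDependencyJars, would using the overload that takes an addDependencyJars flag help? I am assuming such an overload exists in the 0.94 API; a sketch of what I mean:

       TableMapReduceUtil.initTableMapperJob(
             tableName,           // input table
             scan,                // Scan instance
             MapperDelete.class,  // mapper class
             null,                // mapper output key
             null,                // mapper output value
             job,
             false);              // addDependencyJars = false (assumed flag), so no jars go to the distributed cache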

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Thursday, April 11, 2013 12:29 AM

To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Could you paste the contents of the directory ? Not sure whether that will help, but just giving it a shot.

What application are you using ? Is it custom MapReduce jobs in which you use Distributed cache (I guess not) ?

Thanks
Hemanth

On Thu, Apr 11, 2013 at 3:34 AM, <Xi...@dell.com> wrote:
Hi Arun,

I stopped my application, then restarted my HBase (which includes Hadoop). After that I started my application again. After one evening, my /tmp/hadoop-root/mapred/local/archive has grown to more than 1G. It does not work.

Is this the right place to change the value?

"local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar

Thanks,

Jane

From: Arun C Murthy [mailto:acm@hortonworks.com]
Sent: Wednesday, April 10, 2013 2:45 PM

To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Ensure no jobs are running (the cache limit applies only to non-active cache files), then check after a little while (it takes some time for the cleaner thread to kick in).

Arun

On Apr 11, 2013, at 2:29 AM, <Xi...@Dell.com> wrote:

Hi Hemanth,

For Hadoop 1.0.3, I can only find "local.cache.size" in core-default.xml, which is inside hadoop-core-1.0.3.jar. It is not in mapred-default.xml.

I updated the value in that file (core-default.xml) and changed it to 500000. This is just for my testing purposes. However, the folder /tmp/hadoop-root/mapred/local/archive has already grown to more than 1G. It looks like it is not working. Could you advise whether what I did is correct?

  <name>local.cache.size</name>
  <value>500000</value>

Thanks,

Xia

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Monday, April 08, 2013 9:09 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Hi,

This directory is used as part of the 'DistributedCache' feature. (http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache). There is a configuration key "local.cache.size" which controls the amount of data stored under DistributedCache. The default limit is 10GB. However, the files under this cannot be deleted if they are being used. Also, some frameworks on Hadoop could be using DistributedCache transparently to you.

So you could check what is being stored here and based on that lower the limit of the cache size if you feel that will help. The property needs to be set in mapred-default.xml.

Thanks
Hemanth

On Mon, Apr 8, 2013 at 11:09 PM, <Xi...@dell.com> wrote:
Hi,

I am using the Hadoop that is packaged within HBase 0.94.1 (Hadoop 1.0.3). Some MapReduce jobs run on my server, and after a while I found that my folder /tmp/hadoop-root/mapred/local/archive had grown to 14G.

How can I configure this and limit the size? I do not want to waste my disk space on the archive.

Thanks,

Xia



--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/







Re: How to configure mapreduce archive size?

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
Well, since the DistributedCache is used by the tasktracker, you need to
update the log4j configuration file used by the tasktracker daemon. And you
need to get the tasktracker log file - from the machine where you see the
distributed cache problem.


Re: How to configure mapreduce archive size?

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
Well, since the DistributedCache is used by the tasktracker, you need to
update the log4j configuration file used by the tasktracker daemon. And you
need to get the tasktracker log file - from the machine where you see the
distributed cache problem.


On Fri, Apr 19, 2013 at 6:27 AM, <Xi...@dell.com> wrote:

> Hi Hemanth,****
>
> ** **
>
> I tried http://machine:50030. It did not work for me.****
>
> ** **
>
> In hbase_home/conf folder, I update the log4j configuration properties and
> got attached log. Do you find what is happening for the map reduce job?***
> *
>
> ** **
>
> Thanks,****
>
> ** **
>
> Jane****
>
> ** **
>
> *From:* Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
> *Sent:* Wednesday, April 17, 2013 9:11 PM
>
> *To:* user@hadoop.apache.org
> *Subject:* Re: How to configure mapreduce archive size?****
>
> ** **
>
> The check for cache file cleanup is controlled by the
> property mapreduce.tasktracker.distributedcache.checkperiod. It defaults to
> 1 minute (which should be sufficient for your requirement).****
>
> ** **
>
> I am not sure why the JobTracker UI is inaccessible. If you know where JT
> is running, try hitting http://machine:50030. If that doesn't work, maybe
> check if ports have been changed in mapred-site.xml for a property similar
> to mapred.job.tracker.http.address. ****
>
> ** **
>
> There is logging in the code of the tasktracker component that can help
> debug the distributed cache behaviour. In order to get those logs you need
> to enable debug logging in the log4j configuration properties and restart
> the daemons. Hopefully that will help you get some hints on what is
> happening.****
>
> ** **
>
> Thanks****
>
> hemanth****
>
> ** **
>
> On Wed, Apr 17, 2013 at 11:49 PM, <Xi...@dell.com> wrote:****
>
> Hi Hemanth and Bejoy KS,****
>
>  ****
>
> I have tried both mapred-site.xml and core-site.xml. They do not work. I
> set the value to 50K just for testing purpose, however the folder size
> already goes to 900M now. As in your email, “After they are done, the
> property will help cleanup the files due to the limit set. ” How frequently
> the cleanup task will be triggered? ****
>
>  ****
>
> Regarding the job.xml, I cannot use JT web UI to find it. It seems when
> hadoop is packaged within Hbase, this is disabled. I am only use Hbase
> jobs. I was suggested by Hbase people to get help from Hadoop mailing list.
> I will contact them again.****
>
>  ****
>
> Thanks,****
>
>  ****
>
> Jane****
>
>  ****
>
> *From:* Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
> *Sent:* Tuesday, April 16, 2013 9:35 PM****
>
>
> *To:* user@hadoop.apache.org
> *Subject:* Re: How to configure mapreduce archive size?****
>
>  ****
>
> You can limit the size by setting local.cache.size in the mapred-site.xml
> (or core-site.xml if that works for you). I mistakenly mentioned
> mapred-default.xml in my last mail - apologies for that. However, please
> note that this does not prevent whatever is writing into the distributed
> cache from creating those files when they are required. After they are
> done, the property will help cleanup the files due to the limit set. ****
>
>  ****
>
> That's why I am more keen on finding what is using the files in the
> Distributed cache. It may be useful if you can ask on the HBase list as
> well if the APIs you are using are creating the files you mention (assuming
> you are only running HBase jobs on the cluster and nothing else)****
>
>  ****
>
> Thanks****
>
> Hemanth****
>
>  ****
>
> On Tue, Apr 16, 2013 at 11:15 PM, <Xi...@dell.com> wrote:****
>
> Hi Hemanth,****
>
>  ****
>
> I did not explicitly using DistributedCache in my code. I did not use any
> command line arguments like –libjars neither.****
>
>  ****
>
> Where can I find job.xml? I am using Hbase MapReduce API and not setting
> any job.xml.****
>
>  ****
>
> The key point is I want to limit the size of
> /tmp/hadoop-root/mapred/local/archive. Could you help?****
>
>  ****
>
> Thanks.****
>
>  ****
>
> Xia****
>
>  ****
>
> *From:* Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
> *Sent:* Thursday, April 11, 2013 9:09 PM****
>

Re: How to configure mapreduce archive size?

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
Well, since the DistributedCache is used by the tasktracker, you need to
update the log4j configuration file used by the tasktracker daemon. And you
need to get the tasktracker log file - from the machine where you see the
distributed cache problem.
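
As an aside, a minimal sketch of the kind of log4j change meant here (placed in the conf/log4j.properties read by the tasktracker, followed by a daemon restart) could look like the lines below. The TaskTracker logger exists in Hadoop 1.0.x; the DistributedCache manager logger name is an assumption based on the 1.0.x class layout, so verify it against the classes in your build.

    # hedged sketch: raise logging for the tasktracker and its cache handling
    log4j.logger.org.apache.hadoop.mapred.TaskTracker=DEBUG
    log4j.logger.org.apache.hadoop.filecache.TrackerDistributedCacheManager=DEBUG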


On Fri, Apr 19, 2013 at 6:27 AM, <Xi...@dell.com> wrote:

> Hi Hemanth,
>
> I tried http://machine:50030. It did not work for me.
>
> In the hbase_home/conf folder, I updated the log4j configuration
> properties and got the attached log. Do you see what is happening with
> the map reduce job?
>
> Thanks,
>
> Jane

RE: How to configure mapreduce archive size?

Posted by Xi...@Dell.com.
Hi Hemanth,

I tried http://machine:50030. It did not work for me.

In the hbase_home/conf folder, I updated the log4j configuration properties and got the attached log. Do you see what is happening with the map reduce job?

Thanks,

Jane

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Wednesday, April 17, 2013 9:11 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

The check for cache file cleanup is controlled by the property mapreduce.tasktracker.distributedcache.checkperiod. It defaults to 1 minute (which should be sufficient for your requirement).

I am not sure why the JobTracker UI is inaccessible. If you know where JT is running, try hitting http://machine:50030. If that doesn't work, maybe check if ports have been changed in mapred-site.xml for a property similar to mapred.job.tracker.http.address.

There is logging in the code of the tasktracker component that can help debug the distributed cache behaviour. In order to get those logs you need to enable debug logging in the log4j configuration properties and restart the daemons. Hopefully that will help you get some hints on what is happening.

Thanks
hemanth

On Wed, Apr 17, 2013 at 11:49 PM, <Xi...@dell.com> wrote:
Hi Hemanth and Bejoy KS,

I have tried both mapred-site.xml and core-site.xml. They do not work. I set the value to 50K just for testing purposes; however, the folder size has already grown to 900M. As in your email, "After they are done, the property will help cleanup the files due to the limit set." How frequently will the cleanup task be triggered?

Regarding the job.xml, I cannot use the JT web UI to find it. It seems that when hadoop is packaged within HBase, this is disabled. I am only running HBase jobs. The HBase people suggested I get help from the Hadoop mailing list. I will contact them again.

Thanks,

Jane

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Tuesday, April 16, 2013 9:35 PM

To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

You can limit the size by setting local.cache.size in the mapred-site.xml (or core-site.xml if that works for you). I mistakenly mentioned mapred-default.xml in my last mail - apologies for that. However, please note that this does not prevent whatever is writing into the distributed cache from creating those files when they are required. After they are done, the property will help cleanup the files due to the limit set.
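
For illustration only, such an entry in mapred-site.xml might look like the block below. The value is assumed to be a byte count (the 10GB default quoted earlier in this thread suggests bytes), so 2147483648 would cap the cache at roughly 2GB.

  <!-- hedged sketch: cap DistributedCache usage; value assumed to be in bytes -->
  <property>
    <name>local.cache.size</name>
    <value>2147483648</value>
  </property>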

That's why I am more keen on finding what is using the files in the Distributed cache. It may be useful if you can ask on the HBase list as well if the APIs you are using are creating the files you mention (assuming you are only running HBase jobs on the cluster and nothing else)

Thanks
Hemanth

On Tue, Apr 16, 2013 at 11:15 PM, <Xi...@dell.com> wrote:
Hi Hemanth,

I am not explicitly using DistributedCache in my code, and I did not use any command line arguments like -libjars either.

Where can I find job.xml? I am using the HBase MapReduce API and not setting any job.xml.

The key point is I want to limit the size of /tmp/hadoop-root/mapred/local/archive. Could you help?

Thanks.

Xia

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Thursday, April 11, 2013 9:09 PM

To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

TableMapReduceUtil has APIs like addDependencyJars which will use DistributedCache. I don't think you are explicitly using that. Are you using any command line arguments like -libjars etc when you are launching the MapReduce job ? Alternatively you can check job.xml of the launched MR job to see if it has set properties having prefixes like mapred.cache. If nothing's set there, it would seem like some other process or user is adding jars to DistributedCache when using the cluster.
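
Since the JobTracker web UI is not reachable in this setup, one possible workaround (a sketch, not something prescribed in this thread) is to dump the relevant keys from the job's own configuration before submitting it; mapred.cache.files, mapred.cache.archives and tmpjars should be the Hadoop 1.x properties that -libjars and addDependencyJars populate.

       // hedged sketch: print whatever this job has placed on the DistributedCache
       public static void dumpCacheProperties(org.apache.hadoop.mapreduce.Job job) {
         for (String key : new String[] {"mapred.cache.files", "mapred.cache.archives", "tmpjars"}) {
           System.out.println(key + " = " + job.getConfiguration().get(key));
         }
       }

Calling this just before job.waitForCompletion(true) would show whether anything is silently adding jars to the cache.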

Thanks
hemanth



On Thu, Apr 11, 2013 at 11:40 PM, <Xi...@dell.com> wrote:
Hi Hemanth,

Attached are some sample folders from my /tmp/hadoop-root/mapred/local/archive. There are some jar and class files inside.

My application uses a MapReduce job to purge old HBase data. I am using the basic HBase MapReduce API to delete rows from an HBase table. I do not explicitly use the distributed cache. Maybe HBase uses it?

Some code here:

       Scan scan = new Scan();
       scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
       scan.setCacheBlocks(false);  // don't set to true for MR jobs
       scan.setTimeRange(Long.MIN_VALUE, timestamp);
       // set other scan attrs
       // the purge start time
       Date date=new Date();
       TableMapReduceUtil.initTableMapperJob(
             tableName,        // input table
             scan,               // Scan instance to control CF and attribute selection
             MapperDelete.class,     // mapper class
             null,         // mapper output key
             null,  // mapper output value
             job);

       job.setOutputFormatClass(TableOutputFormat.class);
       job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
       job.setNumReduceTasks(0);

       boolean b = job.waitForCompletion(true);
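
One further hedged note on the snippet above: some HBase releases expose an initTableMapperJob overload with a trailing addDependencyJars flag, and passing false keeps TableMapReduceUtil from shipping jars through the DistributedCache (only safe when the needed classes are already on the tasktracker classpath). Whether the 0.94.1 API includes this overload should be checked before relying on it.

       // hedged sketch: same mapper setup, but without adding dependency jars to the cache
       TableMapReduceUtil.initTableMapperJob(
             tableName,            // input table
             scan,                 // Scan instance
             MapperDelete.class,   // mapper class
             null,                 // mapper output key
             null,                 // mapper output value
             job,
             false);               // addDependencyJars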

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Thursday, April 11, 2013 12:29 AM

To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Could you paste the contents of the directory ? Not sure whether that will help, but just giving it a shot.

What application are you using ? Is it custom MapReduce jobs in which you use Distributed cache (I guess not) ?

Thanks
Hemanth

On Thu, Apr 11, 2013 at 3:34 AM, <Xi...@dell.com> wrote:
Hi Arun,

I stopped my application, then restarted my hbase (which includes hadoop). After that I started my application. After one evening, my /tmp/hadoop-root/mapred/local/archive had grown to more than 1G. It does not work.

Is this the right place to change the value?

"local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar

Thanks,

Jane

From: Arun C Murthy [mailto:acm@hortonworks.com]
Sent: Wednesday, April 10, 2013 2:45 PM

To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Ensure no jobs are running (the cache limit applies only to non-active cache files), and check after a little while (it takes some time for the cleaner thread to kick in).

Arun

On Apr 11, 2013, at 2:29 AM, <Xi...@Dell.com> <Xi...@Dell.com> wrote:

Hi Hemanth,

For the hadoop 1.0.3, I can only find "local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in mapred-default.xml.

I updated the value in the core-default.xml file and changed it to 500000. This is just for my testing purpose. However, the folder /tmp/hadoop-root/mapred/local/archive has already grown to more than 1G. It looks like it is not taking effect. Could you advise if what I did is correct?

  <name>local.cache.size</name>
  <value>500000</value>

Thanks,

Xia

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Monday, April 08, 2013 9:09 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Hi,

This directory is used as part of the 'DistributedCache' feature. (http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache). There is a configuration key "local.cache.size" which controls the amount of data stored under DistributedCache. The default limit is 10GB. However, the files under this cannot be deleted if they are being used. Also, some frameworks on Hadoop could be using DistributedCache transparently to you.

So you could check what is being stored here and based on that lower the limit of the cache size if you feel that will help. The property needs to be set in mapred-default.xml.

Thanks
Hemanth

On Mon, Apr 8, 2013 at 11:09 PM, <Xi...@dell.com> wrote:
Hi,

I am using hadoop which is packaged within hbase-0.94.1. It is hadoop 1.0.3. There are some mapreduce jobs running on my server. After some time, I found that my folder /tmp/hadoop-root/mapred/local/archive has grown to 14G.

How can I configure this and limit the size? I do not want to waste my space on the archive.

Thanks,

Xia



--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/






Re: How to configure mapreduce archive size?

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
The check for cache file cleanup is controlled by the
property mapreduce.tasktracker.distributedcache.checkperiod. It defaults to
1 minute (which should be sufficient for your requirement).
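For illustration only, a minimal sketch of how that check period could be set explicitly in mapred-site.xml on the tasktracker node. The property name comes from the reply above; the assumption that the value is read in milliseconds (60000 = 1 minute) should be verified against your Hadoop 1.0.3 build before relying on it:

  <property>
    <name>mapreduce.tasktracker.distributedcache.checkperiod</name>
    <!-- assumed to be milliseconds; 60000 would match the 1-minute default described above -->
    <value>60000</value>
  </property>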

I am not sure why the JobTracker UI is inaccessible. If you know where JT
is running, try hitting http://machine:50030. If that doesn't work, maybe
check if ports have been changed in mapred-site.xml for a property similar
to mapred.job.tracker.http.address.

There is logging in the code of the tasktracker component that can help
debug the distributed cache behaviour. In order to get those logs you need
to enable debug logging in the log4j configuration properties and restart
the daemons. Hopefully that will help you get some hints on what is
happening.

Thanks
hemanth




RE: How to configure mapreduce archive size?

Posted by Xi...@Dell.com.
Hi Hemanth and Bejoy KS,

I have tried both mapred-site.xml and core-site.xml. They do not work. I set the value to 50K just for testing purposes; however, the folder size has already grown to 900M now. As in your email, "After they are done, the property will help cleanup the files due to the limit set." How frequently will the cleanup task be triggered?

Regarding the job.xml, I cannot use the JT web UI to find it. It seems that when hadoop is packaged within Hbase, this is disabled. I only run Hbase jobs. The Hbase people suggested I get help from the Hadoop mailing list. I will contact them again.

Thanks,

Jane

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Tuesday, April 16, 2013 9:35 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

You can limit the size by setting local.cache.size in the mapred-site.xml (or core-site.xml if that works for you). I mistakenly mentioned mapred-default.xml in my last mail - apologies for that. However, please note that this does not prevent whatever is writing into the distributed cache from creating those files when they are required. After they are done, the property will help cleanup the files due to the limit set.
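As a concrete illustration only: local.cache.size is interpreted in bytes (the 10GB default corresponds to 10737418240), so a tiny test value such as 50000 means roughly 50 KB rather than 50 GB. A mapred-site.xml entry capping the locally cached DistributedCache data at about 2 GB might look like the sketch below; adjust the number to whatever limit you actually want:

  <property>
    <name>local.cache.size</name>
    <!-- upper bound, in bytes, on locally cached DistributedCache data; 2147483648 = 2 GB -->
    <value>2147483648</value>
  </property>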

That's why I am more keen on finding what is using the files in the Distributed cache. It may be useful if you can ask on the HBase list as well if the APIs you are using are creating the files you mention (assuming you are only running HBase jobs on the cluster and nothing else)

Thanks
Hemanth

On Tue, Apr 16, 2013 at 11:15 PM, <Xi...@dell.com> wrote:
Hi Hemanth,

I am not explicitly using DistributedCache in my code, and I did not use any command line arguments like -libjars either.

Where can I find job.xml? I am using Hbase MapReduce API and not setting any job.xml.

The key point is I want to limit the size of /tmp/hadoop-root/mapred/local/archive. Could you help?

Thanks.

Xia

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Thursday, April 11, 2013 9:09 PM

To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

TableMapReduceUtil has APIs like addDependencyJars which will use DistributedCache. I don't think you are explicitly using that. Are you using any command line arguments like -libjars etc when you are launching the MapReduce job ? Alternatively you can check job.xml of the launched MR job to see if it has set properties having prefixes like mapred.cache. If nothing's set there, it would seem like some other process or user is adding jars to DistributedCache when using the cluster.
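If you can open a job.xml (or dump the job's Configuration from your own code), the distributed-cache related entries would look roughly like the hypothetical sketch below. The mapred.cache.* names follow the prefix mentioned above; tmpjars is, as far as I recall, the property that addDependencyJars/-libjars populate in this Hadoop generation (worth verifying in your build), and the paths are made-up placeholders:

  <property>
    <name>tmpjars</name>
    <value>file:/opt/hbase-0.94.1/hbase-0.94.1.jar,file:/opt/hbase-0.94.1/lib/zookeeper-3.4.3.jar</value>
  </property>
  <property>
    <name>mapred.cache.files</name>
    <value>hdfs://namenode:9000/user/root/some-lookup-file.txt</value>
  </property>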

Thanks
hemanth



On Thu, Apr 11, 2013 at 11:40 PM, <Xi...@dell.com> wrote:
Hi Hemanth,

Attached is some sample folders within my /tmp/hadoop-root/mapred/local/archive. There are some jar and class files inside.

My application uses a MapReduce job to purge old Hbase data. I am using the basic HBase MapReduce API to delete rows from an Hbase table. I do not specify use of the Distributed cache. Maybe HBase uses it?

Some code here:

       Scan scan = new Scan();
       scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
       scan.setCacheBlocks(false);  // don't set to true for MR jobs
       scan.setTimeRange(Long.MIN_VALUE, timestamp);
       // set other scan attrs
       // the purge start time
       Date date=new Date();
       TableMapReduceUtil.initTableMapperJob(
             tableName,        // input table
             scan,               // Scan instance to control CF and attribute selection
             MapperDelete.class,     // mapper class
             null,         // mapper output key
             null,  // mapper output value
             job);

       job.setOutputFormatClass(TableOutputFormat.class);
       job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
       job.setNumReduceTasks(0);

       boolean b = job.waitForCompletion(true);

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Thursday, April 11, 2013 12:29 AM

To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Could you paste the contents of the directory ? Not sure whether that will help, but just giving it a shot.

What application are you using ? Is it custom MapReduce jobs in which you use Distributed cache (I guess not) ?

Thanks
Hemanth

On Thu, Apr 11, 2013 at 3:34 AM, <Xi...@dell.com> wrote:
Hi Arun,

I stopped my application, then restarted my hbase (which includes hadoop). After that I started my application again. After one evening, my /tmp/hadoop-root/mapred/local/archive had grown to more than 1G. It does not work.

Is this the right place to change the value?

"local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar

Thanks,

Jane

From: Arun C Murthy [mailto:acm@hortonworks.com]
Sent: Wednesday, April 10, 2013 2:45 PM

To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Ensure no jobs are running (cache limit is only for non-active cache files), check after a little while (takes sometime for the cleaner thread to kick in).

Arun

On Apr 11, 2013, at 2:29 AM, <Xi...@Dell.com> wrote:

Hi Hemanth,

For hadoop 1.0.3, I can only find "local.cache.size" in the file core-default.xml, which is inside hadoop-core-1.0.3.jar. It is not in mapred-default.xml.

I updated the value in that core-default.xml file and changed it to 500000. This is just for my testing purposes. However, the folder /tmp/hadoop-root/mapred/local/archive has already grown to more than 1G now. It looks like the setting does not work. Could you advise if what I did is correct?

  <name>local.cache.size</name>
  <value>500000</value>

Thanks,

Xia

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Monday, April 08, 2013 9:09 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Hi,

This directory is used as part of the 'DistributedCache' feature. (http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache). There is a configuration key "local.cache.size" which controls the amount of data stored under DistributedCache. The default limit is 10GB. However, the files under this cannot be deleted if they are being used. Also, some frameworks on Hadoop could be using DistributedCache transparently to you.

So you could check what is being stored here and based on that lower the limit of the cache size if you feel that will help. The property needs to be set in mapred-default.xml.

Thanks
Hemanth

On Mon, Apr 8, 2013 at 11:09 PM, <Xi...@dell.com> wrote:
Hi,

I am using hadoop which is packaged within hbase -0.94.1. It is hadoop 1.0.3. There is some mapreduce job running on my server. After some time, I found that my folder /tmp/hadoop-root/mapred/local/archive has 14G size.

How to configure this and limit the size? I do not want  to waste my space for archive.

Thanks,

Xia



--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/





RE: How to configure mapreduce archive size?

Posted by Xi...@Dell.com.
Hi Hemanth and Bejoy KS,

I have tried both mapred-site.xml and core-site.xml. They do not work. I set the value to 50K just for testing purpose, however the folder size already goes to 900M now. As in your email, "After they are done, the property will help cleanup the files due to the limit set. " How frequently the cleanup task will be triggered?

Regarding the job.xml, I cannot use JT web UI to find it. It seems when hadoop is packaged within Hbase, this is disabled. I am only use Hbase jobs. I was suggested by Hbase people to get help from Hadoop mailing list. I will contact them again.

Thanks,

Jane

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Tuesday, April 16, 2013 9:35 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

You can limit the size by setting local.cache.size in the mapred-site.xml (or core-site.xml if that works for you). I mistakenly mentioned mapred-default.xml in my last mail - apologies for that. However, please note that this does not prevent whatever is writing into the distributed cache from creating those files when they are required. After they are done, the property will help cleanup the files due to the limit set.

That's why I am more keen on finding what is using the files in the Distributed cache. It may be useful if you can ask on the HBase list as well if the APIs you are using are creating the files you mention (assuming you are only running HBase jobs on the cluster and nothing else)

Thanks
Hemanth

On Tue, Apr 16, 2013 at 11:15 PM, <Xi...@dell.com>> wrote:
Hi Hemanth,

I did not explicitly using DistributedCache in my code. I did not use any command line arguments like -libjars neither.

Where can I find job.xml? I am using Hbase MapReduce API and not setting any job.xml.

The key point is I want to limit the size of /tmp/hadoop-root/mapred/local/archive. Could you help?

Thanks.

Xia

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com<ma...@thoughtworks.com>]
Sent: Thursday, April 11, 2013 9:09 PM

To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

TableMapReduceUtil has APIs like addDependencyJars which will use DistributedCache. I don't think you are explicitly using that. Are you using any command line arguments like -libjars etc when you are launching the MapReduce job ? Alternatively you can check job.xml of the launched MR job to see if it has set properties having prefixes like mapred.cache. If nothing's set there, it would seem like some other process or user is adding jars to DistributedCache when using the cluster.

Thanks
hemanth



On Thu, Apr 11, 2013 at 11:40 PM, <Xi...@dell.com>> wrote:
Hi Hemanth,

Attached is some sample folders within my /tmp/hadoop-root/mapred/local/archive. There are some jar and class files inside.

My application uses MapReduce job to do purge Hbase old data. I am using basic HBase MapReduce API to delete rows from Hbase table. I do not specify to use Distributed cache. Maybe HBase use it?

Some code here:

       Scan scan = new Scan();
       scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
       scan.setCacheBlocks(false);  // don't set to true for MR jobs
       scan.setTimeRange(Long.MIN_VALUE, timestamp);
       // set other scan attrs
       // the purge start time
       Date date=new Date();
       TableMapReduceUtil.initTableMapperJob(
             tableName,        // input table
             scan,               // Scan instance to control CF and attribute selection
             MapperDelete.class,     // mapper class
             null,         // mapper output key
             null,  // mapper output value
             job);

       job.setOutputFormatClass(TableOutputFormat.class);
       job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
       job.setNumReduceTasks(0);

       boolean b = job.waitForCompletion(true);

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com<ma...@thoughtworks.com>]
Sent: Thursday, April 11, 2013 12:29 AM

To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Could you paste the contents of the directory ? Not sure whether that will help, but just giving it a shot.

What application are you using ? Is it custom MapReduce jobs in which you use Distributed cache (I guess not) ?

Thanks
Hemanth

On Thu, Apr 11, 2013 at 3:34 AM, <Xi...@dell.com>> wrote:
Hi Arun,

I stopped my application, then restarted my hbase (which include hadoop). After that I start my application. After one evening, my /tmp/hadoop-root/mapred/local/archive goes to more than 1G. It does not work.

Is this the right place to change the value?

"local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar

Thanks,

Jane

From: Arun C Murthy [mailto:acm@hortonworks.com<ma...@hortonworks.com>]
Sent: Wednesday, April 10, 2013 2:45 PM

To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Ensure no jobs are running (cache limit is only for non-active cache files), check after a little while (takes sometime for the cleaner thread to kick in).

Arun

On Apr 11, 2013, at 2:29 AM, <Xi...@Dell.com>> <Xi...@Dell.com>> wrote:

Hi Hemanth,

For the hadoop 1.0.3, I can only find "local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in mapred-default.xml.

I updated the value in file default.xml and changed the value to 500000. This is just for my testing purpose. However, the folder /tmp/hadoop-root/mapred/local/archive already goes more than 1G now. Looks like it does not do the work. Could you advise if what I did is correct?

  <name>local.cache.size</name>
  <value>500000</value>

Thanks,

Xia

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com<ma...@thoughtworks.com>]
Sent: Monday, April 08, 2013 9:09 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Hi,

This directory is used as part of the 'DistributedCache' feature. (http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache). There is a configuration key "local.cache.size" which controls the amount of data stored under DistributedCache. The default limit is 10GB. However, the files under this cannot be deleted if they are being used. Also, some frameworks on Hadoop could be using DistributedCache transparently to you.

So you could check what is being stored here and based on that lower the limit of the cache size if you feel that will help. The property needs to be set in mapred-default.xml.

Thanks
Hemanth

On Mon, Apr 8, 2013 at 11:09 PM, <Xi...@dell.com>> wrote:
Hi,

I am using hadoop which is packaged within hbase -0.94.1. It is hadoop 1.0.3. There is some mapreduce job running on my server. After some time, I found that my folder /tmp/hadoop-root/mapred/local/archive has 14G size.

How to configure this and limit the size? I do not want  to waste my space for archive.

Thanks,

Xia



--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/





RE: How to configure mapreduce archive size?

Posted by Xi...@Dell.com.
Hi Hemanth and Bejoy KS,

I have tried both mapred-site.xml and core-site.xml. They do not work. I set the value to 50K just for testing purpose, however the folder size already goes to 900M now. As in your email, "After they are done, the property will help cleanup the files due to the limit set. " How frequently the cleanup task will be triggered?

Regarding the job.xml, I cannot use JT web UI to find it. It seems when hadoop is packaged within Hbase, this is disabled. I am only use Hbase jobs. I was suggested by Hbase people to get help from Hadoop mailing list. I will contact them again.

Thanks,

Jane

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Tuesday, April 16, 2013 9:35 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

You can limit the size by setting local.cache.size in the mapred-site.xml (or core-site.xml if that works for you). I mistakenly mentioned mapred-default.xml in my last mail - apologies for that. However, please note that this does not prevent whatever is writing into the distributed cache from creating those files when they are required. After they are done, the property will help cleanup the files due to the limit set.

That's why I am more keen on finding what is using the files in the Distributed cache. It may be useful if you can ask on the HBase list as well if the APIs you are using are creating the files you mention (assuming you are only running HBase jobs on the cluster and nothing else)

Thanks
Hemanth

On Tue, Apr 16, 2013 at 11:15 PM, <Xi...@dell.com>> wrote:
Hi Hemanth,

I did not explicitly using DistributedCache in my code. I did not use any command line arguments like -libjars neither.

Where can I find job.xml? I am using Hbase MapReduce API and not setting any job.xml.

The key point is I want to limit the size of /tmp/hadoop-root/mapred/local/archive. Could you help?

Thanks.

Xia

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com<ma...@thoughtworks.com>]
Sent: Thursday, April 11, 2013 9:09 PM

To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

TableMapReduceUtil has APIs like addDependencyJars which will use DistributedCache. I don't think you are explicitly using that. Are you using any command line arguments like -libjars etc when you are launching the MapReduce job ? Alternatively you can check job.xml of the launched MR job to see if it has set properties having prefixes like mapred.cache. If nothing's set there, it would seem like some other process or user is adding jars to DistributedCache when using the cluster.

Thanks
hemanth



On Thu, Apr 11, 2013 at 11:40 PM, <Xi...@dell.com>> wrote:
Hi Hemanth,

Attached is some sample folders within my /tmp/hadoop-root/mapred/local/archive. There are some jar and class files inside.

My application uses MapReduce job to do purge Hbase old data. I am using basic HBase MapReduce API to delete rows from Hbase table. I do not specify to use Distributed cache. Maybe HBase use it?

Some code here:

       Scan scan = new Scan();
       scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
       scan.setCacheBlocks(false);  // don't set to true for MR jobs
       scan.setTimeRange(Long.MIN_VALUE, timestamp);
       // set other scan attrs
       // the purge start time
       Date date=new Date();
       TableMapReduceUtil.initTableMapperJob(
             tableName,        // input table
             scan,               // Scan instance to control CF and attribute selection
             MapperDelete.class,     // mapper class
             null,         // mapper output key
             null,  // mapper output value
             job);

       job.setOutputFormatClass(TableOutputFormat.class);
       job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
       job.setNumReduceTasks(0);

       boolean b = job.waitForCompletion(true);

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com<ma...@thoughtworks.com>]
Sent: Thursday, April 11, 2013 12:29 AM

To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Could you paste the contents of the directory ? Not sure whether that will help, but just giving it a shot.

What application are you using ? Is it custom MapReduce jobs in which you use Distributed cache (I guess not) ?

Thanks
Hemanth

On Thu, Apr 11, 2013 at 3:34 AM, <Xi...@dell.com>> wrote:
Hi Arun,

I stopped my application, then restarted my hbase (which include hadoop). After that I start my application. After one evening, my /tmp/hadoop-root/mapred/local/archive goes to more than 1G. It does not work.

Is this the right place to change the value?

"local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar

Thanks,

Jane

From: Arun C Murthy [mailto:acm@hortonworks.com<ma...@hortonworks.com>]
Sent: Wednesday, April 10, 2013 2:45 PM

To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Ensure no jobs are running (the cache limit applies only to non-active cache files), then check after a little while (it takes some time for the cleaner thread to kick in).

Arun

On Apr 11, 2013, at 2:29 AM, <Xi...@Dell.com>> <Xi...@Dell.com>> wrote:

Hi Hemanth,

For the hadoop 1.0.3, I can only find "local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in mapred-default.xml.

I updated the value in the core-default.xml file and changed it to 500000. This is just for my testing purposes. However, the folder /tmp/hadoop-root/mapred/local/archive already exceeds 1G now. It looks like it does not do the work. Could you advise whether what I did is correct?

  <name>local.cache.size</name>
  <value>500000</value>

Thanks,

Xia

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com<ma...@thoughtworks.com>]
Sent: Monday, April 08, 2013 9:09 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Hi,

This directory is used as part of the 'DistributedCache' feature. (http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache). There is a configuration key "local.cache.size" which controls the amount of data stored under DistributedCache. The default limit is 10GB. However, the files under this cannot be deleted if they are being used. Also, some frameworks on Hadoop could be using DistributedCache transparently to you.
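For reference, an illustrative sketch (not something this thread's job does, and the HDFS path below is hypothetical) of how a job would place a file into the DistributedCache explicitly with the Hadoop 1.0.x API:

       import java.net.URI;
       import org.apache.hadoop.filecache.DistributedCache;

       // Ship an HDFS file to every task via the DistributedCache; it gets
       // localized on the tasktrackers under mapred.local.dir (the archive
       // directory discussed in this thread).
       DistributedCache.addCacheFile(new URI("/user/root/lookup.dat"), job.getConfiguration());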

So you could check what is being stored here and based on that lower the limit of the cache size if you feel that will help. The property needs to be set in mapred-default.xml.

Thanks
Hemanth

On Mon, Apr 8, 2013 at 11:09 PM, <Xi...@dell.com>> wrote:
Hi,

I am using hadoop which is packaged within hbase -0.94.1. It is hadoop 1.0.3. There is some mapreduce job running on my server. After some time, I found that my folder /tmp/hadoop-root/mapred/local/archive has 14G size.

How to configure this and limit the size? I do not want  to waste my space for archive.

Thanks,

Xia



--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/





RE: How to configure mapreduce archive size?

Posted by Xi...@Dell.com.
Hi Hemanth and Bejoy KS,

I have tried both mapred-site.xml and core-site.xml. They do not work. I set the value to 50K just for testing purposes; however, the folder size has already grown to 900M. As you said in your email, "After they are done, the property will help cleanup the files due to the limit set." How frequently will the cleanup task be triggered?

Regarding the job.xml, I cannot use the JT web UI to find it. It seems that when hadoop is packaged within HBase, this is disabled. I am only running HBase jobs. The HBase people suggested that I get help from the Hadoop mailing list. I will contact them again.

Thanks,

Jane

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Tuesday, April 16, 2013 9:35 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

You can limit the size by setting local.cache.size in the mapred-site.xml (or core-site.xml if that works for you). I mistakenly mentioned mapred-default.xml in my last mail - apologies for that. However, please note that this does not prevent whatever is writing into the distributed cache from creating those files when they are required. After they are done, the property will help cleanup the files due to the limit set.

That's why I am more keen on finding what is using the files in the Distributed cache. It may be useful if you can ask on the HBase list as well if the APIs you are using are creating the files you mention (assuming you are only running HBase jobs on the cluster and nothing else)

Thanks
Hemanth

On Tue, Apr 16, 2013 at 11:15 PM, <Xi...@dell.com>> wrote:
Hi Hemanth,

I did not explicitly use DistributedCache in my code, and I did not use any command line arguments like -libjars either.

Where can I find job.xml? I am using Hbase MapReduce API and not setting any job.xml.

The key point is I want to limit the size of /tmp/hadoop-root/mapred/local/archive. Could you help?

Thanks.

Xia

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com<ma...@thoughtworks.com>]
Sent: Thursday, April 11, 2013 9:09 PM

To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

TableMapReduceUtil has APIs like addDependencyJars which will use DistributedCache. I don't think you are explicitly using that. Are you using any command line arguments like -libjars etc when you are launching the MapReduce job ? Alternatively you can check job.xml of the launched MR job to see if it has set properties having prefixes like mapred.cache. If nothing's set there, it would seem like some other process or user is adding jars to DistributedCache when using the cluster.

Thanks
hemanth



On Thu, Apr 11, 2013 at 11:40 PM, <Xi...@dell.com>> wrote:
Hi Hemanth,

Attached is some sample folders within my /tmp/hadoop-root/mapred/local/archive. There are some jar and class files inside.

My application uses MapReduce job to do purge Hbase old data. I am using basic HBase MapReduce API to delete rows from Hbase table. I do not specify to use Distributed cache. Maybe HBase use it?

Some code here:

       Scan scan = new Scan();
       scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
       scan.setCacheBlocks(false);  // don't set to true for MR jobs
       scan.setTimeRange(Long.MIN_VALUE, timestamp);
       // set other scan attrs
       // the purge start time
       Date date=new Date();
       TableMapReduceUtil.initTableMapperJob(
             tableName,        // input table
             scan,               // Scan instance to control CF and attribute selection
             MapperDelete.class,     // mapper class
             null,         // mapper output key
             null,  // mapper output value
             job);

       job.setOutputFormatClass(TableOutputFormat.class);
       job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
       job.setNumReduceTasks(0);

       boolean b = job.waitForCompletion(true);

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com<ma...@thoughtworks.com>]
Sent: Thursday, April 11, 2013 12:29 AM

To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Could you paste the contents of the directory ? Not sure whether that will help, but just giving it a shot.

What application are you using ? Is it custom MapReduce jobs in which you use Distributed cache (I guess not) ?

Thanks
Hemanth

On Thu, Apr 11, 2013 at 3:34 AM, <Xi...@dell.com>> wrote:
Hi Arun,

I stopped my application, then restarted my hbase (which include hadoop). After that I start my application. After one evening, my /tmp/hadoop-root/mapred/local/archive goes to more than 1G. It does not work.

Is this the right place to change the value?

"local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar

Thanks,

Jane

From: Arun C Murthy [mailto:acm@hortonworks.com<ma...@hortonworks.com>]
Sent: Wednesday, April 10, 2013 2:45 PM

To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Ensure no jobs are running (cache limit is only for non-active cache files), check after a little while (takes sometime for the cleaner thread to kick in).

Arun

On Apr 11, 2013, at 2:29 AM, <Xi...@Dell.com>> <Xi...@Dell.com>> wrote:

Hi Hemanth,

For the hadoop 1.0.3, I can only find "local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in mapred-default.xml.

I updated the value in file default.xml and changed the value to 500000. This is just for my testing purpose. However, the folder /tmp/hadoop-root/mapred/local/archive already goes more than 1G now. Looks like it does not do the work. Could you advise if what I did is correct?

  <name>local.cache.size</name>
  <value>500000</value>

Thanks,

Xia

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com<ma...@thoughtworks.com>]
Sent: Monday, April 08, 2013 9:09 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Hi,

This directory is used as part of the 'DistributedCache' feature. (http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache). There is a configuration key "local.cache.size" which controls the amount of data stored under DistributedCache. The default limit is 10GB. However, the files under this cannot be deleted if they are being used. Also, some frameworks on Hadoop could be using DistributedCache transparently to you.

So you could check what is being stored here and based on that lower the limit of the cache size if you feel that will help. The property needs to be set in mapred-default.xml.

Thanks
Hemanth

On Mon, Apr 8, 2013 at 11:09 PM, <Xi...@dell.com>> wrote:
Hi,

I am using hadoop which is packaged within hbase -0.94.1. It is hadoop 1.0.3. There is some mapreduce job running on my server. After some time, I found that my folder /tmp/hadoop-root/mapred/local/archive has 14G size.

How to configure this and limit the size? I do not want  to waste my space for archive.

Thanks,

Xia



--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/





Re: How to configure mapreduce archive size?

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
You can limit the size by setting local.cache.size in the mapred-site.xml
(or core-site.xml if that works for you). I mistakenly mentioned
mapred-default.xml in my last mail - apologies for that. However, please
note that this does not prevent whatever is writing into the distributed
cache from creating those files when they are required. After they are
done, the property will help cleanup the files due to the limit set.
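
For illustration, a minimal mapred-site.xml entry might look like the following (sketch only; the value is in bytes, consistent with the 10GB default mentioned earlier, and 2147483648 is just an example ~2GB limit):

  <property>
    <name>local.cache.size</name>
    <value>2147483648</value>
  </property>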

That's why I am more keen on finding what is using the files in the
Distributed cache. It may be useful if you can ask on the HBase list as
well if the APIs you are using are creating the files you mention (assuming
you are only running HBase jobs on the cluster and nothing else)

Thanks
Hemanth


On Tue, Apr 16, 2013 at 11:15 PM, <Xi...@dell.com> wrote:

> Hi Hemanth,
>
> I did not explicitly using DistributedCache in my code. I did not use any command line arguments like -libjars neither.
>
> Where can I find job.xml? I am using Hbase MapReduce API and not setting any job.xml.
>
> The key point is I want to limit the size of /tmp/hadoop-root/mapred/local/archive. Could you help?
>
> Thanks.
>
> Xia
>
> From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
> Sent: Thursday, April 11, 2013 9:09 PM
> To: user@hadoop.apache.org
> Subject: Re: How to configure mapreduce archive size?
>
> TableMapReduceUtil has APIs like addDependencyJars which will use DistributedCache. I don't think you are explicitly using that. Are you using any command line arguments like -libjars etc when you are launching the MapReduce job ? Alternatively you can check job.xml of the launched MR job to see if it has set properties having prefixes like mapred.cache. If nothing's set there, it would seem like some other process or user is adding jars to DistributedCache when using the cluster.
>
> Thanks
> hemanth
>
> On Thu, Apr 11, 2013 at 11:40 PM, <Xi...@dell.com> wrote:
>
> Hi Hemanth,
>
> Attached is some sample folders within my /tmp/hadoop-root/mapred/local/archive. There are some jar and class files inside.
>
> My application uses MapReduce job to do purge Hbase old data. I am using basic HBase MapReduce API to delete rows from Hbase table. I do not specify to use Distributed cache. Maybe HBase use it?
>
> Some code here:
>
>        Scan scan = new Scan();
>        scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
>        scan.setCacheBlocks(false);  // don't set to true for MR jobs
>        scan.setTimeRange(Long.MIN_VALUE, timestamp);
>        // set other scan attrs
>        // the purge start time
>        Date date=new Date();
>        TableMapReduceUtil.initTableMapperJob(
>              tableName,        // input table
>              scan,               // Scan instance to control CF and attribute selection
>              MapperDelete.class,     // mapper class
>              null,         // mapper output key
>              null,  // mapper output value
>              job);
>
>        job.setOutputFormatClass(TableOutputFormat.class);
>        job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
>        job.setNumReduceTasks(0);
>
>        boolean b = job.waitForCompletion(true);
>
> From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
> Sent: Thursday, April 11, 2013 12:29 AM
> To: user@hadoop.apache.org
> Subject: Re: How to configure mapreduce archive size?
>
> Could you paste the contents of the directory ? Not sure whether that will help, but just giving it a shot.
>
> What application are you using ? Is it custom MapReduce jobs in which you use Distributed cache (I guess not) ?
>
> Thanks
> Hemanth
>
> On Thu, Apr 11, 2013 at 3:34 AM, <Xi...@dell.com> wrote:
>
> Hi Arun,
>
> I stopped my application, then restarted my hbase (which include hadoop). After that I start my application. After one evening, my /tmp/hadoop-root/mapred/local/archive goes to more than 1G. It does not work.
>
> Is this the right place to change the value?
>
> "local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar
>
> Thanks,
>
> Jane
>
> From: Arun C Murthy [mailto:acm@hortonworks.com]
> Sent: Wednesday, April 10, 2013 2:45 PM
> To: user@hadoop.apache.org
> Subject: Re: How to configure mapreduce archive size?
>
> Ensure no jobs are running (cache limit is only for non-active cache files), check after a little while (takes sometime for the cleaner thread to kick in).
>
> Arun
>
> On Apr 11, 2013, at 2:29 AM, <Xi...@Dell.com> <Xi...@Dell.com> wrote:
>
> Hi Hemanth,
>
> For the hadoop 1.0.3, I can only find "local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in mapred-default.xml.
>
> I updated the value in file default.xml and changed the value to 500000. This is just for my testing purpose. However, the folder /tmp/hadoop-root/mapred/local/archive already goes more than 1G now. Looks like it does not do the work. Could you advise if what I did is correct?
>
>   <name>local.cache.size</name>
>   <value>500000</value>
>
> Thanks,
>
> Xia
>
> From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
> Sent: Monday, April 08, 2013 9:09 PM
> To: user@hadoop.apache.org
> Subject: Re: How to configure mapreduce archive size?
>
> Hi,
>
> This directory is used as part of the 'DistributedCache' feature. (http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache). There is a configuration key "local.cache.size" which controls the amount of data stored under DistributedCache. The default limit is 10GB. However, the files under this cannot be deleted if they are being used. Also, some frameworks on Hadoop could be using DistributedCache transparently to you.
>
> So you could check what is being stored here and based on that lower the limit of the cache size if you feel that will help. The property needs to be set in mapred-default.xml.
>
> Thanks
> Hemanth
>
> On Mon, Apr 8, 2013 at 11:09 PM, <Xi...@dell.com> wrote:
>
> Hi,
>
> I am using hadoop which is packaged within hbase -0.94.1. It is hadoop 1.0.3. There is some mapreduce job running on my server. After some time, I found that my folder /tmp/hadoop-root/mapred/local/archive has 14G size.
>
> How to configure this and limit the size? I do not want to waste my space for archive.
>
> Thanks,
>
> Xia
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/

Re: How to configure mapreduce archive size?

Posted by be...@gmail.com.
You can get the job.xml for each job from the JT web UI. Click on the job; on the specific job page you'll get this.

Regards 
Bejoy KS

Sent from remote device, Please excuse typos

-----Original Message-----
From: <Xi...@Dell.com>
Date: Tue, 16 Apr 2013 12:45:26 
To: <us...@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: RE: How to configure mapreduce archive size?

Hi Hemanth,

I did not explicitly using DistributedCache in my code. I did not use any command line arguments like -libjars neither.

Where can I find job.xml? I am using Hbase MapReduce API and not setting any job.xml.

The key point is I want to limit the size of /tmp/hadoop-root/mapred/local/archive. Could you help?

Thanks.

Xia

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Thursday, April 11, 2013 9:09 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

TableMapReduceUtil has APIs like addDependencyJars which will use DistributedCache. I don't think you are explicitly using that. Are you using any command line arguments like -libjars etc when you are launching the MapReduce job ? Alternatively you can check job.xml of the launched MR job to see if it has set properties having prefixes like mapred.cache. If nothing's set there, it would seem like some other process or user is adding jars to DistributedCache when using the cluster.

Thanks
hemanth



On Thu, Apr 11, 2013 at 11:40 PM, <Xi...@dell.com>> wrote:
Hi Hemanth,

Attached is some sample folders within my /tmp/hadoop-root/mapred/local/archive. There are some jar and class files inside.

My application uses MapReduce job to do purge Hbase old data. I am using basic HBase MapReduce API to delete rows from Hbase table. I do not specify to use Distributed cache. Maybe HBase use it?

Some code here:

       Scan scan = new Scan();
       scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
       scan.setCacheBlocks(false);  // don't set to true for MR jobs
       scan.setTimeRange(Long.MIN_VALUE, timestamp);
       // set other scan attrs
       // the purge start time
       Date date=new Date();
       TableMapReduceUtil.initTableMapperJob(
             tableName,        // input table
             scan,               // Scan instance to control CF and attribute selection
             MapperDelete.class,     // mapper class
             null,         // mapper output key
             null,  // mapper output value
             job);

       job.setOutputFormatClass(TableOutputFormat.class);
       job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
       job.setNumReduceTasks(0);

       boolean b = job.waitForCompletion(true);

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com<ma...@thoughtworks.com>]
Sent: Thursday, April 11, 2013 12:29 AM

To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Could you paste the contents of the directory ? Not sure whether that will help, but just giving it a shot.

What application are you using ? Is it custom MapReduce jobs in which you use Distributed cache (I guess not) ?

Thanks
Hemanth

On Thu, Apr 11, 2013 at 3:34 AM, <Xi...@dell.com>> wrote:
Hi Arun,

I stopped my application, then restarted my hbase (which include hadoop). After that I start my application. After one evening, my /tmp/hadoop-root/mapred/local/archive goes to more than 1G. It does not work.

Is this the right place to change the value?

"local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar

Thanks,

Jane

From: Arun C Murthy [mailto:acm@hortonworks.com<ma...@hortonworks.com>]
Sent: Wednesday, April 10, 2013 2:45 PM

To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Ensure no jobs are running (cache limit is only for non-active cache files), check after a little while (takes sometime for the cleaner thread to kick in).

Arun

On Apr 11, 2013, at 2:29 AM, <Xi...@Dell.com>> <Xi...@Dell.com>> wrote:

Hi Hemanth,

For the hadoop 1.0.3, I can only find "local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in mapred-default.xml.

I updated the value in file default.xml and changed the value to 500000. This is just for my testing purpose. However, the folder /tmp/hadoop-root/mapred/local/archive already goes more than 1G now. Looks like it does not do the work. Could you advise if what I did is correct?

  <name>local.cache.size</name>
  <value>500000</value>

Thanks,

Xia

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com<ma...@thoughtworks.com>]
Sent: Monday, April 08, 2013 9:09 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Hi,

This directory is used as part of the 'DistributedCache' feature. (http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache). There is a configuration key "local.cache.size" which controls the amount of data stored under DistributedCache. The default limit is 10GB. However, the files under this cannot be deleted if they are being used. Also, some frameworks on Hadoop could be using DistributedCache transparently to you.

So you could check what is being stored here and based on that lower the limit of the cache size if you feel that will help. The property needs to be set in mapred-default.xml.

Thanks
Hemanth

On Mon, Apr 8, 2013 at 11:09 PM, <Xi...@dell.com>> wrote:
Hi,

I am using hadoop which is packaged within hbase -0.94.1. It is hadoop 1.0.3. There is some mapreduce job running on my server. After some time, I found that my folder /tmp/hadoop-root/mapred/local/archive has 14G size.

How to configure this and limit the size? I do not want  to waste my space for archive.

Thanks,

Xia



--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/





Re: How to configure mapreduce archive size?

Posted by be...@gmail.com.
You can get your Job.xml for each jobs from The JT web UI. Click on the job, on the specific job page you'll get this.

Regards 
Bejoy KS

Sent from remote device, Please excuse typos

-----Original Message-----
From: <Xi...@Dell.com>
Date: Tue, 16 Apr 2013 12:45:26 
To: <us...@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: RE: How to configure mapreduce archive size?

Hi Hemanth,

I did not explicitly using DistributedCache in my code. I did not use any command line arguments like -libjars neither.

Where can I find job.xml? I am using Hbase MapReduce API and not setting any job.xml.

The key point is I want to limit the size of /tmp/hadoop-root/mapred/local/archive. Could you help?

Thanks.

Xia

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Thursday, April 11, 2013 9:09 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

TableMapReduceUtil has APIs like addDependencyJars which will use DistributedCache. I don't think you are explicitly using that. Are you using any command line arguments like -libjars etc when you are launching the MapReduce job ? Alternatively you can check job.xml of the launched MR job to see if it has set properties having prefixes like mapred.cache. If nothing's set there, it would seem like some other process or user is adding jars to DistributedCache when using the cluster.

Thanks
hemanth



On Thu, Apr 11, 2013 at 11:40 PM, <Xi...@dell.com>> wrote:
Hi Hemanth,

Attached is some sample folders within my /tmp/hadoop-root/mapred/local/archive. There are some jar and class files inside.

My application uses MapReduce job to do purge Hbase old data. I am using basic HBase MapReduce API to delete rows from Hbase table. I do not specify to use Distributed cache. Maybe HBase use it?

Some code here:

       Scan scan = new Scan();
       scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
       scan.setCacheBlocks(false);  // don't set to true for MR jobs
       scan.setTimeRange(Long.MIN_VALUE, timestamp);
       // set other scan attrs
       // the purge start time
       Date date=new Date();
       TableMapReduceUtil.initTableMapperJob(
             tableName,        // input table
             scan,               // Scan instance to control CF and attribute selection
             MapperDelete.class,     // mapper class
             null,         // mapper output key
             null,  // mapper output value
             job);

       job.setOutputFormatClass(TableOutputFormat.class);
       job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
       job.setNumReduceTasks(0);

       boolean b = job.waitForCompletion(true);

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com<ma...@thoughtworks.com>]
Sent: Thursday, April 11, 2013 12:29 AM

To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Could you paste the contents of the directory ? Not sure whether that will help, but just giving it a shot.

What application are you using ? Is it custom MapReduce jobs in which you use Distributed cache (I guess not) ?

Thanks
Hemanth

On Thu, Apr 11, 2013 at 3:34 AM, <Xi...@dell.com>> wrote:
Hi Arun,

I stopped my application, then restarted my hbase (which include hadoop). After that I start my application. After one evening, my /tmp/hadoop-root/mapred/local/archive goes to more than 1G. It does not work.

Is this the right place to change the value?

"local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar

Thanks,

Jane

From: Arun C Murthy [mailto:acm@hortonworks.com<ma...@hortonworks.com>]
Sent: Wednesday, April 10, 2013 2:45 PM

To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Ensure no jobs are running (cache limit is only for non-active cache files), check after a little while (takes sometime for the cleaner thread to kick in).

Arun

On Apr 11, 2013, at 2:29 AM, <Xi...@Dell.com>> <Xi...@Dell.com>> wrote:

Hi Hemanth,

For the hadoop 1.0.3, I can only find "local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in mapred-default.xml.

I updated the value in file default.xml and changed the value to 500000. This is just for my testing purpose. However, the folder /tmp/hadoop-root/mapred/local/archive already goes more than 1G now. Looks like it does not do the work. Could you advise if what I did is correct?

  <name>local.cache.size</name>
  <value>500000</value>

Thanks,

Xia

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com<ma...@thoughtworks.com>]
Sent: Monday, April 08, 2013 9:09 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Hi,

This directory is used as part of the 'DistributedCache' feature. (http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache). There is a configuration key "local.cache.size" which controls the amount of data stored under DistributedCache. The default limit is 10GB. However, the files under this cannot be deleted if they are being used. Also, some frameworks on Hadoop could be using DistributedCache transparently to you.

So you could check what is being stored here and based on that lower the limit of the cache size if you feel that will help. The property needs to be set in mapred-default.xml.

Thanks
Hemanth

On Mon, Apr 8, 2013 at 11:09 PM, <Xi...@dell.com>> wrote:
Hi,

I am using hadoop which is packaged within hbase -0.94.1. It is hadoop 1.0.3. There is some mapreduce job running on my server. After some time, I found that my folder /tmp/hadoop-root/mapred/local/archive has 14G size.

How to configure this and limit the size? I do not want  to waste my space for archive.

Thanks,

Xia



--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/





Re: How to configure mapreduce archive size?

Posted by be...@gmail.com.
You can get your Job.xml for each jobs from The JT web UI. Click on the job, on the specific job page you'll get this.

Regards 
Bejoy KS

Sent from remote device, Please excuse typos

-----Original Message-----
From: <Xi...@Dell.com>
Date: Tue, 16 Apr 2013 12:45:26 
To: <us...@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: RE: How to configure mapreduce archive size?

Hi Hemanth,

I did not explicitly using DistributedCache in my code. I did not use any command line arguments like -libjars neither.

Where can I find job.xml? I am using Hbase MapReduce API and not setting any job.xml.

The key point is I want to limit the size of /tmp/hadoop-root/mapred/local/archive. Could you help?

Thanks.

Xia

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Thursday, April 11, 2013 9:09 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

TableMapReduceUtil has APIs like addDependencyJars which will use DistributedCache. I don't think you are explicitly using that. Are you using any command line arguments like -libjars etc when you are launching the MapReduce job ? Alternatively you can check job.xml of the launched MR job to see if it has set properties having prefixes like mapred.cache. If nothing's set there, it would seem like some other process or user is adding jars to DistributedCache when using the cluster.

Thanks
hemanth



On Thu, Apr 11, 2013 at 11:40 PM, <Xi...@dell.com>> wrote:
Hi Hemanth,

Attached is some sample folders within my /tmp/hadoop-root/mapred/local/archive. There are some jar and class files inside.

My application uses MapReduce job to do purge Hbase old data. I am using basic HBase MapReduce API to delete rows from Hbase table. I do not specify to use Distributed cache. Maybe HBase use it?

Some code here:

       Scan scan = new Scan();
       scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
       scan.setCacheBlocks(false);  // don't set to true for MR jobs
       scan.setTimeRange(Long.MIN_VALUE, timestamp);
       // set other scan attrs
       // the purge start time
       Date date=new Date();
       TableMapReduceUtil.initTableMapperJob(
             tableName,        // input table
             scan,               // Scan instance to control CF and attribute selection
             MapperDelete.class,     // mapper class
             null,         // mapper output key
             null,  // mapper output value
             job);

       job.setOutputFormatClass(TableOutputFormat.class);
       job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
       job.setNumReduceTasks(0);

       boolean b = job.waitForCompletion(true);

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com<ma...@thoughtworks.com>]
Sent: Thursday, April 11, 2013 12:29 AM

To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Could you paste the contents of the directory ? Not sure whether that will help, but just giving it a shot.

What application are you using ? Is it custom MapReduce jobs in which you use Distributed cache (I guess not) ?

Thanks
Hemanth

On Thu, Apr 11, 2013 at 3:34 AM, <Xi...@dell.com>> wrote:
Hi Arun,

I stopped my application, then restarted my hbase (which include hadoop). After that I start my application. After one evening, my /tmp/hadoop-root/mapred/local/archive goes to more than 1G. It does not work.

Is this the right place to change the value?

"local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar

Thanks,

Jane

From: Arun C Murthy [mailto:acm@hortonworks.com<ma...@hortonworks.com>]
Sent: Wednesday, April 10, 2013 2:45 PM

To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Ensure no jobs are running (cache limit is only for non-active cache files), check after a little while (takes sometime for the cleaner thread to kick in).

Arun

On Apr 11, 2013, at 2:29 AM, <Xi...@Dell.com>> <Xi...@Dell.com>> wrote:

Hi Hemanth,

For the hadoop 1.0.3, I can only find "local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in mapred-default.xml.

I updated the value in file default.xml and changed the value to 500000. This is just for my testing purpose. However, the folder /tmp/hadoop-root/mapred/local/archive already goes more than 1G now. Looks like it does not do the work. Could you advise if what I did is correct?

  <name>local.cache.size</name>
  <value>500000</value>

Thanks,

Xia

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com<ma...@thoughtworks.com>]
Sent: Monday, April 08, 2013 9:09 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Hi,

This directory is used as part of the 'DistributedCache' feature. (http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache). There is a configuration key "local.cache.size" which controls the amount of data stored under DistributedCache. The default limit is 10GB. However, the files under this cannot be deleted if they are being used. Also, some frameworks on Hadoop could be using DistributedCache transparently to you.

So you could check what is being stored here and based on that lower the limit of the cache size if you feel that will help. The property needs to be set in mapred-default.xml.

Thanks
Hemanth

On Mon, Apr 8, 2013 at 11:09 PM, <Xi...@dell.com>> wrote:
Hi,

I am using hadoop which is packaged within hbase -0.94.1. It is hadoop 1.0.3. There is some mapreduce job running on my server. After some time, I found that my folder /tmp/hadoop-root/mapred/local/archive has 14G size.

How to configure this and limit the size? I do not want  to waste my space for archive.

Thanks,

Xia



--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/





Re: How to configure mapreduce archive size?

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
You can limit the size by setting local.cache.size in the mapred-site.xml
(or core-site.xml if that works for you). I mistakenly mentioned
mapred-default.xml in my last mail - apologies for that. However, please
note that this does not prevent whatever is writing into the distributed
cache from creating those files when they are required. After they are
done, the property will help cleanup the files due to the limit set.

That's why I am more keen on finding what is using the files in the
Distributed cache. It may be useful if you can ask on the HBase list as
well if the APIs you are using are creating the files you mention (assuming
you are only running HBase jobs on the cluster and nothing else)

Thanks
Hemanth


On Tue, Apr 16, 2013 at 11:15 PM, <Xi...@dell.com> wrote:

> Hi Hemanth,****
>
> ** **
>
> I did not explicitly using DistributedCache in my code. I did not use any
> command line arguments like –libjars neither.****
>
> ** **
>
> Where can I find job.xml? I am using Hbase MapReduce API and not setting
> any job.xml.****
>
> ** **
>
> The key point is I want to limit the size of /tmp/hadoop-root/mapred/local/archive.
> Could you help?****
>
> ** **
>
> Thanks.****
>
> ** **
>
> Xia****
>
> ** **
>
> *From:* Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
> *Sent:* Thursday, April 11, 2013 9:09 PM
>
> *To:* user@hadoop.apache.org
> *Subject:* Re: How to configure mapreduce archive size?****
>
> ** **
>
> TableMapReduceUtil has APIs like addDependencyJars which will use
> DistributedCache. I don't think you are explicitly using that. Are you
> using any command line arguments like -libjars etc when you are launching
> the MapReduce job ? Alternatively you can check job.xml of the launched MR
> job to see if it has set properties having prefixes like mapred.cache. If
> nothing's set there, it would seem like some other process or user is
> adding jars to DistributedCache when using the cluster.****
>
> ** **
>
> Thanks****
>
> hemanth****
>
> ** **
>
> ** **
>
> ** **
>
> On Thu, Apr 11, 2013 at 11:40 PM, <Xi...@dell.com> wrote:****
>
> Hi Hemanth,****
>
>  ****
>
> Attached is some sample folders within my
> /tmp/hadoop-root/mapred/local/archive. There are some jar and class files
> inside.****
>
>  ****
>
> My application uses MapReduce job to do purge Hbase old data. I am using
> basic HBase MapReduce API to delete rows from Hbase table. I do not specify
> to use Distributed cache. Maybe HBase use it?****
>
>  ****
>
> Some code here:****
>
>  ****
>
>        Scan scan = *new* Scan();****
>
>        scan.setCaching(500);        // 1 is the default in Scan, which
> will be bad for MapReduce jobs****
>
>        scan.setCacheBlocks(*false*);  // don't set to true for MR jobs****
>
>        scan.setTimeRange(Long.*MIN_VALUE*, timestamp);****
>
>        // set other scan *attrs*****
>
>        // the purge start time****
>
>        Date date=*new* Date();****
>
>        TableMapReduceUtil.*initTableMapperJob*(****
>
>              tableName,        // input table****
>
>              scan,               // Scan instance to control CF and
> attribute selection****
>
>              MapperDelete.*class*,     // *mapper* class****
>
>              *null*,         // *mapper* output key****
>
>              *null*,  // *mapper* output value****
>
>              job);****
>
>  ****
>
>        job.setOutputFormatClass(TableOutputFormat.*class*);****
>
>        job.getConfiguration().set(TableOutputFormat.*OUTPUT_TABLE*,
> tableName);****
>
>        job.setNumReduceTasks(0);****
>
>        ****
>
>        *boolean* b = job.waitForCompletion(*true*);****
>
>  ****
>
> *From:* Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
> *Sent:* Thursday, April 11, 2013 12:29 AM****
>
>
> *To:* user@hadoop.apache.org
> *Subject:* Re: How to configure mapreduce archive size?****
>
>  ****
>
> Could you paste the contents of the directory ? Not sure whether that will
> help, but just giving it a shot.****
>
>  ****
>
> What application are you using ? Is it custom MapReduce jobs in which you
> use Distributed cache (I guess not) ? ****
>
>  ****
>
> Thanks****
>
> Hemanth****
>
>  ****
>
> On Thu, Apr 11, 2013 at 3:34 AM, <Xi...@dell.com> wrote:****
>
> Hi Arun,****
>
>  ****
>
> I stopped my application, then restarted my hbase (which include hadoop).
> After that I start my application. After one evening, my
> /tmp/hadoop-root/mapred/local/archive goes to more than 1G. It does not
> work.****
>
>  ****
>
> Is this the right place to change the value?****
>
>  ****
>
> "local.cache.size" in file core-default.xml, which is in
> hadoop-core-1.0.3.jar****
>
>  ****
>
> Thanks,****
>
>  ****
>
> Jane****
>
>  ****
>
> *From:* Arun C Murthy [mailto:acm@hortonworks.com]
> *Sent:* Wednesday, April 10, 2013 2:45 PM****
>
>
> *To:* user@hadoop.apache.org
> *Subject:* Re: How to configure mapreduce archive size?****
>
>  ****
>
> Ensure no jobs are running (cache limit is only for non-active cache
> files), check after a little while (takes sometime for the cleaner thread
> to kick in).****
>
>  ****
>
> Arun****
>
>  ****
>
> On Apr 11, 2013, at 2:29 AM, <Xi...@Dell.com> <Xi...@Dell.com>
> wrote:****
>
>  ****
>
> Hi Hemanth,****
>
>  ****
>
> For the hadoop 1.0.3, I can only find "local.cache.size" in file
> core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in
> mapred-default.xml.****
>
>  ****
>
> I updated the value in file default.xml and changed the value to 500000.
> This is just for my testing purpose. However, the folder
> /tmp/hadoop-root/mapred/local/archive already goes more than 1G now. Looks
> like it does not do the work. Could you advise if what I did is correct?**
> **
>
>  ****
>
>   <name>local.cache.size</name>****
>
>   <value>500000</value>****
>
>  ****
>
> Thanks,****
>
>  ****
>
> Xia****
>
>  ****
>
> *From:* Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
> *Sent:* Monday, April 08, 2013 9:09 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: How to configure mapreduce archive size?****
>
>  ****
>
> Hi,****
>
>  ****
>
> This directory is used as part of the 'DistributedCache' feature. (
> http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache).
> There is a configuration key "local.cache.size" which controls the amount
> of data stored under DistributedCache. The default limit is 10GB. However,
> the files under this cannot be deleted if they are being used. Also, some
> frameworks on Hadoop could be using DistributedCache transparently to you.
> ****
>
>  ****
>
> So you could check what is being stored here and based on that lower the
> limit of the cache size if you feel that will help. The property needs to
> be set in mapred-default.xml.****
>
>  ****
>
> Thanks****
>
> Hemanth****
>
>  ****
>
> On Mon, Apr 8, 2013 at 11:09 PM, <Xi...@dell.com> wrote:****
>
> Hi,****
>
>  ****
>
> I am using hadoop which is packaged within hbase -0.94.1. It is hadoop
> 1.0.3. There is some mapreduce job running on my server. After some time, I
> found that my folder /tmp/hadoop-root/mapred/local/archive has 14G size.**
> **
>
>  ****
>
> How to configure this and limit the size? I do not want  to waste my space
> for archive.****
>
>  ****
>
> Thanks,****
>
>  ****
>
> Xia****
>
>  ****
>
>  ****
>
>  ****
>
> --****
>
> Arun C. Murthy****
>
> Hortonworks Inc.
> http://hortonworks.com/****
>
>  ****
>
>  ****
>
> ** **
>

Re: How to configure mapreduce archive size?

Posted by be...@gmail.com.
You can get your Job.xml for each jobs from The JT web UI. Click on the job, on the specific job page you'll get this.

Regards 
Bejoy KS

Sent from remote device, Please excuse typos

-----Original Message-----
From: <Xi...@Dell.com>
Date: Tue, 16 Apr 2013 12:45:26 
To: <us...@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: RE: How to configure mapreduce archive size?

Hi Hemanth,

I did not explicitly using DistributedCache in my code. I did not use any command line arguments like -libjars neither.

Where can I find job.xml? I am using Hbase MapReduce API and not setting any job.xml.

The key point is I want to limit the size of /tmp/hadoop-root/mapred/local/archive. Could you help?

Thanks.

Xia

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Thursday, April 11, 2013 9:09 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

TableMapReduceUtil has APIs like addDependencyJars which will use DistributedCache. I don't think you are explicitly using that. Are you using any command line arguments like -libjars etc when you are launching the MapReduce job ? Alternatively you can check job.xml of the launched MR job to see if it has set properties having prefixes like mapred.cache. If nothing's set there, it would seem like some other process or user is adding jars to DistributedCache when using the cluster.

Thanks
hemanth



On Thu, Apr 11, 2013 at 11:40 PM, <Xi...@dell.com>> wrote:
Hi Hemanth,

Attached is some sample folders within my /tmp/hadoop-root/mapred/local/archive. There are some jar and class files inside.

My application uses MapReduce job to do purge Hbase old data. I am using basic HBase MapReduce API to delete rows from Hbase table. I do not specify to use Distributed cache. Maybe HBase use it?

Some code here:

       Scan scan = new Scan();
       scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
       scan.setCacheBlocks(false);  // don't set to true for MR jobs
       scan.setTimeRange(Long.MIN_VALUE, timestamp);
       // set other scan attrs
       // the purge start time
       Date date=new Date();
       TableMapReduceUtil.initTableMapperJob(
             tableName,        // input table
             scan,               // Scan instance to control CF and attribute selection
             MapperDelete.class,     // mapper class
             null,         // mapper output key
             null,  // mapper output value
             job);

       job.setOutputFormatClass(TableOutputFormat.class);
       job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
       job.setNumReduceTasks(0);

       boolean b = job.waitForCompletion(true);

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com<ma...@thoughtworks.com>]
Sent: Thursday, April 11, 2013 12:29 AM

To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Could you paste the contents of the directory ? Not sure whether that will help, but just giving it a shot.

What application are you using ? Is it custom MapReduce jobs in which you use Distributed cache (I guess not) ?

Thanks
Hemanth

On Thu, Apr 11, 2013 at 3:34 AM, <Xi...@dell.com>> wrote:
Hi Arun,

I stopped my application, then restarted my hbase (which include hadoop). After that I start my application. After one evening, my /tmp/hadoop-root/mapred/local/archive goes to more than 1G. It does not work.

Is this the right place to change the value?

"local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar

Thanks,

Jane

From: Arun C Murthy [mailto:acm@hortonworks.com<ma...@hortonworks.com>]
Sent: Wednesday, April 10, 2013 2:45 PM

To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Ensure no jobs are running (cache limit is only for non-active cache files), check after a little while (takes sometime for the cleaner thread to kick in).

Arun

On Apr 11, 2013, at 2:29 AM, <Xi...@Dell.com>> <Xi...@Dell.com>> wrote:

Hi Hemanth,

For the hadoop 1.0.3, I can only find "local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in mapred-default.xml.

I updated the value in file default.xml and changed the value to 500000. This is just for my testing purpose. However, the folder /tmp/hadoop-root/mapred/local/archive already goes more than 1G now. Looks like it does not do the work. Could you advise if what I did is correct?

  <name>local.cache.size</name>
  <value>500000</value>

Thanks,

Xia

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com<ma...@thoughtworks.com>]
Sent: Monday, April 08, 2013 9:09 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Hi,

This directory is used as part of the 'DistributedCache' feature. (http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache). There is a configuration key "local.cache.size" which controls the amount of data stored under DistributedCache. The default limit is 10GB. However, the files under this cannot be deleted if they are being used. Also, some frameworks on Hadoop could be using DistributedCache transparently to you.

So you could check what is being stored here and based on that lower the limit of the cache size if you feel that will help. The property needs to be set in mapred-default.xml.

Thanks
Hemanth

On Mon, Apr 8, 2013 at 11:09 PM, <Xi...@dell.com>> wrote:
Hi,

I am using hadoop which is packaged within hbase -0.94.1. It is hadoop 1.0.3. There is some mapreduce job running on my server. After some time, I found that my folder /tmp/hadoop-root/mapred/local/archive has 14G size.

How to configure this and limit the size? I do not want  to waste my space for archive.

Thanks,

Xia



--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/





Re: How to configure mapreduce archive size?

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
You can limit the size by setting local.cache.size in the mapred-site.xml
(or core-site.xml if that works for you). I mistakenly mentioned
mapred-default.xml in my last mail - apologies for that. However, please
note that this does not prevent whatever is writing into the distributed
cache from creating those files when they are required. After they are
done, the property will help cleanup the files due to the limit set.

That's why I am more keen on finding what is using the files in the
Distributed cache. It may be useful if you can ask on the HBase list as
well if the APIs you are using are creating the files you mention (assuming
you are only running HBase jobs on the cluster and nothing else)

Thanks
Hemanth


On Tue, Apr 16, 2013 at 11:15 PM, <Xi...@dell.com> wrote:

> Hi Hemanth,
>
> I did not explicitly using DistributedCache in my code. I did not use any command line arguments like -libjars neither.
>
> Where can I find job.xml? I am using Hbase MapReduce API and not setting any job.xml.
>
> The key point is I want to limit the size of /tmp/hadoop-root/mapred/local/archive. Could you help?
>
> Thanks.
>
> Xia
>
> From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
> Sent: Thursday, April 11, 2013 9:09 PM
>
> To: user@hadoop.apache.org
> Subject: Re: How to configure mapreduce archive size?
>
> TableMapReduceUtil has APIs like addDependencyJars which will use DistributedCache. I don't think you are explicitly using that. Are you using any command line arguments like -libjars etc when you are launching the MapReduce job ? Alternatively you can check job.xml of the launched MR job to see if it has set properties having prefixes like mapred.cache. If nothing's set there, it would seem like some other process or user is adding jars to DistributedCache when using the cluster.
>
> Thanks
>
> hemanth
>
> On Thu, Apr 11, 2013 at 11:40 PM, <Xi...@dell.com> wrote:
>
> Hi Hemanth,
>
> Attached is some sample folders within my /tmp/hadoop-root/mapred/local/archive. There are some jar and class files inside.
>
> My application uses MapReduce job to do purge Hbase old data. I am using basic HBase MapReduce API to delete rows from Hbase table. I do not specify to use Distributed cache. Maybe HBase use it?
>
> Some code here:
>
>        Scan scan = new Scan();
>        scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
>        scan.setCacheBlocks(false);  // don't set to true for MR jobs
>        scan.setTimeRange(Long.MIN_VALUE, timestamp);
>        // set other scan attrs
>        // the purge start time
>        Date date=new Date();
>        TableMapReduceUtil.initTableMapperJob(
>              tableName,        // input table
>              scan,               // Scan instance to control CF and attribute selection
>              MapperDelete.class,     // mapper class
>              null,         // mapper output key
>              null,  // mapper output value
>              job);
>
>        job.setOutputFormatClass(TableOutputFormat.class);
>        job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
>        job.setNumReduceTasks(0);
>
>        boolean b = job.waitForCompletion(true);
>
> From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
> Sent: Thursday, April 11, 2013 12:29 AM
>
> To: user@hadoop.apache.org
> Subject: Re: How to configure mapreduce archive size?
>
> Could you paste the contents of the directory ? Not sure whether that will help, but just giving it a shot.
>
> What application are you using ? Is it custom MapReduce jobs in which you use Distributed cache (I guess not) ?
>
> Thanks
>
> Hemanth
>
> On Thu, Apr 11, 2013 at 3:34 AM, <Xi...@dell.com> wrote:
>
> Hi Arun,
>
> I stopped my application, then restarted my hbase (which include hadoop). After that I start my application. After one evening, my /tmp/hadoop-root/mapred/local/archive goes to more than 1G. It does not work.
>
> Is this the right place to change the value?
>
> "local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar
>
> Thanks,
>
> Jane
>
> From: Arun C Murthy [mailto:acm@hortonworks.com]
> Sent: Wednesday, April 10, 2013 2:45 PM
>
> To: user@hadoop.apache.org
> Subject: Re: How to configure mapreduce archive size?
>
> Ensure no jobs are running (cache limit is only for non-active cache files), check after a little while (takes sometime for the cleaner thread to kick in).
>
> Arun
>
> On Apr 11, 2013, at 2:29 AM, <Xi...@Dell.com> <Xi...@Dell.com> wrote:
>
> Hi Hemanth,
>
> For the hadoop 1.0.3, I can only find "local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in mapred-default.xml.
>
> I updated the value in file default.xml and changed the value to 500000. This is just for my testing purpose. However, the folder /tmp/hadoop-root/mapred/local/archive already goes more than 1G now. Looks like it does not do the work. Could you advise if what I did is correct?
>
>   <name>local.cache.size</name>
>   <value>500000</value>
>
> Thanks,
>
> Xia
>
> From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
> Sent: Monday, April 08, 2013 9:09 PM
> To: user@hadoop.apache.org
> Subject: Re: How to configure mapreduce archive size?
>
> Hi,
>
> This directory is used as part of the 'DistributedCache' feature. (http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache). There is a configuration key "local.cache.size" which controls the amount of data stored under DistributedCache. The default limit is 10GB. However, the files under this cannot be deleted if they are being used. Also, some frameworks on Hadoop could be using DistributedCache transparently to you.
>
> So you could check what is being stored here and based on that lower the limit of the cache size if you feel that will help. The property needs to be set in mapred-default.xml.
>
> Thanks
>
> Hemanth
>
> On Mon, Apr 8, 2013 at 11:09 PM, <Xi...@dell.com> wrote:
>
> Hi,
>
> I am using hadoop which is packaged within hbase -0.94.1. It is hadoop 1.0.3. There is some mapreduce job running on my server. After some time, I found that my folder /tmp/hadoop-root/mapred/local/archive has 14G size.
>
> How to configure this and limit the size? I do not want to waste my space for archive.
>
> Thanks,
>
> Xia
>
> --
>
> Arun C. Murthy
>
> Hortonworks Inc.
> http://hortonworks.com/

Re: How to configure mapreduce archive size?

Posted by be...@gmail.com.
Also, you need to change the value for 'local.cache.size' in core-site.xml, not in core-default.xml.

If you need to override any property in the config files, do it in *-site.xml, not in *-default.xml.
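
If you want to double-check which value actually wins once the site files are loaded, a throwaway check along these lines can help (just a sketch against the Hadoop 1.x API; run it with the same conf directory on the classpath that your daemons use):

import org.apache.hadoop.mapred.JobConf;

public class ShowCacheSize {
    public static void main(String[] args) {
        // JobConf pulls in core-default.xml, core-site.xml, mapred-default.xml
        // and mapred-site.xml from the classpath, so site values override defaults.
        JobConf conf = new JobConf();
        // Prints the effective limit in bytes; null means it is not set anywhere.
        System.out.println("local.cache.size = " + conf.get("local.cache.size"));
    }
}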

Regards 
Bejoy KS

Sent from remote device, Please excuse typos

-----Original Message-----
From: bejoy.hadoop@gmail.com
Date: Tue, 16 Apr 2013 18:05:51 
To: <us...@hadoop.apache.org>
Reply-To: bejoy.hadoop@gmail.com
Subject: Re: How to configure mapreduce archive size?

You can get the job.xml for each job from the JT web UI. Click on the job; on the specific job page you'll find it.

Regards 
Bejoy KS

Sent from remote device, Please excuse typos

-----Original Message-----
From: <Xi...@Dell.com>
Date: Tue, 16 Apr 2013 12:45:26 
To: <us...@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: RE: How to configure mapreduce archive size?

Hi Hemanth,

I did not explicitly using DistributedCache in my code. I did not use any command line arguments like -libjars neither.

Where can I find job.xml? I am using Hbase MapReduce API and not setting any job.xml.

The key point is I want to limit the size of /tmp/hadoop-root/mapred/local/archive. Could you help?

Thanks.

Xia

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Thursday, April 11, 2013 9:09 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

TableMapReduceUtil has APIs like addDependencyJars which will use DistributedCache. I don't think you are explicitly using that. Are you using any command line arguments like -libjars etc when you are launching the MapReduce job ? Alternatively you can check job.xml of the launched MR job to see if it has set properties having prefixes like mapred.cache. If nothing's set there, it would seem like some other process or user is adding jars to DistributedCache when using the cluster.

Thanks
hemanth



On Thu, Apr 11, 2013 at 11:40 PM, <Xi...@dell.com>> wrote:
Hi Hemanth,

Attached is some sample folders within my /tmp/hadoop-root/mapred/local/archive. There are some jar and class files inside.

My application uses MapReduce job to do purge Hbase old data. I am using basic HBase MapReduce API to delete rows from Hbase table. I do not specify to use Distributed cache. Maybe HBase use it?

Some code here:

       Scan scan = new Scan();
       scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
       scan.setCacheBlocks(false);  // don't set to true for MR jobs
       scan.setTimeRange(Long.MIN_VALUE, timestamp);
       // set other scan attrs
       // the purge start time
       Date date=new Date();
       TableMapReduceUtil.initTableMapperJob(
             tableName,        // input table
             scan,               // Scan instance to control CF and attribute selection
             MapperDelete.class,     // mapper class
             null,         // mapper output key
             null,  // mapper output value
             job);

       job.setOutputFormatClass(TableOutputFormat.class);
       job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
       job.setNumReduceTasks(0);

       boolean b = job.waitForCompletion(true);

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com<ma...@thoughtworks.com>]
Sent: Thursday, April 11, 2013 12:29 AM

To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Could you paste the contents of the directory ? Not sure whether that will help, but just giving it a shot.

What application are you using ? Is it custom MapReduce jobs in which you use Distributed cache (I guess not) ?

Thanks
Hemanth

On Thu, Apr 11, 2013 at 3:34 AM, <Xi...@dell.com>> wrote:
Hi Arun,

I stopped my application, then restarted my hbase (which include hadoop). After that I start my application. After one evening, my /tmp/hadoop-root/mapred/local/archive goes to more than 1G. It does not work.

Is this the right place to change the value?

"local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar

Thanks,

Jane

From: Arun C Murthy [mailto:acm@hortonworks.com<ma...@hortonworks.com>]
Sent: Wednesday, April 10, 2013 2:45 PM

To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Ensure no jobs are running (cache limit is only for non-active cache files), check after a little while (takes sometime for the cleaner thread to kick in).

Arun

On Apr 11, 2013, at 2:29 AM, <Xi...@Dell.com>> <Xi...@Dell.com>> wrote:

Hi Hemanth,

For the hadoop 1.0.3, I can only find "local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in mapred-default.xml.

I updated the value in file default.xml and changed the value to 500000. This is just for my testing purpose. However, the folder /tmp/hadoop-root/mapred/local/archive already goes more than 1G now. Looks like it does not do the work. Could you advise if what I did is correct?

  <name>local.cache.size</name>
  <value>500000</value>

Thanks,

Xia

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com<ma...@thoughtworks.com>]
Sent: Monday, April 08, 2013 9:09 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Hi,

This directory is used as part of the 'DistributedCache' feature. (http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache). There is a configuration key "local.cache.size" which controls the amount of data stored under DistributedCache. The default limit is 10GB. However, the files under this cannot be deleted if they are being used. Also, some frameworks on Hadoop could be using DistributedCache transparently to you.

So you could check what is being stored here and based on that lower the limit of the cache size if you feel that will help. The property needs to be set in mapred-default.xml.

Thanks
Hemanth

On Mon, Apr 8, 2013 at 11:09 PM, <Xi...@dell.com>> wrote:
Hi,

I am using hadoop which is packaged within hbase -0.94.1. It is hadoop 1.0.3. There is some mapreduce job running on my server. After some time, I found that my folder /tmp/hadoop-root/mapred/local/archive has 14G size.

How to configure this and limit the size? I do not want  to waste my space for archive.

Thanks,

Xia



--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/





RE: How to configure mapreduce archive size?

Posted by Xi...@Dell.com.
Hi Hemanth,

I did not explicitly use DistributedCache in my code. I did not use any command line arguments like -libjars either.

Where can I find job.xml? I am using the HBase MapReduce API and not setting any job.xml.

The key point is I want to limit the size of /tmp/hadoop-root/mapred/local/archive. Could you help?

Thanks.

Xia

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Thursday, April 11, 2013 9:09 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

TableMapReduceUtil has APIs like addDependencyJars which will use DistributedCache. I don't think you are explicitly using that. Are you using any command line arguments like -libjars etc when you are launching the MapReduce job ? Alternatively you can check job.xml of the launched MR job to see if it has set properties having prefixes like mapred.cache. If nothing's set there, it would seem like some other process or user is adding jars to DistributedCache when using the cluster.

Thanks
hemanth



On Thu, Apr 11, 2013 at 11:40 PM, <Xi...@dell.com>> wrote:
Hi Hemanth,

Attached is some sample folders within my /tmp/hadoop-root/mapred/local/archive. There are some jar and class files inside.

My application uses MapReduce job to do purge Hbase old data. I am using basic HBase MapReduce API to delete rows from Hbase table. I do not specify to use Distributed cache. Maybe HBase use it?

Some code here:

       Scan scan = new Scan();
       scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
       scan.setCacheBlocks(false);  // don't set to true for MR jobs
       scan.setTimeRange(Long.MIN_VALUE, timestamp);
       // set other scan attrs
       // the purge start time
       Date date=new Date();
       TableMapReduceUtil.initTableMapperJob(
             tableName,        // input table
             scan,               // Scan instance to control CF and attribute selection
             MapperDelete.class,     // mapper class
             null,         // mapper output key
             null,  // mapper output value
             job);

       job.setOutputFormatClass(TableOutputFormat.class);
       job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
       job.setNumReduceTasks(0);

       boolean b = job.waitForCompletion(true);

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com<ma...@thoughtworks.com>]
Sent: Thursday, April 11, 2013 12:29 AM

To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Could you paste the contents of the directory ? Not sure whether that will help, but just giving it a shot.

What application are you using ? Is it custom MapReduce jobs in which you use Distributed cache (I guess not) ?

Thanks
Hemanth

On Thu, Apr 11, 2013 at 3:34 AM, <Xi...@dell.com>> wrote:
Hi Arun,

I stopped my application, then restarted my hbase (which include hadoop). After that I start my application. After one evening, my /tmp/hadoop-root/mapred/local/archive goes to more than 1G. It does not work.

Is this the right place to change the value?

"local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar

Thanks,

Jane

From: Arun C Murthy [mailto:acm@hortonworks.com<ma...@hortonworks.com>]
Sent: Wednesday, April 10, 2013 2:45 PM

To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Ensure no jobs are running (cache limit is only for non-active cache files), check after a little while (takes sometime for the cleaner thread to kick in).

Arun

On Apr 11, 2013, at 2:29 AM, <Xi...@Dell.com>> <Xi...@Dell.com>> wrote:

Hi Hemanth,

For the hadoop 1.0.3, I can only find "local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in mapred-default.xml.

I updated the value in file default.xml and changed the value to 500000. This is just for my testing purpose. However, the folder /tmp/hadoop-root/mapred/local/archive already goes more than 1G now. Looks like it does not do the work. Could you advise if what I did is correct?

  <name>local.cache.size</name>
  <value>500000</value>

Thanks,

Xia

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com<ma...@thoughtworks.com>]
Sent: Monday, April 08, 2013 9:09 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Hi,

This directory is used as part of the 'DistributedCache' feature. (http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache). There is a configuration key "local.cache.size" which controls the amount of data stored under DistributedCache. The default limit is 10GB. However, the files under this cannot be deleted if they are being used. Also, some frameworks on Hadoop could be using DistributedCache transparently to you.

So you could check what is being stored here and based on that lower the limit of the cache size if you feel that will help. The property needs to be set in mapred-default.xml.

Thanks
Hemanth

On Mon, Apr 8, 2013 at 11:09 PM, <Xi...@dell.com>> wrote:
Hi,

I am using hadoop which is packaged within hbase -0.94.1. It is hadoop 1.0.3. There is some mapreduce job running on my server. After some time, I found that my folder /tmp/hadoop-root/mapred/local/archive has 14G size.

How to configure this and limit the size? I do not want  to waste my space for archive.

Thanks,

Xia



--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/




Re: How to configure mapreduce archive size?

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
TableMapReduceUtil has APIs like addDependencyJars which will use
DistributedCache. I don't think you are explicitly using that. Are you
using any command line arguments like -libjars etc when you are launching
the MapReduce job ? Alternatively you can check job.xml of the launched MR
job to see if it has set properties having prefixes like mapred.cache. If
nothing's set there, it would seem like some other process or user is
adding jars to DistributedCache when using the cluster.
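
For example, a quick way to see this from the submitting code itself (just a sketch, assuming the new-API Job object used earlier in this thread; call it right before job.waitForCompletion) is to dump anything cache-related from the job configuration:

import java.util.Map;
import org.apache.hadoop.mapreduce.Job;

public class CachePropsDump {
    // Prints DistributedCache-related settings; "tmpjars" is the key that
    // -libjars and HBase's addDependencyJars use, if I remember correctly.
    public static void dump(Job job) {
        for (Map.Entry<String, String> entry : job.getConfiguration()) {
            String key = entry.getKey();
            if (key.startsWith("mapred.cache") || key.equals("tmpjars")) {
                System.out.println(key + " = " + entry.getValue());
            }
        }
    }
}

If that prints nothing, the jars are probably being localized by something outside your own job submission.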

Thanks
hemanth





RE: How to configure mapreduce archive size?

Posted by Xi...@Dell.com.
Resent, in case you did not get the attachment.

From: Yang, Xia
Sent: Thursday, April 11, 2013 11:11 AM
To: user@hadoop.apache.org
Subject: RE: How to configure mapreduce archive size?

Hi Hemanth,

Attached are some sample folders from within my /tmp/hadoop-root/mapred/local/archive. There are some jar and class files inside.

My application uses a MapReduce job to purge old HBase data. I am using the basic HBase MapReduce API to delete rows from an HBase table. I do not specify use of the Distributed cache. Maybe HBase uses it?

Some code here:

       Scan scan = new Scan();
       scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
       scan.setCacheBlocks(false);  // don't set to true for MR jobs
       scan.setTimeRange(Long.MIN_VALUE, timestamp);
       // set other scan attrs
       // the purge start time
       Date date=new Date();
       TableMapReduceUtil.initTableMapperJob(
             tableName,        // input table
             scan,               // Scan instance to control CF and attribute selection
             MapperDelete.class,     // mapper class
             null,         // mapper output key
             null,  // mapper output value
             job);

       job.setOutputFormatClass(TableOutputFormat.class);
       job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
       job.setNumReduceTasks(0);

       boolean b = job.waitForCompletion(true);
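
If initTableMapperJob is what populates the cache (through its addDependencyJars behaviour), one variant worth trying is to pass false for that flag, assuming the HBase 0.94 build exposes the overload that takes an addDependencyJars argument and that the HBase and job classes are already on the tasktrackers' classpath. A sketch, reusing the same variables as the snippet above:

       // Sketch only: same setup as above, but asking TableMapReduceUtil not to
       // ship dependency jars through the DistributedCache (the jars must then
       // already be available on the task classpath).
       TableMapReduceUtil.initTableMapperJob(
             tableName,           // input table
             scan,                // Scan instance
             MapperDelete.class,  // mapper class
             null,                // mapper output key
             null,                // mapper output value
             job,
             false);              // addDependencyJars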

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Thursday, April 11, 2013 12:29 AM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Could you paste the contents of the directory ? Not sure whether that will help, but just giving it a shot.

What application are you using ? Is it custom MapReduce jobs in which you use Distributed cache (I guess not) ?

Thanks
Hemanth

On Thu, Apr 11, 2013 at 3:34 AM, <Xi...@dell.com>> wrote:
Hi Arun,

I stopped my application, then restarted my hbase (which include hadoop). After that I start my application. After one evening, my /tmp/hadoop-root/mapred/local/archive goes to more than 1G. It does not work.

Is this the right place to change the value?

"local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar

Thanks,

Jane

From: Arun C Murthy [mailto:acm@hortonworks.com<ma...@hortonworks.com>]
Sent: Wednesday, April 10, 2013 2:45 PM

To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Ensure no jobs are running (cache limit is only for non-active cache files), check after a little while (takes sometime for the cleaner thread to kick in).

Arun

On Apr 11, 2013, at 2:29 AM, <Xi...@Dell.com>> <Xi...@Dell.com>> wrote:

Hi Hemanth,

For the hadoop 1.0.3, I can only find "local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in mapred-default.xml.

I updated the value in file default.xml and changed the value to 500000. This is just for my testing purpose. However, the folder /tmp/hadoop-root/mapred/local/archive already goes more than 1G now. Looks like it does not do the work. Could you advise if what I did is correct?

  <name>local.cache.size</name>
  <value>500000</value>

Thanks,

Xia

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com<ma...@thoughtworks.com>]
Sent: Monday, April 08, 2013 9:09 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Hi,

This directory is used as part of the 'DistributedCache' feature. (http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache). There is a configuration key "local.cache.size" which controls the amount of data stored under DistributedCache. The default limit is 10GB. However, the files under this cannot be deleted if they are being used. Also, some frameworks on Hadoop could be using DistributedCache transparently to you.

So you could check what is being stored here and based on that lower the limit of the cache size if you feel that will help. The property needs to be set in mapred-default.xml.

Thanks
Hemanth

On Mon, Apr 8, 2013 at 11:09 PM, <Xi...@dell.com>> wrote:
Hi,

I am using hadoop which is packaged within hbase -0.94.1. It is hadoop 1.0.3. There is some mapreduce job running on my server. After some time, I found that my folder /tmp/hadoop-root/mapred/local/archive has 14G size.

How to configure this and limit the size? I do not want  to waste my space for archive.

Thanks,

Xia



--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



RE: How to configure mapreduce archive size?

Posted by Xi...@Dell.com.
Hi Hemanth,

Attached are some sample folders from within my /tmp/hadoop-root/mapred/local/archive. There are some jar and class files inside.

My application uses a MapReduce job to purge old HBase data. I am using the basic HBase MapReduce API to delete rows from an HBase table. I do not specify use of the Distributed cache. Maybe HBase uses it?

Some code here:

       Scan scan = new Scan();
       scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
       scan.setCacheBlocks(false);  // don't set to true for MR jobs
       scan.setTimeRange(Long.MIN_VALUE, timestamp);
       // set other scan attrs
       // the purge start time
       Date date=new Date();
       TableMapReduceUtil.initTableMapperJob(
             tableName,        // input table
             scan,               // Scan instance to control CF and attribute selection
             MapperDelete.class,     // mapper class
             null,         // mapper output key
             null,  // mapper output value
             job);

       job.setOutputFormatClass(TableOutputFormat.class);
       job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
       job.setNumReduceTasks(0);

       boolean b = job.waitForCompletion(true);

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Thursday, April 11, 2013 12:29 AM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Could you paste the contents of the directory ? Not sure whether that will help, but just giving it a shot.

What application are you using ? Is it custom MapReduce jobs in which you use Distributed cache (I guess not) ?

Thanks
Hemanth

On Thu, Apr 11, 2013 at 3:34 AM, <Xi...@dell.com>> wrote:
Hi Arun,

I stopped my application, then restarted my hbase (which include hadoop). After that I start my application. After one evening, my /tmp/hadoop-root/mapred/local/archive goes to more than 1G. It does not work.

Is this the right place to change the value?

"local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar

Thanks,

Jane

From: Arun C Murthy [mailto:acm@hortonworks.com<ma...@hortonworks.com>]
Sent: Wednesday, April 10, 2013 2:45 PM

To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Ensure no jobs are running (cache limit is only for non-active cache files), check after a little while (takes sometime for the cleaner thread to kick in).

Arun

On Apr 11, 2013, at 2:29 AM, <Xi...@Dell.com>> <Xi...@Dell.com>> wrote:

Hi Hemanth,

For the hadoop 1.0.3, I can only find "local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in mapred-default.xml.

I updated the value in file default.xml and changed the value to 500000. This is just for my testing purpose. However, the folder /tmp/hadoop-root/mapred/local/archive already goes more than 1G now. Looks like it does not do the work. Could you advise if what I did is correct?

  <name>local.cache.size</name>
  <value>500000</value>

Thanks,

Xia

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com<ma...@thoughtworks.com>]
Sent: Monday, April 08, 2013 9:09 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Hi,

This directory is used as part of the 'DistributedCache' feature. (http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache). There is a configuration key "local.cache.size" which controls the amount of data stored under DistributedCache. The default limit is 10GB. However, the files under this cannot be deleted if they are being used. Also, some frameworks on Hadoop could be using DistributedCache transparently to you.

So you could check what is being stored here and based on that lower the limit of the cache size if you feel that will help. The property needs to be set in mapred-default.xml.

Thanks
Hemanth

On Mon, Apr 8, 2013 at 11:09 PM, <Xi...@dell.com>> wrote:
Hi,

I am using hadoop which is packaged within hbase -0.94.1. It is hadoop 1.0.3. There is some mapreduce job running on my server. After some time, I found that my folder /tmp/hadoop-root/mapred/local/archive has 14G size.

How to configure this and limit the size? I do not want  to waste my space for archive.

Thanks,

Xia



--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



Re: How to configure mapreduce archive size?

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
Could you paste the contents of the directory? Not sure whether that will
help, but just giving it a shot.

What application are you using? Is it custom MapReduce jobs in which you
use Distributed cache (I guess not)?

Thanks
Hemanth


On Thu, Apr 11, 2013 at 3:34 AM, <Xi...@dell.com> wrote:

> Hi Arun,
>
> I stopped my application, then restarted my hbase (which include hadoop).
> After that I start my application. After one evening, my /tmp/hadoop-root/mapred/local/archive
> goes to more than 1G. It does not work.
>
> Is this the right place to change the value?
>
> "local.cache.size" in file core-default.xml, which is in
> hadoop-core-1.0.3.jar
>
> Thanks,
>
> Jane
>
> From: Arun C Murthy [mailto:acm@hortonworks.com]
> Sent: Wednesday, April 10, 2013 2:45 PM
> To: user@hadoop.apache.org
> Subject: Re: How to configure mapreduce archive size?
>
> Ensure no jobs are running (cache limit is only for non-active cache
> files), check after a little while (takes sometime for the cleaner thread
> to kick in).
>
> Arun
>
> On Apr 11, 2013, at 2:29 AM, <Xi...@Dell.com> <Xi...@Dell.com>
> wrote:
>
> Hi Hemanth,
>
> For the hadoop 1.0.3, I can only find "local.cache.size" in file
> core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in
> mapred-default.xml.
>
> I updated the value in file default.xml and changed the value to 500000.
> This is just for my testing purpose. However, the folder
> /tmp/hadoop-root/mapred/local/archive already goes more than 1G now. Looks
> like it does not do the work. Could you advise if what I did is correct?
>
>   <name>local.cache.size</name>
>   <value>500000</value>
>
> Thanks,
>
> Xia
>
> From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
> Sent: Monday, April 08, 2013 9:09 PM
> To: user@hadoop.apache.org
> Subject: Re: How to configure mapreduce archive size?
>
> Hi,
>
> This directory is used as part of the 'DistributedCache' feature. (
> http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache).
> There is a configuration key "local.cache.size" which controls the amount
> of data stored under DistributedCache. The default limit is 10GB. However,
> the files under this cannot be deleted if they are being used. Also, some
> frameworks on Hadoop could be using DistributedCache transparently to you.
>
> So you could check what is being stored here and based on that lower the
> limit of the cache size if you feel that will help. The property needs to
> be set in mapred-default.xml.
>
> Thanks
> Hemanth
>
> On Mon, Apr 8, 2013 at 11:09 PM, <Xi...@dell.com> wrote:
>
> Hi,
>
> I am using hadoop which is packaged within hbase -0.94.1. It is hadoop
> 1.0.3. There is some mapreduce job running on my server. After some time, I
> found that my folder /tmp/hadoop-root/mapred/local/archive has 14G size.
>
> How to configure this and limit the size? I do not want  to waste my space
> for archive.
>
> Thanks,
>
> Xia
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>

RE: How to configure mapreduce archive size?

Posted by Xi...@Dell.com.
Hi Arun,

I stopped my application, then restarted my hbase (which includes hadoop). After that I started my application again. After one evening, my /tmp/hadoop-root/mapred/local/archive had grown to more than 1G. It does not work.

Is this the right place to change the value?

"local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar

Thanks,

Jane

From: Arun C Murthy [mailto:acm@hortonworks.com]
Sent: Wednesday, April 10, 2013 2:45 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Ensure no jobs are running (cache limit is only for non-active cache files), check after a little while (takes sometime for the cleaner thread to kick in).

Arun

On Apr 11, 2013, at 2:29 AM, <Xi...@Dell.com>> <Xi...@Dell.com>> wrote:


Hi Hemanth,

For the hadoop 1.0.3, I can only find "local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in mapred-default.xml.

I updated the value in file default.xml and changed the value to 500000. This is just for my testing purpose. However, the folder /tmp/hadoop-root/mapred/local/archive already goes more than 1G now. Looks like it does not do the work. Could you advise if what I did is correct?

  <name>local.cache.size</name>
  <value>500000</value>

Thanks,

Xia

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Monday, April 08, 2013 9:09 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: How to configure mapreduce archive size?

Hi,

This directory is used as part of the 'DistributedCache' feature. (http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache). There is a configuration key "local.cache.size" which controls the amount of data stored under DistributedCache. The default limit is 10GB. However, the files under this cannot be deleted if they are being used. Also, some frameworks on Hadoop could be using DistributedCache transparently to you.

So you could check what is being stored here and based on that lower the limit of the cache size if you feel that will help. The property needs to be set in mapred-default.xml.

Thanks
Hemanth

On Mon, Apr 8, 2013 at 11:09 PM, <Xi...@dell.com>> wrote:
Hi,

I am using hadoop which is packaged within hbase -0.94.1. It is hadoop 1.0.3. There is some mapreduce job running on my server. After some time, I found that my folder /tmp/hadoop-root/mapred/local/archive has 14G size.

How to configure this and limit the size? I do not want  to waste my space for archive.

Thanks,

Xia



--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/


Re: How to configure mapreduce archive size?

Posted by Arun C Murthy <ac...@hortonworks.com>.
Ensure no jobs are running (the cache limit applies only to non-active cache files), and check again after a little while (it takes some time for the cleaner thread to kick in).

Arun

On Apr 11, 2013, at 2:29 AM, <Xi...@Dell.com> <Xi...@Dell.com> wrote:

> Hi Hemanth,
>  
> For the hadoop 1.0.3, I can only find "local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in mapred-default.xml.
>  
> I updated the value in file default.xml and changed the value to 500000. This is just for my testing purpose. However, the folder /tmp/hadoop-root/mapred/local/archive already goes more than 1G now. Looks like it does not do the work. Could you advise if what I did is correct?
>  
>   <name>local.cache.size</name>
>   <value>500000</value>
>  
> Thanks,
>  
> Xia
>  
> From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com] 
> Sent: Monday, April 08, 2013 9:09 PM
> To: user@hadoop.apache.org
> Subject: Re: How to configure mapreduce archive size?
>  
> Hi,
>  
> This directory is used as part of the 'DistributedCache' feature. (http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache). There is a configuration key "local.cache.size" which controls the amount of data stored under DistributedCache. The default limit is 10GB. However, the files under this cannot be deleted if they are being used. Also, some frameworks on Hadoop could be using DistributedCache transparently to you.
>  
> So you could check what is being stored here and based on that lower the limit of the cache size if you feel that will help. The property needs to be set in mapred-default.xml.
>  
> Thanks
> Hemanth
>  
> 
> On Mon, Apr 8, 2013 at 11:09 PM, <Xi...@dell.com> wrote:
> Hi,
>  
> I am using hadoop which is packaged within hbase -0.94.1. It is hadoop 1.0.3. There is some mapreduce job running on my server. After some time, I found that my folder /tmp/hadoop-root/mapred/local/archive has 14G size.
>  
> How to configure this and limit the size? I do not want  to waste my space for archive.
>  
> Thanks,
>  
> Xia
>  
>  

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



RE: How to configure mapreduce archive size?

Posted by Xi...@Dell.com.
Hi Hemanth,

For the hadoop 1.0.3, I can only find "local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in mapred-default.xml.

I updated the value in file default.xml and changed it to 500000. This is just for testing purposes. However, the folder /tmp/hadoop-root/mapred/local/archive has already grown to more than 1G. It looks like the setting does not take effect. Could you advise whether what I did is correct?

  <name>local.cache.size</name>
  <value>500000</value>

Thanks,

Xia

From: Hemanth Yamijala [mailto:yhemanth@thoughtworks.com]
Sent: Monday, April 08, 2013 9:09 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Hi,

This directory is used as part of the 'DistributedCache' feature. (http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache). There is a configuration key "local.cache.size" which controls the amount of data stored under DistributedCache. The default limit is 10GB. However, the files under this cannot be deleted if they are being used. Also, some frameworks on Hadoop could be using DistributedCache transparently to you.

So you could check what is being stored here and based on that lower the limit of the cache size if you feel that will help. The property needs to be set in mapred-default.xml.

Thanks
Hemanth

On Mon, Apr 8, 2013 at 11:09 PM, <Xi...@dell.com>> wrote:
Hi,

I am using hadoop which is packaged within hbase -0.94.1. It is hadoop 1.0.3. There is some mapreduce job running on my server. After some time, I found that my folder /tmp/hadoop-root/mapred/local/archive has 14G size.

How to configure this and limit the size? I do not want  to waste my space for archive.

Thanks,

Xia
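
A note on the 500000 value tried above: local.cache.size is interpreted in bytes, so 500000 amounts to roughly 500 KB, and the default files bundled inside hadoop-core-1.0.3.jar are usually left untouched. A minimal sketch of how the override would typically look, assuming the tasktracker reads a core-site.xml (or mapred-site.xml) in the conf directory of the bundled Hadoop:

   <property>
     <name>local.cache.size</name>
     <!-- upper bound for the local distributed-cache directory, in bytes (2 GB here) -->
     <value>2147483648</value>
   </property>

The tasktracker would need a restart to pick up a change like this, and the limit is only applied to cache entries that are no longer in use by running jobs.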



Re: How to configure mapreduce archive size?

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
Hi,

This directory is used as part of the 'DistributedCache' feature. (
http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache).
There is a configuration key "local.cache.size" which controls the amount
of data stored under DistributedCache. The default limit is 10GB. However,
the files under this cannot be deleted if they are being used. Also, some
frameworks on Hadoop could be using DistributedCache transparently to you.

So you could check what is being stored here and based on that lower the
limit of the cache size if you feel that will help. The property needs to
be set in mapred-default.xml.

Thanks
Hemanth


On Mon, Apr 8, 2013 at 11:09 PM, <Xi...@dell.com> wrote:

> Hi,
>
> I am using hadoop which is packaged within hbase -0.94.1. It is hadoop
> 1.0.3. There is some mapreduce job running on my server. After some time, I
> found that my folder /tmp/hadoop-root/mapred/local/archive has 14G size.
>
> How to configure this and limit the size? I do not want  to waste my space
> for archive.
>
> Thanks,
>
> Xia
>
