Posted to hdfs-user@hadoop.apache.org by hitarth trivedi <t....@gmail.com> on 2015/01/28 00:46:02 UTC

yarn cache settings

Hi,



We have yarn.nodemanager.local-dirs set to
/var/lib/hadoop/tmp/nm-local-dir. This is the directory where MapReduce
jobs store temporary data; on restart of the NodeManager, the contents of
the directory are deleted. I see the following definitions for
yarn.nodemanager.localizer.cache.target-size-mb (default 10240 MB) and
yarn.nodemanager.localizer.cache.cleanup.interval-ms (default 600000 ms,
i.e. 10 minutes):



·  *yarn.nodemanager.localizer.cache.target-size-mb*: This decides the
maximum disk space to be used for localized resources. Once the total disk
size of the cache exceeds this value, the deletion service will try to
remove files that are not used by any running container. At present there
is no individual limit (quota) for the PUBLIC / PRIVATE / APPLICATION
caches (YARN-882 <https://issues.apache.org/jira/browse/YARN-882>). The
limit applies to the total across all disks, not to each disk
individually.

·  *yarn.nodemanager.localizer.cache.cleanup.interval-ms*: At this
interval, the resource localization service tries to delete unused
resources if the total cache size exceeds the configured target size.
Unused resources are those not referenced by any running container. Every
time a container requests a resource, the container is added to that
resource's reference list and stays there until it finishes, which
prevents accidental deletion of a resource that is still in use. When the
container finishes, it is removed from the resource's reference list as
part of container cleanup, so a resource whose reference count drops to
zero becomes a candidate for deletion. Resources are deleted on an LRU
basis until the current cache size drops below the target size.



My */var/lib/hadoop/tmp/nm-local-dir* has an allocated size of 5 GB. For
testing, I wanted to set yarn.nodemanager.localizer.cache.target-size-mb
to a lower value of 1 GB and let the service delete cached content once
the cache crosses that limit. But I see that the size keeps growing beyond
the limit on every run of the MapReduce jobs, and the service never kicks
in to delete the contents. The jobs succeed and complete. Do I need to do
something else?
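
For reference, this is roughly what I have in yarn-site.xml for the test
(the property names are the standard YARN ones; the values are just my
test settings):

<!-- yarn-site.xml sketch: shrink the localizer cache for testing.
     1024 MB is my 1 GB test target; 600000 ms is the default cleanup
     interval. -->
<property>
  <name>yarn.nodemanager.localizer.cache.target-size-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name>
  <value>600000</value>
</property>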



Thanks,

Hitarth

Re: yarn cache settings

Posted by hitarth trivedi <t....@gmail.com>.
Hi,

What can I do instead? Should I point my local-dir to something else? If
so, what?

Thanks,
Hitarth

On Tue, Jan 27, 2015 at 9:38 PM, daemeon reiydelle <da...@gmail.com>
wrote:

> If you are running the /var/lib/hadoop/tmp dir in the / file system, you
> may want to reconsider that. Disk I/O will cause issues with the OS as it
> attempts to use its file system.

Re: yarn cache settings

Posted by daemeon reiydelle <da...@gmail.com>.
If you are running the /var/lib/hadoop/tmp dir in the / file system, you may
want to reconsider that. Disk I/O will cause issues with the OS as it
attempts to use its file system.
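
Something along these lines in yarn-site.xml, as a sketch. The mount
points below are just placeholders for whatever dedicated disks you have;
yarn.nodemanager.local-dirs takes a comma-separated list, and YARN spreads
container and cache data across it:

<!-- yarn-site.xml sketch: keep NodeManager local dirs off the root file
     system. /data1 and /data2 are placeholder mount points for dedicated
     disks. -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data1/hadoop/yarn/nm-local-dir,/data2/hadoop/yarn/nm-local-dir</value>
</property>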



*.......*

*“Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming “Wow! What a Ride!” - Hunter Thompson*

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872
