Posted to user@hadoop.apache.org by John Lilley <jo...@redpoint.net> on 2013/06/06 22:10:54 UTC

efficiency of LocalResources and archives

Suppose that I have a large archive in HDFS, say, containing 500 files and 4GB.  I want to make this available via YARN LocalResource.  The archive doesn't change very often (maybe once per month).  Will YARN optimize for this?  Does the expanded per-node cache persist across application runs (using something like modification time to know if re-expansion is needed)?

If the archive is re-expanded on each node every time the app is launched, should I set the replication factor higher to reduce rack bandwidth?

Thanks
John


Re: efficiency of LocalResources and archives

Posted by Ted Xu <tx...@gopivotal.com>.
Hi John,

If the resources are located in HDFS and you specify them by HDFS URI,
then the answer is yes. The node managers cache the resources and
automatically refresh them based on the modification time of the HDFS file.

Increasing the resources' replication factor is recommended. If the
resources are uploaded from the client machine, the MapReduce framework
automatically sets their replication factor to 10.
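
The caching behavior described above can be pictured with a small model. This is only a sketch under stated assumptions, not the node manager's actual code: the class LocalizationCacheSketch, its methods, and the /tmp/filecache path are hypothetical names invented for illustration. It assumes a cache entry is identified by the resource URI together with the modification time the client recorded, so an unchanged archive is reused and a modified one is fetched and expanded again. In YARN itself, whether an entry outlives an application also depends on the resource's visibility.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Simplified, hypothetical model of a per-node localization cache. */
public class LocalizationCacheSketch {

    /** Hypothetical cache key: resource URI plus the modification time recorded by the client. */
    static final class ResourceKey {
        final String uri;
        final long timestamp;

        ResourceKey(String uri, long timestamp) {
            this.uri = uri;
            this.timestamp = timestamp;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ResourceKey)) {
                return false;
            }
            ResourceKey other = (ResourceKey) o;
            return uri.equals(other.uri) && timestamp == other.timestamp;
        }

        @Override
        public int hashCode() {
            return Objects.hash(uri, timestamp);
        }
    }

    private final Map<ResourceKey, String> localized = new HashMap<>();
    private int downloads = 0;

    /** Returns the local directory for a resource, "downloading" only on a cache miss. */
    public String localize(String uri, long timestamp) {
        ResourceKey key = new ResourceKey(uri, timestamp);
        String dir = localized.get(key);
        if (dir == null) {
            // Cache miss: stands in for fetching the archive and expanding it.
            downloads++;
            dir = "/tmp/filecache/" + downloads; // placeholder path, not a real YARN location
            localized.put(key, dir);
        }
        return dir;
    }

    /** Number of simulated download-and-expand operations so far. */
    public int downloadCount() {
        return downloads;
    }

    public static void main(String[] args) {
        LocalizationCacheSketch cache = new LocalizationCacheSketch();
        cache.localize("hdfs:///apps/big-archive.zip", 1000L); // first run: miss
        cache.localize("hdfs:///apps/big-archive.zip", 1000L); // second run: reused
        cache.localize("hdfs:///apps/big-archive.zip", 2000L); // archive changed: miss
        System.out.println("downloads = " + cache.downloadCount());
    }
}
```

In this model a monthly update to the archive costs one extra download per node, while every launch in between reuses the expanded copy.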


On Fri, Jun 7, 2013 at 4:10 AM, John Lilley <jo...@redpoint.net> wrote:

> Suppose that I have a large archive in HDFS, say, containing 500 files
> and 4GB.  I want to make this available via YARN LocalResource.  The
> archive doesn’t change very often (maybe once per month).  Will YARN
> optimize for this?  Does the expanded per-node cache persist across
> application runs (using something like modification time to know if
> re-expansion is needed)?
>
> If the archive is re-expanded on each node every time the app is launched,
> should I set the replication factor higher to reduce rack bandwidth?
>
> Thanks
> John



-- 
Regards,
Ted Xu
