Posted to user@spark.apache.org by Paul Schooss <pa...@gmail.com> on 2014/01/31 04:01:55 UTC

Configuring distributed caching with Spark and YARN

Hello Folks,

I was wondering whether anyone has successfully set up distributed
caching of jar files using CDH 5/YARN/Spark? I cannot seem to get my
cluster working in that fashion.


Regards,

Paul Schooss

Re: Configuring distributed caching with Spark and YARN

Posted by santhoma <sa...@yahoo.com>.

I think with addJar() there is no 'caching', in the sense that the files
will be copied again for every job.
Whereas with the Hadoop distributed cache, files are copied only once, and a
symlink to the cached file is created for subsequent runs:
https://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/filecache/DistributedCache.html

Also, the Hadoop distributed cache can copy an archive file to the node and
unzip it automatically into the current working dir. The advantage here is
that the copying will be very fast.

Still looking for similar mechanisms in Spark.
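
For comparison, here is a rough sketch of how jars and archives can be
handed to YARN when launching a Spark job. It assumes a later Spark release
that ships the spark-submit launcher (it is not something this thread
confirms for CDH 5 at the time). On YARN, the --jars and --archives options
go through YARN's resource localization, archives are unpacked in the
container working directory, and the part after '#' sets the link name.
Every path, the jar name and the class name below are hypothetical
placeholders. The script only prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with your own jar, class and HDFS paths.
APP_JAR="my-app.jar"
MAIN_CLASS="com.example.MyApp"
DEP_JARS="hdfs:///libs/dep1.jar,hdfs:///libs/dep2.jar"
ARCHIVE="hdfs:///data/resources.zip#resources"

# Build the launch command as positional parameters so it can be inspected.
set -- spark-submit \
  --master yarn \
  --class "$MAIN_CLASS" \
  --jars "$DEP_JARS" \
  --archives "$ARCHIVE" \
  "$APP_JAR"

# Dry run by default: print the command instead of executing it.
if [ "${RUN:-0}" = "1" ]; then
  "$@"
else
  echo "$@"
fi
```

With this layout, tasks would find the unpacked archive under ./resources in
their working directory. Whether the localized copies are reused across jobs
depends on the cluster setup, so treat this as a starting point only.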




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Configuring-distributed-caching-with-Spark-and-YARN-tp1074p3566.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Configuring distributed caching with Spark and YARN

Posted by Mayur Rustagi <ma...@gmail.com>.

Is this equivalent to addJar()?


Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Thu, Mar 27, 2014 at 3:58 AM, santhoma <sa...@yahoo.com> wrote:

> Curious to know, were you able to do distributed caching for Spark?
>
> I have done that for Hadoop and Pig, but could not find a way to do it in
> Spark.

Re: Configuring distributed caching with Spark and YARN

Posted by santhoma <sa...@yahoo.com>.

Curious to know, were you able to do distributed caching for Spark?

I have done that for Hadoop and Pig, but could not find a way to do it in
Spark.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Configuring-distributed-caching-with-Spark-and-YARN-tp1074p3325.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.