Posted to user@spark.apache.org by santhoma <sa...@yahoo.com> on 2014/03/26 06:35:25 UTC

any distributed cache mechanism available in spark ?

I have been writing map-reduce jobs on Hadoop using PIG, and am now trying to
migrate to SPARK.

My cluster consists of multiple nodes, and the jobs depend on a native
library (.so files).
In Hadoop and PIG, I could distribute the files across nodes using the
"-files" or "-archives" options, but I could not find any similar mechanism
for SPARK.

Can someone please explain the best ways to distribute dependent
files across nodes?
I have seen SparkContext.addFile(), but it looks like this will copy big
files every time, per job.
Moreover, I am not sure whether addFile() can automatically unzip archive files.
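
For reference, this is roughly how I understand addFile() would be used (a
minimal sketch only; the library path, file name, and app name below are
made up, not from my actual job):

    import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

    object AddFileSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("addFile-sketch")
        val sc = new SparkContext(conf)

        // Ship the native library to every executor for this application
        // (hypothetical path, for illustration only)
        sc.addFile("/path/to/libnative.so")

        sc.parallelize(1 to 4).foreach { _ =>
          // On each executor, resolve the local copy of the shipped file
          val localPath = SparkFiles.get("libnative.so")
          // Load the native library from the executor-local path
          System.load(localPath)
        }

        sc.stop()
      }
    }

My worry is that this copies the file again for every job/application rather
than caching it on the nodes, and it does not appear to unpack archives the
way the Hadoop distributed cache does.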

Thanks in advance.


