Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/01/09 14:01:39 UTC

[jira] [Resolved] (SPARK-1881) Executor caching

     [ https://issues.apache.org/jira/browse/SPARK-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-1881.
------------------------------
    Resolution: Not A Problem

> Executor caching
> ----------------
>
>                 Key: SPARK-1881
>                 URL: https://issues.apache.org/jira/browse/SPARK-1881
>             Project: Spark
>          Issue Type: Improvement
>          Components: Mesos
>    Affects Versions: 1.0.0
>         Environment: centos 6.5, mesos 0.18.1
>            Reporter: nigel
>            Priority: Minor
>
> The problem is that the executor is copied for each run. On our cluster the disks are of moderate size and each executor is nearly 170 MB, so it is slow to copy and multiple runs take up a significant amount of space.
> The improvement would be to make it smaller.
> Currently the examples are bundled with the executor, even though they are not needed for execution. They are easy to take out, but it might be better not to include them in the default build.
> Another improvement might be to cache the executor jar. The script below builds a 'sparklite' executor which downloads the jar file only once (until the tmp dir is wiped). The scripts, which are small, are still downloaded each time, as before.
> This example would need more work: the source and destination are currently hard-coded, and it might be a good idea to check file dates and/or checksums in case someone uploads a different jar with the same version.
> This might be a bit redundant, depending on what happens with other work on executor caching.
> Comments welcome.
> --------------------------
> mkdir sparklite
> echo '58c58
> <   if [ -f "$FWDIR/RELEASE" ]; then
> ---
> >   if [ -f "$FWDIR/RELEASE" ] && [ -f "$FWDIR"/lib/spark-assembly*hadoop*.jar ]; then
> 60c60
> <   else
> ---
> >   elif [ -f "$ASSEMBLY_DIR"/spark-assembly*hadoop*.jar ]; then
> 61a62,68
> >   else
> > #Try the local one. If not there, download from hdfs
> >     if [ ! -f /tmp/sparklite/spark-assembly*hadoop*.jar ]; then
> >         mkdir /tmp/sparklite 2>/dev/null
> >         hdfs dfs -get /spark/spark-assembly*-hadoop*.jar /tmp/sparklite/
> >     fi
> >     ASSEMBLY_JAR=$(ls /tmp/sparklite/spark-assembly*hadoop*.jar 2>/dev/null)
> 64a72
> > ' > cc.patch
> tar -C sparklite -xf spark-1.0.0.tgz 
> cd sparklite
> hdfs dfs -put ./spark-1.0.0/lib/spark-assembly-1.0.0-SNAPSHOT-hadoop2.4.0.jar /spark/
> rm -f spark-1.0.0/lib/*assembly*
> rm -f spark-1.0.0/lib/*example*
> rm -f spark-1.0.0/bin/*.cmd
> rm -rf spark-1.0.0/ec2
> rm -rf spark-1.0.0/lib
> rm -rf spark-1.0.0/conf
> rm -rf spark-1.0.0/examples
> # cc.patch was written in the parent directory, before the cd into sparklite
> patch spark-1.0.0/bin/compute-classpath.sh < ../cc.patch
> rm -f spark-1.0.0.tgz
> tar zcf spark-1.0.0.tgz spark-1.0.0
> hdfs dfs -rm /spark/spark-1.0.0.tgz
> hdfs dfs -put ./spark-1.0.0.tgz /spark/
> ------------------------
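
The description suggests checking checksums before trusting a cached jar. A minimal sketch of that idea follows; it is not part of the original patch. The function name fetch_cached_jar and the variable SPARKLITE_FETCH are hypothetical, and the sketch assumes the expected SHA-256 of the jar is published somewhere the executor can read it (for example next to the jar), which the issue does not specify. It also assumes sha256sum is available on the node.

```shell
#!/usr/bin/env bash
# Sketch only: refresh a locally cached jar when it is missing or when its
# checksum differs from the expected one. All names here are hypothetical.

# Command used as "<cmd> SRC DEST" to copy the jar; defaults to an HDFS get.
SPARKLITE_FETCH=${SPARKLITE_FETCH:-"hdfs dfs -get"}

# fetch_cached_jar SRC CACHE_DIR EXPECTED_SHA256
# Prints the path of the verified cached jar; returns non-zero on failure.
fetch_cached_jar() {
  local src=$1 cache_dir=$2 expected=$3
  local dest actual tmp
  dest="$cache_dir/$(basename "$src")"

  mkdir -p "$cache_dir" || return 1

  # Reuse the cached copy only if its checksum matches.
  if [ -f "$dest" ]; then
    actual=$(sha256sum "$dest" | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
      echo "$dest"
      return 0
    fi
  fi

  # Download under a temporary name and rename afterwards, so another
  # executor starting at the same time never sees a half-written jar.
  tmp="$dest.tmp.$$"
  rm -f "$tmp"
  $SPARKLITE_FETCH "$src" "$tmp" || { rm -f "$tmp"; return 1; }

  # Reject the download if it does not match the expected checksum.
  actual=$(sha256sum "$tmp" | cut -d' ' -f1)
  if [ "$actual" != "$expected" ]; then
    rm -f "$tmp"
    return 1
  fi

  mv -f "$tmp" "$dest" && echo "$dest"
}
```

In a patched compute-classpath.sh, this could be used along the lines of ASSEMBLY_JAR=$(fetch_cached_jar "$SRC_JAR" /tmp/sparklite "$EXPECTED_SHA"), where SRC_JAR and EXPECTED_SHA are again placeholders. Taking the full jar path as an argument also avoids testing a glob with [ -f ... ], which fails when more than one assembly jar matches.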



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
