Posted to user@spark.apache.org by Tsai Li Ming <ma...@ltsai.com> on 2014/03/13 10:16:45 UTC

Spark temp dir (spark.local.dir)

Hi,

I'm confused about -Dspark.local.dir and SPARK_WORKER_DIR (--work-dir).

What's the difference?

I have set -Dspark.local.dir on all my worker nodes, but I still see directories being created in /tmp while the job is running.

I have also tried setting -Dspark.local.dir when I run the application.

Thanks!


Re: Spark temp dir (spark.local.dir)

Posted by Scott Clasen <sc...@gmail.com>.
Are you setting '-Dspark.local.dir=/mytemp/mytempsubdir'?




Re: Spark temp dir (spark.local.dir)

Posted by Guillaume Pitel <gu...@exensa.com>.
>>> spark.local.dir can and should be set both on the executors and on the 
>>> driver (if the driver broadcast variables, the files will be stored in this 
>>> directory)
> Do you mean the worker nodes?

No, only the driver broadcasts I think.

> Don’t think they are jetty connectors and the directories are empty:
> /tmp/spark-3e330cdc-7540-4313-9f32-9fa109935f17/jars
> /tmp/spark-3e330cdc-7540-4313-9f32-9fa109935f17/files

Indeed, I must have confused that with something else. The Spark local dir 
contains directories whose names start with spark-local-*, so I don't know what 
these /tmp/spark-* directories are.
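
One way to check which location Spark is really using for scratch space is to look for spark-local-* directories in both places. This is only a rough sketch: list_spark_local_dirs is a hypothetical helper name, and it assumes a find that supports -maxdepth (GNU and BSD find do).

```shell
# List Spark scratch directories (named spark-local-*) directly under a directory.
# list_spark_local_dirs is a hypothetical helper, not part of Spark.
list_spark_local_dirs() {
  find "$1" -maxdepth 1 -type d -name 'spark-local-*'
}

# Example usage on a worker node while a job is running:
# list_spark_local_dirs /mnt/storage1/lm
# list_spark_local_dirs /tmp
```

If the first call prints directories and the second prints nothing, spark.local.dir is being honoured and the /tmp/spark-* directories come from something else.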

> I run the application like this, even with the java.io.tmpdir :
> bin/run-example -Dspark.executor.memory=14g -Dspark.local.dir=/mnt/storage1/lm -Djava.io.tmpdir=/mnt/storage1/lm org.apache.spark.examples.SparkLR spark://oct1:7077 10
>

How do you pass spark.local.dir to the workers? In SPARK_JAVA_OPTS during 
SparkContext creation? It should probably be set in spark-env.sh instead, 
because the value can differ on each node.
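
As a rough sketch of what that could look like in conf/spark-env.sh on each worker node, assuming the SPARK_JAVA_OPTS mechanism mentioned above: /mnt/storage1/lm is the path from the command line quoted earlier, while /mnt/storage1/spark-work is a made-up example path. As far as I know, Spark 1.0 and later also read a SPARK_LOCAL_DIRS environment variable that overrides spark.local.dir in standalone mode, so check the documentation for your version.

```shell
# conf/spark-env.sh on each worker node (sketch; adjust paths per node)

# Scratch space for Spark, passed to the JVM as a system property.
# /mnt/storage1/lm is the path used earlier in this thread.
export SPARK_JAVA_OPTS="-Dspark.local.dir=/mnt/storage1/lm"

# Where the standalone worker keeps application jars and executor log output.
# /mnt/storage1/spark-work is a hypothetical path; the default is $SPARK_HOME/work.
export SPARK_WORKER_DIR="/mnt/storage1/spark-work"
```

The workers need to be restarted after changing this file so that they pick up the new values.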

Guillaume





-- 
eXenSa

	
*Guillaume PITEL, Président*
+33(0)6 25 48 86 80

eXenSa S.A.S. <http://www.exensa.com/>
41, rue Périer - 92120 Montrouge - FRANCE
Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05


Re: Spark temp dir (spark.local.dir)

Posted by Tsai Li Ming <ma...@ltsai.com>.
>> spark.local.dir can and should be set both on the executors and on the driver (if the driver broadcast variables, the files will be stored in this directory)
Do you mean the worker nodes?

I don’t think they are Jetty connectors, and the directories are empty:
/tmp/spark-3e330cdc-7540-4313-9f32-9fa109935f17/jars
/tmp/spark-3e330cdc-7540-4313-9f32-9fa109935f17/files

I run the application like this, even with java.io.tmpdir set:
bin/run-example -Dspark.executor.memory=14g -Dspark.local.dir=/mnt/storage1/lm -Djava.io.tmpdir=/mnt/storage1/lm org.apache.spark.examples.SparkLR spark://oct1:7077 10




On 13 Mar, 2014, at 5:33 pm, Guillaume Pitel <gu...@exensa.com> wrote:

> Also, I think the jetty connector will create a small file or directory in /tmp regardless of the spark.local.dir 
> 
> It's very small, about 10KB
> 
> Guillaume


Re: Spark temp dir (spark.local.dir)

Posted by Guillaume Pitel <gu...@exensa.com>.
Also, I think the Jetty connector creates a small file or directory in /tmp 
regardless of spark.local.dir.

It's very small, about 10 KB.

Guillaume


Re: Spark temp dir (spark.local.dir)

Posted by Guillaume Pitel <gu...@exensa.com>.
I'm not 100% sure, but I think it goes like this:

spark.local.dir can and should be set on both the executors and the driver 
(if the driver broadcasts variables, the files are stored in this directory).

SPARK_WORKER_DIR is where the jars and the log output of the executors are 
placed (default $SPARK_HOME/work/), and it should be cleaned regularly.

The logs of the workers and the master are found in $SPARK_HOME/logs.

Guillaume