Posted to user@spark.apache.org by Priya Ch <le...@gmail.com> on 2014/09/23 12:31:15 UTC

spark.local.dir and spark.worker.dir not used

Hi,

I am using Spark 1.0.0. In my Spark code I am trying to persist an RDD to
disk with rdd.persist(StorageLevel.DISK_ONLY), but unfortunately I could not
find the location where the RDD was written to disk. I set
SPARK_LOCAL_DIRS and SPARK_WORKER_DIR to a location other than the default
/tmp directory, but I still see nothing in either the worker directory or
the Spark local directory.

I also tried setting the local dir and worker dir from the Spark code while
building the SparkConf, with conf.set("spark.local.dir",
"/home/padma/sparkdir"), but those directories are not used either.


In general, which directories does Spark use for map output files,
intermediate writes, and persisting RDDs to disk?
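A minimal sketch of the setup described above (the object name and RDD contents are illustrative; the path is the one from my config):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object PersistToDiskDemo {
  def main(args: Array[String]): Unit = {
    // Note: in Spark 1.x, SPARK_LOCAL_DIRS (standalone) or LOCAL_DIRS (YARN)
    // set by the cluster manager override spark.local.dir set here.
    val conf = new SparkConf()
      .setAppName("persist-demo")
      .set("spark.local.dir", "/home/padma/sparkdir")
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(1 to 1000)
    rdd.persist(StorageLevel.DISK_ONLY)
    rdd.count() // persist is lazy; an action forces the blocks to be written

    // While the application runs, block files should appear under
    // <spark.local.dir>/spark-local-*/ on each worker node.
    sc.stop()
  }
}
```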


Thanks,
Padma Ch

RE: spark.local.dir and spark.worker.dir not used

Posted by "Shao, Saisai" <sa...@intel.com>.
This folder is created under your spark.local.dir when you start your Spark application, with a name prefixed “spark-local-”. It is quite strange that you don’t see this folder; maybe you are missing something. Besides, if Spark cannot create this folder at startup, persisting an RDD to disk will fail.

Also, I don’t think there is any way to persist an RDD to HDFS; even on YARN, only an RDD checkpoint can save data to HDFS.
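A minimal sketch of the checkpoint route mentioned above (the HDFS URI below is a placeholder, not a path from this thread):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CheckpointToHdfsDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("checkpoint-demo"))

    // Checkpoint data, unlike persisted blocks, is written to this
    // (typically HDFS) directory and survives the application.
    sc.setCheckpointDir("hdfs:///user/padma/checkpoints") // placeholder path

    val rdd = sc.parallelize(1 to 100).map(_ * 2)
    rdd.checkpoint()
    rdd.count() // an action triggers the actual checkpoint write
    sc.stop()
  }
}
```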

Thanks
Jerry


RE: spark.local.dir and spark.worker.dir not used

Posted by "Liu, Raymond" <ra...@intel.com>.
When did you check the directories’ contents? Those directories are cleaned up when the application finishes.

Best Regards,
Raymond Liu


Re: spark.local.dir and spark.worker.dir not used

Posted by Chitturi Padma <le...@gmail.com>.
I could not even see the spark-<id> folder in the default /tmp location of
spark.local.dir.


RE: spark.local.dir and spark.worker.dir not used

Posted by Tom Hubregtsen <th...@gmail.com>.
Also, if I am not mistaken, this data is automatically removed after your
run finishes. Be sure to check it while your program is running.





Re: spark.local.dir and spark.worker.dir not used

Posted by Chitturi Padma <le...@gmail.com>.
Is it possible to view the persisted RDD blocks?

If I use YARN, will the RDD blocks be persisted to HDFS, and will I then be
able to read the HDFS blocks as I can in Hadoop?


RE: spark.local.dir and spark.worker.dir not used

Posted by "Shao, Saisai" <sa...@intel.com>.
Hi,

spark.local.dir is the directory used for map output data and persisted RDD blocks, but the file paths are hashed, so you cannot directly locate the persisted RDD block files; they will, however, definitely be somewhere under these folders on your worker node.
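To illustrate, one could peek into the local dir on a worker while the application is still running. The path below is the one from the question, and the `spark-local-*` prefix and `rdd_<rddId>_<splitIndex>` block-file naming match Spark 1.x; your layout may differ:

```shell
# List the per-application scratch folders Spark creates under spark.local.dir
ls -d /home/padma/sparkdir/spark-local-* 2>/dev/null || true

# Persisted RDD blocks live in hashed subdirectories; search for them by name
find /home/padma/sparkdir -name 'rdd_*' -type f 2>/dev/null || true
```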

Thanks
Jerry

