Posted to user@spark.apache.org by Michael Shtelma <ms...@gmail.com> on 2018/03/19 16:59:44 UTC

Running out of space on /tmp file system while running spark job on yarn because of size of blockmgr folder

Hi everybody,

I am running a Spark job on YARN, and my problem is that the blockmgr-*
folders are being created under
/tmp/hadoop-msh/nm-local-dir/usercache/msh/appcache/application_id/*
This folder can grow to a significant size and does not
really fit into the /tmp file system for a single job, which is a real
problem for my installation.
I have redefined hadoop.tmp.dir in core-site.xml and
yarn.nodemanager.local-dirs in yarn-site.xml to point to another
location, expecting the block manager to create its files
there rather than under /tmp, but this is not the case. The files are
still created under /tmp.

I am wondering whether there is a way to keep Spark from using /tmp at
all and configure it to create all its files somewhere else?

Any assistance would be greatly appreciated!

Best,
Michael

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: Running out of space on /tmp file system while running spark job on yarn because of size of blockmgr folder

Posted by Gourav Sengupta <go...@gmail.com>.
Hi Michael,

I think that is what I was trying to show here, since the documentation mentions:
"NOTE: In Spark 1.0 and later this will be overridden by SPARK_LOCAL_DIRS
(Standalone, Mesos) or LOCAL_DIRS (YARN) environment variables set by the
cluster manager."

So, in a way, I am supporting your statement :)

Regards,
Gourav

On Wed, Mar 28, 2018 at 10:00 AM, Michael Shtelma <ms...@gmail.com>
wrote:

> Hi,
>
> this property will be used in YARN mode only by the driver.
> Executors will use the properties coming from YARN for storing temporary
> files.
>
>
> Best,
> Michael

Re: Running out of space on /tmp file system while running spark job on yarn because of size of blockmgr folder

Posted by Michael Shtelma <ms...@gmail.com>.
Hi,

in YARN mode, this property is used only by the driver.
Executors use the directories provided by YARN for storing temporary
files.


Best,
Michael

On Wed, Mar 28, 2018 at 7:37 AM, Gourav Sengupta <go...@gmail.com>
wrote:

> Hi,
>
>
> As per documentation in: https://spark.apache.org/docs/latest/configuration.html
>
>
> spark.local.dir (default: /tmp): Directory to use for "scratch" space in Spark,
> including map output files and RDDs that get stored on disk. This should be
> on a fast, local disk in your system. It can also be a comma-separated list
> of multiple directories on different disks. NOTE: In Spark 1.0 and later
> this will be overridden by SPARK_LOCAL_DIRS (Standalone, Mesos) or
> LOCAL_DIRS (YARN) environment variables set by the cluster manager.
>
> Regards,
> Gourav Sengupta

Re: Running out of space on /tmp file system while running spark job on yarn because of size of blockmgr folder

Posted by Gourav Sengupta <go...@gmail.com>.
Hi,


As per documentation in:
https://spark.apache.org/docs/latest/configuration.html


spark.local.dir (default: /tmp): Directory to use for "scratch" space in Spark,
including map output files and RDDs that get stored on disk. This should be
on a fast, local disk in your system. It can also be a comma-separated list
of multiple directories on different disks. NOTE: In Spark 1.0 and later
this will be overridden by SPARK_LOCAL_DIRS (Standalone, Mesos) or
LOCAL_DIRS (YARN) environment variables set by the cluster manager.

Regards,
Gourav Sengupta





On Mon, Mar 26, 2018 at 8:28 PM, Michael Shtelma <ms...@gmail.com> wrote:

> Hi Keith,
>
> Thanks  for the suggestion!
> I have solved this already.
> The problem was, that the yarn process was not responding to
> start/stop commands and has not applied my configuration changes.
> I have killed it and restarted my cluster, and after that yarn has
> started using yarn.nodemanager.local-dirs parameter defined in
> yarn-site.xml.
> After this change, -Djava.io.tmpdir for the spark executor was set
> correctly,  according to yarn.nodemanager.local-dirs parameter.
>
> Best,
> Michael

Re: Running out of space on /tmp file system while running spark job on yarn because of size of blockmgr folder

Posted by Michael Shtelma <ms...@gmail.com>.
Hi Keith,

Thanks for the suggestion!
I have solved this already.
The problem was that the YARN process was not responding to
start/stop commands and had not applied my configuration changes.
I killed it and restarted my cluster, and after that YARN
started using the yarn.nodemanager.local-dirs parameter defined in
yarn-site.xml.
After this change, -Djava.io.tmpdir for the Spark executors was set
correctly, according to the yarn.nodemanager.local-dirs parameter.
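For reference, the yarn-site.xml entry that took effect would look something like this (the paths here are illustrative placeholders, not the actual values from my cluster):

```xml
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <!-- Comma-separated list; placeholder paths on volumes with enough free space -->
  <value>/data1/yarn/local,/data2/yarn/local</value>
</property>
```

YARN then sets -Djava.io.tmpdir for each container under one of these directories.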

Best,
Michael


On Mon, Mar 26, 2018 at 9:15 PM, Keith Chapman <ke...@gmail.com> wrote:
> Hi Michael,
>
> sorry for the late reply. I guess you may have to set it through the hdfs
> core-site.xml file. The property you need to set is "hadoop.tmp.dir" which
> defaults to "/tmp/hadoop-${user.name}"
>
> Regards,
> Keith.
>
> http://keith-chapman.com


Re: Running out of space on /tmp file system while running spark job on yarn because of size of blockmgr folder

Posted by Keith Chapman <ke...@gmail.com>.
Hi Michael,

Sorry for the late reply. I guess you may have to set it through the Hadoop
core-site.xml file. The property you need to set is "hadoop.tmp.dir",
which defaults to "/tmp/hadoop-${user.name}"
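As a sketch, that override in core-site.xml would look like this (the path is a placeholder for illustration):

```xml
<property>
  <name>hadoop.tmp.dir</name>
  <!-- Placeholder path replacing the default /tmp/hadoop-${user.name} -->
  <value>/data/hadoop-tmp</value>
</property>
```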

Regards,
Keith.

http://keith-chapman.com

On Mon, Mar 19, 2018 at 1:05 PM, Michael Shtelma <ms...@gmail.com> wrote:

> Hi Keith,
>
> Thank you for the idea!
> I have tried it, so now the executor command is looking in the following
> way :
>
> /bin/bash -c /usr/java/latest//bin/java -server -Xmx51200m
> '-Djava.io.tmpdir=my_prefered_path'
> -Djava.io.tmpdir=/tmp/hadoop-msh/nm-local-dir/usercache/msh/appcache/application_1521110306769_0041/container_1521110306769_0041_01_000004/tmp
>
> JVM is using the second Djava.io.tmpdir parameter and writing
> everything to the same directory as before.
>
> Best,
> Michael
> Sincerely,
> Michael Shtelma
>

Re: Running out of space on /tmp file system while running spark job on yarn because of size of blockmgr folder

Posted by Michael Shtelma <ms...@gmail.com>.
Hi Keith,

Thank you for the idea!
I have tried it, and now the executor command looks like this:

/bin/bash -c /usr/java/latest//bin/java -server -Xmx51200m
'-Djava.io.tmpdir=my_prefered_path'
-Djava.io.tmpdir=/tmp/hadoop-msh/nm-local-dir/usercache/msh/appcache/application_1521110306769_0041/container_1521110306769_0041_01_000004/tmp

The JVM uses the second -Djava.io.tmpdir parameter and writes
everything to the same directory as before.

Best,
Michael


On Mon, Mar 19, 2018 at 6:38 PM, Keith Chapman <ke...@gmail.com> wrote:
> Can you try setting spark.executor.extraJavaOptions to have
> -Djava.io.tmpdir=someValue
>
> Regards,
> Keith.
>
> http://keith-chapman.com
>


Re: Running out of space on /tmp file system while running spark job on yarn because of size of blockmgr folder

Posted by Keith Chapman <ke...@gmail.com>.
Can you try setting spark.executor.extraJavaOptions to have
-Djava.io.tmpdir=someValue
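A quick sketch of what that looks like on the submit command line (the path /data/scratch/tmp and the script name my_job.py are placeholders, not values from this thread):

```shell
# Build the option once so driver and executors agree on the tmp dir.
# /data/scratch/tmp is a placeholder; use a volume with enough free space.
TMPDIR_OPT="-Djava.io.tmpdir=/data/scratch/tmp"

# The resulting spark-submit invocation (shown via echo so the full
# command is visible; drop the echo to actually submit).
CMD="spark-submit --master yarn \
  --conf spark.executor.extraJavaOptions=${TMPDIR_OPT} \
  --conf spark.driver.extraJavaOptions=${TMPDIR_OPT} \
  my_job.py"
echo "${CMD}"
```

Note that on YARN the executors' working directories also depend on yarn.nodemanager.local-dirs, so this option alone may not be enough.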

Regards,
Keith.

http://keith-chapman.com

On Mon, Mar 19, 2018 at 10:29 AM, Michael Shtelma <ms...@gmail.com>
wrote:

> Hi Keith,
>
> Thank you for your answer!
> I have done this, and it is working for spark driver.
> I would like to make something like this for the executors as well, so
> that the setting will be used on all the nodes, where I have executors
> running.
>
> Best,
> Michael

Re: Running out of space on /tmp file system while running spark job on yarn because of size of blockmgr folder

Posted by Michael Shtelma <ms...@gmail.com>.
Hi Keith,

Thank you for your answer!
I have done this, and it is working for the Spark driver.
I would like to do the same for the executors as well, so
that the setting is used on all the nodes where I have executors
running.

Best,
Michael


On Mon, Mar 19, 2018 at 6:07 PM, Keith Chapman <ke...@gmail.com> wrote:
> Hi Michael,
>
> You could either set spark.local.dir through spark conf or java.io.tmpdir
> system property.
>
> Regards,
> Keith.
>
> http://keith-chapman.com


Re: Running out of space on /tmp file system while running spark job on yarn because of size of blockmgr folder

Posted by Keith Chapman <ke...@gmail.com>.
Hi Michael,

You could either set spark.local.dir through the Spark conf or set the
java.io.tmpdir system property.
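As a sketch, the two options could be wired up like this (the paths and script name are illustrative placeholders):

```shell
# Option 1: Spark's own scratch-space setting. Note: on YARN this is
# overridden by the LOCAL_DIRS environment variable set by the cluster
# manager, as the docs quoted elsewhere in this thread point out.
LOCAL_DIR_CONF="--conf spark.local.dir=/data/spark-scratch"

# Option 2: the generic JVM temp-dir system property, here passed to the
# driver JVM via spark-submit's --driver-java-options flag.
TMPDIR_PROP="-Djava.io.tmpdir=/data/spark-scratch"

# Example invocation combining both (echoed for illustration):
echo "spark-submit ${LOCAL_DIR_CONF} --driver-java-options ${TMPDIR_PROP} my_job.py"
```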

Regards,
Keith.

http://keith-chapman.com

On Mon, Mar 19, 2018 at 9:59 AM, Michael Shtelma <ms...@gmail.com> wrote:

> Hi everybody,
>
> I am running spark job on yarn, and my problem is that the blockmgr-*
> folders are being created under
> /tmp/hadoop-msh/nm-local-dir/usercache/msh/appcache/application_id/*
> The size of this folder can grow to a significant size and does not
> really fit into /tmp file system for one job, which makes a real
> problem for my installation.
> I have redefined hadoop.tmp.dir in core-site.xml and
> yarn.nodemanager.local-dirs in yarn-site.xml pointing to other
> location and expected that the block manager will create the files
> there and not under /tmp, but this is not the case. The files are
> created under /tmp.
>
> I am wondering if there is a way to make spark not use /tmp at all and
> configure it to create all the files somewhere else ?
>
> Any assistance would be greatly appreciated!
>
> Best,
> Michael
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
>