Posted to user@spark.apache.org by Manoj GEORGE <ma...@amadeus.com.INVALID> on 2022/09/01 12:50:29 UTC

running pyspark on kubernetes - no space left on device

Hi Team,

I am new to Spark, so please excuse my ignorance.

Currently we are trying to run PySpark on a Kubernetes cluster. The setup works fine for some jobs, but when we process a large file (36 GB), we run into out-of-space issues ("no space left on device").

Based on what we found on the internet, we have mapped the local dir to a persistent volume. This still doesn't solve the issue.

I am not sure whether it is still writing to the /tmp folder on the pod. Is there some other setting that needs to be changed for this to work?

Thanks in advance.



Thanks,
Manoj George
Manager Database Architecture
M: +1 3522786801
manoj.george@amadeus.com
www.amadeus.com

Disclaimer: This email message and information contained in or attached to this message may be privileged, confidential, and protected from disclosure and is intended only for the person or entity to which it is addressed. Any review, retransmission, dissemination, printing or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited. If you receive this message in error, please immediately inform the sender by reply email and delete the message and any attachments. Thank you.

Re: running pyspark on kubernetes - no space left on device

Posted by Qian SUN <qi...@gmail.com>.
Hi
Spark provides the spark.local.dir configuration to specify the scratch
(work) directory on the pod. You can set spark.local.dir to your mount path.
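
A minimal sketch of that suggestion, assuming the persistent volume is mounted at /data/spark-scratch inside the pod (a hypothetical path, as are the helper names and job.py). It only builds the setting and the matching spark-submit arguments; the same key/value pair could instead be passed to SparkSession.builder.config() in the PySpark job.

```python
import shlex
from typing import Dict, List


def local_dir_conf(mount_path: str) -> Dict[str, str]:
    """Return the Spark setting that points scratch space at mount_path."""
    if not mount_path.startswith("/"):
        raise ValueError("mount_path must be an absolute path inside the pod")
    return {"spark.local.dir": mount_path}


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Turn a settings dict into repeated '--conf key=value' arguments."""
    args: List[str] = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Hypothetical mount path: replace with where your volume is actually mounted.
conf = local_dir_conf("/data/spark-scratch")

# Print a spark-submit command line carrying the setting (job.py is a placeholder).
print(shlex.join(["spark-submit", *to_submit_args(conf), "job.py"]))
```

If the job still fills /tmp after this, it is worth confirming on the pod that the mount path exists and that the setting is applied to the executors as well as the driver.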

Best regards


-- 
Best!
Qian SUN

Re: running pyspark on kubernetes - no space left on device

Posted by Matt Proetsch <ma...@gmail.com>.
Hi George,

You can try mounting a larger PersistentVolume to the work directory, as described at the link below, instead of relying on the local dir, which might have site-specific size constraints:

https://spark.apache.org/docs/latest/running-on-kubernetes.html#using-kubernetes-volumes
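
As a rough sketch of what that page describes, the snippet below assembles the executor volume properties for a PersistentVolumeClaim used as Spark local storage. The property pattern and the "spark-local-dir-" volume name prefix follow the linked documentation; the "OnDemand" claim name is only available in newer Spark releases, so check the docs for your version. The storage class "standard", the size "100Gi", the mount path, and the helper name are all assumptions for illustration.

```python
from typing import Dict


def local_dir_pvc_conf(
    mount_path: str,
    size_limit: str,
    storage_class: str,
    role: str = "executor",
    index: int = 1,
) -> Dict[str, str]:
    """Build Spark-on-Kubernetes properties for a PVC used as local storage.

    The volume name starts with 'spark-local-dir-' so that Spark treats the
    mounted volume as scratch space for shuffle and spill data.
    """
    prefix = (
        f"spark.kubernetes.{role}.volumes.persistentVolumeClaim."
        f"spark-local-dir-{index}"
    )
    return {
        # 'OnDemand' asks Spark to create one claim per pod (newer releases only).
        f"{prefix}.options.claimName": "OnDemand",
        f"{prefix}.options.storageClass": storage_class,
        f"{prefix}.options.sizeLimit": size_limit,
        f"{prefix}.mount.path": mount_path,
        f"{prefix}.mount.readOnly": "false",
    }


# Hypothetical values: adjust to your cluster's storage class and data size.
pvc_conf = local_dir_pvc_conf("/data/spark-local", "100Gi", "standard")

# Print the properties in '--conf key=value' form for spark-submit.
for key, value in pvc_conf.items():
    print(f"--conf {key}={value}")
```

The volume should be sized for shuffle and spill data rather than just the input file, since intermediate data for a 36 GB input can exceed the input size.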

-Matt
