Posted to user@spark.apache.org by Jack Yang <ji...@uow.edu.au> on 2015/10/06 00:43:48 UTC
RE: No space left on device when running graphx job
Just the usual checks, as below:
1. Check the physical disk volume (particularly the /tmp folder).
2. Check spark.local.dir and the size of the temp files it holds.
3. Add more workers.
4. Decrease the number of partitions (in code).
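For checks 1 and 2, a quick sketch of the commands involved (paths are examples; substitute whatever spark.local.dir points at on your workers, /tmp being the default):

```shell
#!/bin/sh
# 1. Physical disk volume: block usage is what usually throws ENOSPC.
df -h /tmp
# Inodes can also run out and produce the same error, so check both.
df -i /tmp
# 2. Size of Spark's scratch directories (shuffle spill lands here).
du -sh /tmp/spark-* 2>/dev/null || echo "no spark-* scratch dirs under /tmp"
```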
From: Robin East [mailto:robin.east@xense.co.uk]
Sent: Saturday, 26 September 2015 12:27 AM
To: Jack Yang
Cc: Ted Yu; Andy Huang; user@spark.apache.org
Subject: Re: No space left on device when running graphx job
Would you mind sharing what your solution was? It would help those on the forum who might run into the same problem. Even if it's a silly 'gotcha', it would help to know what it was and how you spotted the source of the issue.
Robin
On 25 Sep 2015, at 05:34, Jack Yang <ji...@uow.edu.au> wrote:
Hi all,
I resolved the problems.
Thanks folks.
Jack
From: Jack Yang [mailto:jiey@uow.edu.au]
Sent: Friday, 25 September 2015 9:57 AM
To: Ted Yu; Andy Huang
Cc: user@spark.apache.org
Subject: RE: No space left on device when running graphx job
Also, please see the screenshot below from the Spark web UI:
This snapshot was taken about 5 seconds (I guess) before the job crashed.
<image001.png>
From: Jack Yang [mailto:jiey@uow.edu.au]
Sent: Friday, 25 September 2015 9:55 AM
To: Ted Yu; Andy Huang
Cc: user@spark.apache.org
Subject: RE: No space left on device when running graphx job
Hi, here is the full stack trace:
15/09/25 09:50:14 WARN scheduler.TaskSetManager: Lost task 21088.0 in stage 6.0 (TID 62230, 192.168.70.129): java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:345)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at java.io.DataOutputStream.writeLong(DataOutputStream.java:224)
at org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1$$anonfun$apply$mcV$sp$1.apply$mcVJ$sp(IndexShuffleBlockResolver.scala:86)
at org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1$$anonfun$apply$mcV$sp$1.apply(IndexShuffleBlockResolver.scala:84)
at org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1$$anonfun$apply$mcV$sp$1.apply(IndexShuffleBlockResolver.scala:84)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofLong.foreach(ArrayOps.scala:168)
at org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1.apply$mcV$sp(IndexShuffleBlockResolver.scala:84)
at org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1.apply(IndexShuffleBlockResolver.scala:80)
at org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1.apply(IndexShuffleBlockResolver.scala:80)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1285)
at org.apache.spark.shuffle.IndexShuffleBlockResolver.writeIndexFile(IndexShuffleBlockResolver.scala:88)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:71)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
I am using the df -i command to monitor inode usage, and it shows the following the whole time:
Filesystem  Inodes   IUsed   IFree   IUse%  Mounted on
/dev/sda1   1245184  275424  969760    23%  /
udev         382148     484  381664     1%  /dev
tmpfs        384505     366  384139     1%  /run
none         384505       3  384502     1%  /run/lock
none         384505       1  384504     1%  /run/shm
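The inode numbers above look healthy (23% at worst), which points back at block space rather than inodes as the resource being exhausted. A small companion sketch for checking block usage across mounts in the same way (the 90% threshold is arbitrary):

```shell
#!/bin/sh
# Companion to `df -i`: block usage on all mounted filesystems.
df -hP
# Print any filesystem above 90% block usage (POSIX -P keeps one line per mount).
df -P | awk 'NR > 1 { sub("%", "", $5); if ($5 + 0 > 90) print $1 " is " $5 "% full" }'
```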
From: Ted Yu [mailto:yuzhihong@gmail.com]
Sent: Thursday, 24 September 2015 9:12 PM
To: Andy Huang
Cc: Jack Yang; user@spark.apache.org
Subject: Re: No space left on device when running graphx job
Andy:
Can you show the complete stack trace?
Have you checked that there are enough free inodes on the .129 machine?
Cheers
On Sep 23, 2015, at 11:43 PM, Andy Huang <an...@servian.com.au> wrote:
Hi Jack,
Are you writing out to disk? If not, it sounds like Spark is spilling to disk (RAM filled up) and running out of disk space.
Cheers
Andy
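One way to confirm spill is the culprit is to watch the scratch directory grow while the job runs. A sketch (SCRATCH is an assumed variable; substitute whatever spark.local.dir resolves to, /tmp by default):

```shell
#!/bin/sh
# Sample the scratch directory size a few times; steady growth during the
# shuffle phase means spill is what is consuming the disk.
SCRATCH=${SCRATCH:-/tmp}
for i in 1 2 3; do
  du -sk "$SCRATCH" 2>/dev/null
  sleep 1
done
```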
On Thu, Sep 24, 2015 at 4:29 PM, Jack Yang <ji...@uow.edu.au> wrote:
Hi folks,
I have an issue with GraphX (Spark 1.4.0 + 4 machines + 4G memory + 4 CPU cores).
Basically, I load data using the GraphLoader.edgeListFile method and then count the number of vertices with graph.vertices.count().
The problem is:
Lost task 11972.0 in stage 6.0 (TID 54585, 192.168.70.129): java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:345)
When I try a small amount of data, the code works, so I guess the error comes from the amount of data.
This is how I submit the job (I am using standalone mode):
spark-submit --class "myclass" \
  --master spark://hadoopmaster:7077 \
  --executor-memory 2048M \
  --driver-java-options "-XX:MaxPermSize=2G" \
  --total-executor-cores 4 \
  my.jar
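If the scratch volume is the one filling up, pointing spark.local.dir at a larger disk before resubmitting is a common fix. A minimal sketch (the /data/spark-tmp path and conf location are hypothetical; note that in standalone mode the SPARK_LOCAL_DIRS environment variable on the workers takes precedence if it is set):

```shell
#!/bin/sh
# Sketch: redirect Spark's scratch space to a larger volume before resubmitting.
# /data/spark-tmp is a hypothetical path; use any disk with room to spare.
CONF_DIR=${SPARK_CONF_DIR:-./conf}
mkdir -p "$CONF_DIR"
echo "spark.local.dir  /data/spark-tmp" >> "$CONF_DIR/spark-defaults.conf"
cat "$CONF_DIR/spark-defaults.conf"

# Equivalent one-off override on the command line:
# spark-submit --conf spark.local.dir=/data/spark-tmp --class "myclass" ... my.jar
```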
Any thoughts?
Best regards,
Jack
--
Andy Huang | Managing Consultant | Servian Pty Ltd | t: 02 9376 0700 | f: 02 9376 0730| m: 0433221979