Posted to user@spark.apache.org by Jack Yang <ji...@uow.edu.au> on 2015/10/06 00:43:48 UTC
RE: No space left on device when running graphx job
Just the usual checks, as below:
1. Check the physical disk volume (particularly the /tmp folder).
2. Check spark.local.dir and the size of the temp files it holds.
3. Add more workers.
4. Decrease the number of partitions (in code).
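For checks 1 and 2, a quick sketch of the commands involved (paths are examples; substitute whatever spark.local.dir points at on your workers, /tmp being the default):

```shell
#!/bin/sh
# 1. Physical disk volume: block usage is what usually throws ENOSPC.
df -h /tmp
# Inodes can also run out and produce the same error, so check both.
df -i /tmp
# 2. Size of Spark's scratch directories (shuffle spill lands here).
du -sh /tmp/spark-* 2>/dev/null || echo "no spark-* scratch dirs under /tmp"
```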
From: Robin East [mailto:robin.east@xense.co.uk]
Sent: Saturday, 26 September 2015 12:27 AM
To: Jack Yang
Cc: Ted Yu; Andy Huang; user@spark.apache.org
Subject: Re: No space left on device when running graphx job
Would you mind sharing what your solution was? It would help those on the forum who might run into the same problem. Even if it's a silly 'gotcha', it would help to know what it was and how you spotted the source of the issue.
Robin
On 25 Sep 2015, at 05:34, Jack Yang <ji...@uow.edu.au> wrote:
Hi all,
I resolved the problems.
Thanks folks.
Jack
From: Jack Yang [mailto:jiey@uow.edu.au]
Sent: Friday, 25 September 2015 9:57 AM
To: Ted Yu; Andy Huang
Cc: user@spark.apache.org
Subject: RE: No space left on device when running graphx job
Also, please see the screenshot below from the Spark web UI:
This snapshot was taken about 5 seconds (I guess) before the job crashed.
<image001.png>
From: Jack Yang [mailto:jiey@uow.edu.au]
Sent: Friday, 25 September 2015 9:55 AM
To: Ted Yu; Andy Huang
Cc: user@spark.apache.org
Subject: RE: No space left on device when running graphx job
Hi, here is the full stack trace:
15/09/25 09:50:14 WARN scheduler.TaskSetManager: Lost task 21088.0 in stage 6.0 (TID 62230, 192.168.70.129): java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:345)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at java.io.DataOutputStream.writeLong(DataOutputStream.java:224)
at org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1$$anonfun$apply$mcV$sp$1.apply$mcVJ$sp(IndexShuffleBlockResolver.scala:86)
at org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1$$anonfun$apply$mcV$sp$1.apply(IndexShuffleBlockResolver.scala:84)
at org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1$$anonfun$apply$mcV$sp$1.apply(IndexShuffleBlockResolver.scala:84)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofLong.foreach(ArrayOps.scala:168)
at org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1.apply$mcV$sp(IndexShuffleBlockResolver.scala:84)
at org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1.apply(IndexShuffleBlockResolver.scala:80)
at org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1.apply(IndexShuffleBlockResolver.scala:80)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1285)
at org.apache.spark.shuffle.IndexShuffleBlockResolver.writeIndexFile(IndexShuffleBlockResolver.scala:88)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:71)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
I am using the df -i command to monitor inode usage, and it shows the following the whole time:
Filesystem  Inodes   IUsed   IFree   IUse%  Mounted on
/dev/sda1   1245184  275424  969760    23%  /
udev         382148     484  381664     1%  /dev
tmpfs        384505     366  384139     1%  /run
none         384505       3  384502     1%  /run/lock
none         384505       1  384504     1%  /run/shm
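The inode numbers above look healthy (23% at worst), which points back at block space rather than inodes as the resource being exhausted. A small companion sketch for checking block usage across mounts in the same way (the 90% threshold is arbitrary):

```shell
#!/bin/sh
# Companion to `df -i`: block usage on all mounted filesystems.
df -hP
# Print any filesystem above 90% block usage (POSIX -P keeps one line per mount).
df -P | awk 'NR > 1 { sub("%", "", $5); if ($5 + 0 > 90) print $1 " is " $5 "% full" }'
```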
From: Ted Yu [mailto:yuzhihong@gmail.com]
Sent: Thursday, 24 September 2015 9:12 PM
To: Andy Huang
Cc: Jack Yang; user@spark.apache.org
Subject: Re: No space left on device when running graphx job
Andy:
Can you show the complete stack trace?
Have you checked that there are enough free inodes on the .129 machine?
Cheers
On Sep 23, 2015, at 11:43 PM, Andy Huang <an...@servian.com.au> wrote:
Hi Jack,
Are you writing out to disk? If not, it sounds like Spark is spilling to disk (RAM filled up) and running out of disk space.
Cheers
Andy
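One way to confirm spill is the culprit is to watch the scratch directory grow while the job runs. A sketch (SCRATCH is an assumed variable; substitute whatever spark.local.dir resolves to, /tmp by default):

```shell
#!/bin/sh
# Sample the scratch directory size a few times; steady growth during the
# shuffle phase means spill is what is consuming the disk.
SCRATCH=${SCRATCH:-/tmp}
for i in 1 2 3; do
  du -sk "$SCRATCH" 2>/dev/null
  sleep 1
done
```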
On Thu, Sep 24, 2015 at 4:29 PM, Jack Yang <ji...@uow.edu.au> wrote:
Hi folks,
I have an issue with GraphX (Spark 1.4.0 + 4 machines + 4G memory + 4 CPU cores).
Basically, I load data using the GraphLoader.edgeListFile method and then count the number of vertices with graph.vertices.count().
The problem is:
Lost task 11972.0 in stage 6.0 (TID 54585, 192.168.70.129): java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:345)
When I try a small amount of data, the code works, so I guess the error comes from the amount of data.
This is how I submit the job (I am using standalone mode):
spark-submit --class "myclass" \
  --master spark://hadoopmaster:7077 \
  --executor-memory 2048M \
  --driver-java-options "-XX:MaxPermSize=2G" \
  --total-executor-cores 4 \
  my.jar
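If the scratch volume is the one filling up, pointing spark.local.dir at a larger disk before resubmitting is a common fix. A minimal sketch (the /data/spark-tmp path and conf location are hypothetical; note that in standalone mode the SPARK_LOCAL_DIRS environment variable on the workers takes precedence if it is set):

```shell
#!/bin/sh
# Sketch: redirect Spark's scratch space to a larger volume before resubmitting.
# /data/spark-tmp is a hypothetical path; use any disk with room to spare.
CONF_DIR=${SPARK_CONF_DIR:-./conf}
mkdir -p "$CONF_DIR"
echo "spark.local.dir  /data/spark-tmp" >> "$CONF_DIR/spark-defaults.conf"
cat "$CONF_DIR/spark-defaults.conf"

# Equivalent one-off override on the command line:
# spark-submit --conf spark.local.dir=/data/spark-tmp --class "myclass" ... my.jar
```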
Any thoughts?
Best regards,
Jack
--
Andy Huang | Managing Consultant | Servian Pty Ltd | t: 02 9376 0700 | f: 02 9376 0730| m: 0433221979