Posted to user@tez.apache.org by Johannes Zillmann <jz...@googlemail.com> on 2015/08/03 10:38:20 UTC

Diskspace problems on spill

Hey Tez Community,
we had 2 customers with an ‘out of disk space’ problem during spill.
I guess both cases have to do with high data skew, causing most of the data to go to a single ‘aggregation’ vertex. So the general problem probably has to be solved at a much higher level than Tez…
Anyway, I wanted to ask if there is any Tez configuration or future release (we are running Tez 0.6) which might improve the disk utilisation during such heavyweight sorts?

best 
Johannes

TaskAttempt 0 failed, info=[Error: exceptionThrown=org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:226)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
	at java.io.FilterOutputStream.close(FilterOutputStream.java:157)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
	at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.spill(DefaultSorter.java:832)
	at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.sortAndSpill(DefaultSorter.java:732)
	at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter$SpillThread.run(DefaultSorter.java:660)
Caused by: java.io.IOException: No space left on device
	at java.io.FileOutputStream.writeBytes(Native Method)
	at java.io.FileOutputStream.write(FileOutputStream.java:345)
	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:224)
	... 8 more
, errorMessage=Task attempt_1437574820923_0019_2_00_000041_0_10003 failed : org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:226)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
	at java.io.FilterOutputStream.close(FilterOutputStream.java:157)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
	at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.spill(DefaultSorter.java:832)
	at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.sortAndSpill(DefaultSorter.java:732)
	at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter$SpillThread.run(DefaultSorter.java:660)
Caused by: java.io.IOException: No space left on device
	at java.io.FileOutputStream.writeBytes(Native Method)
	at java.io.FileOutputStream.write(FileOutputStream.java:345)
	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:224)
	... 8 more
:org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:226)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
	at java.io.FilterOutputStream.close(FilterOutputStream.java:157)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
	at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.spill(DefaultSorter.java:832)
	at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.sortAndSpill(DefaultSorter.java:732)
	at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter$SpillThread.run(DefaultSorter.java:660)
Caused by: java.io.IOException: No space left on device
	at java.io.FileOutputStream.writeBytes(Native Method)
	at java.io.FileOutputStream.write(FileOutputStream.java:345)
	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:224)
	... 8 more


Re: Diskspace problems on spill

Posted by Gopal Vijayaraghavan <go...@apache.org>.
> Anyway, I wanted to ask if there is any Tez configuration or future
> release (we are running Tez 0.6) which might improve the disk
> utilisation during such heavyweight sorts?

Make sure compression is turned on. Every time I've seen this issue, it had
to do with someone turning off compression because of a bad libsnappy install.

tez.runtime.compress/tez.runtime.compress.codec

If you happen to use DefaultCodec, remember to also set
zlib.compress.level=BEST_SPEED in the job conf (the value is the enum name,
not an int).
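
For reference, a minimal sketch of setting those keys on a Hadoop
Configuration. The SnappyCodec class name is just an example, and the key
names should be double-checked against the TezRuntimeConfiguration of your
release:

    import org.apache.hadoop.conf.Configuration;

    public class CompressionConfigSketch {
      public static Configuration tezCompressionConf() {
        Configuration conf = new Configuration();
        // enable compression of intermediate (shuffle) data
        conf.setBoolean("tez.runtime.compress", true);
        // example codec; fall back to DefaultCodec if libsnappy is broken
        conf.set("tez.runtime.compress.codec",
                 "org.apache.hadoop.io.compress.SnappyCodec");
        // only relevant when using DefaultCodec (zlib);
        // the value is the enum name as a string, not an int
        conf.set("zlib.compress.level", "BEST_SPEED");
        return conf;
      }
    }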

Further up in 0.8.x land, we don't do full merges if the pipelined shuffle
is turned on, which for a 6-disk system allows a single skewed task to be
about 12x bigger before hitting this exception.
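
A hedged sketch of how that could be switched on in 0.8.x. Both property
names below are assumptions taken from TezRuntimeConfiguration and should be
verified against the release you actually pick up:

    import org.apache.hadoop.conf.Configuration;

    public class PipelinedShuffleConfigSketch {
      public static Configuration pipelinedShuffleConf() {
        Configuration conf = new Configuration();
        // ship spill files to consumers as they are produced
        conf.setBoolean("tez.runtime.pipelined-shuffle.enabled", true);
        // skip the big final on-disk merge on the producer side
        conf.setBoolean("tez.runtime.enable.final-merge.in.output", false);
        return conf;
      }
    }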

Cheers,
Gopal



Re: Diskspace problems on spill

Posted by Rajesh Balamohan <rb...@apache.org>.
During spill, Tez tries to create spill files by passing the expected file
size to LocalDirAllocator, which searches for a disk that can fit the spill
file. However, there is no guarantee that LocalDirAllocator actually reserves
that space on the disk (i.e. it is quite possible that other writers fill up
the space in the meantime).
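
Roughly, the allocation step looks like the sketch below (not the exact Tez
code; the local-dirs key and the spill size are placeholders). The allocator
checks free space at call time but never reserves it:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.LocalDirAllocator;
    import org.apache.hadoop.fs.Path;

    public class SpillAllocationSketch {
      public static Path pickSpillPath(Configuration conf, long expectedSpillBytes)
          throws IOException {
        // placeholder for whatever config key names the task's local directories
        LocalDirAllocator allocator = new LocalDirAllocator("yarn.nodemanager.local-dirs");
        // picks a directory that currently has >= expectedSpillBytes free;
        // nothing stops other writers from consuming that space before the
        // spill finishes, hence the "No space left on device" failures above
        return allocator.getLocalPathForWrite("output/spill_0.out", expectedSpillBytes, conf);
      }
    }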

Please refer to https://issues.apache.org/jira/browse/TEZ-1277, which was
created some time back for a similar issue.

~Rajesh.B
