Posted to mapreduce-user@hadoop.apache.org by Tim Potter <te...@yahoo-inc.com> on 2014/01/24 15:47:21 UTC

No space left on device during merge.

Hi,
   I'm getting the below error while trying to sort a lot of data with Hadoop.

I strongly suspect that the node performing the merge is running out of local disk space. Assuming this is the case, is there any way
to work around this limitation, given that I can't increase the local disk space available on the nodes? For example, by specifying sort/merge parameters or similar.
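
By sort/merge parameters I mean knobs like the following (a minimal Java sketch using the Hadoop 2.x property names; the values are illustrative, not recommendations):

  import org.apache.hadoop.conf.Configuration;

  public class MergeTuningSketch {
    public static void main(String[] args) {
      Configuration conf = new Configuration();
      // Merge more on-disk segments per pass, so fewer intermediate
      // merge outputs are written to local disk.
      conf.setInt("mapreduce.task.io.sort.factor", 100);
      // Keep intermediate map outputs compressed on local disk
      // (assumes the hadoop-lzo codec is installed, as in the log above).
      conf.setBoolean("mapreduce.map.output.compress", true);
      conf.set("mapreduce.map.output.compress.codec",
          "com.hadoop.compression.lzo.LzoCodec");
      // Let the reducer retain more merged segments in memory instead
      // of spilling them to local disk (fraction of reducer heap).
      conf.setFloat("mapreduce.reduce.input.buffer.percent", 0.5f);
    }
  }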

Thanks,
   Tim.

2014-01-24 10:02:36,267 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.lzo_deflate]
2014-01-24 10:02:36,280 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 100 segments left of total size: 642610678884 bytes
2014-01-24 10:02:36,281 ERROR [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:XXXXXX (auth:XXXXXX) cause:org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in OnDiskMerger - Thread to merge on-disk map-outputs
2014-01-24 10:02:36,282 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in OnDiskMerger - Thread to merge on-disk map-outputs
	at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:167)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:371)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1284)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)
Caused by: org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:213)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:54)
	at java.io.DataOutputStream.write(DataOutputStream.java:107)
	at org.apache.hadoop.mapred.IFileOutputStream.write(IFileOutputStream.java:88)
	at org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:150)
	at org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:140)
	at org.apache.hadoop.io.compress.BlockCompressorStream.write(BlockCompressorStream.java:99)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:54)
	at java.io.DataOutputStream.write(DataOutputStream.java:107)
	at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:249)
	at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:200)
	at org.apache.hadoop.mapreduce.task.reduce.MergeManager$OnDiskMerger.merge(MergeManager.java:572)
	at org.apache.hadoop.mapreduce.task.reduce.MergeThread.run(MergeThread.java:94)
Caused by: java.io.IOException: No space left on device
	at java.io.FileOutputStream.writeBytes(Native Method)
	at java.io.FileOutputStream.write(FileOutputStream.java:318)
	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:211)
	... 14 more

2014-01-24 10:02:36,284 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task


Re: No space left on device during merge.

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.
Yes, sampling is a great way to do it. That's what the TeraSort example does. See the code for org.apache.hadoop.examples.terasort.TeraSort and specifically org.apache.hadoop.examples.terasort.TeraInputFormat.
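
If you'd rather not lift the TeraSort code, the stock library classes do the same job. A rough sketch (assuming Text keys and a job whose input format is already configured; the partition-file path is hypothetical):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
  import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

  public class SampledSortSketch {
    public static void main(String[] args) throws Exception {
      Job job = Job.getInstance(new Configuration(), "sampled-sort");
      job.setNumReduceTasks(100);
      // Range-partition keys so each reducer gets a contiguous slice.
      job.setPartitionerClass(TotalOrderPartitioner.class);
      TotalOrderPartitioner.setPartitionFile(job.getConfiguration(),
          new Path("/tmp/sort-partitions"));
      // Sample ~1% of records (up to 10000 keys, from at most 100 splits)
      // and write the reducer boundary keys to the partition file.
      InputSampler.writePartitionFile(job,
          new InputSampler.RandomSampler<Text, Text>(0.01, 10000, 100));
    }
  }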

The other, simpler option to start with is some kind of brute-force partitioning, such as lexicographic partitioning of the URLs. It won't give you great balance to begin with, but it should get you started.
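
A brute-force version can be as small as a custom Partitioner (a hypothetical sketch, assuming Text keys; it buckets by the first byte of the URL, so balance will be rough):

  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Partitioner;

  public class LexicographicUrlPartitioner extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
      // Spread keys over reducers by the first byte of the key.
      int firstByte = key.getLength() > 0 ? (key.getBytes()[0] & 0xFF) : 0;
      return firstByte * numPartitions / 256;
    }
  }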

+Vinod

On Jan 27, 2014, at 2:03 AM, Tim Potter <te...@yahoo-inc.com> wrote:

> Thanks for your reply, Vinod. I've been thinking about partitioning the data so that multiple reducers each work on a contiguous part of the sort space. The problem is that the keys are a combination of URLs and RDF BNodes, and I can't see a way, without analysing the data beforehand, of partitioning the URLs evenly across the sort space. I'm completely open to suggestions, though. I guess I could analyse a sample of the data, build a partition function that works well on that, and then apply it to the full data set.
> 
> I was hoping there was a way of tuning how Hadoop sorts.
> 
> Regards,
>   Tim.
> 
> On 1/24/14, 7:29 PM, Vinod Kumar Vavilapalli wrote:
>> That's a lot of data to process for a single reducer. You should try increasing the number of reducers to achieve more parallelism and also try modifying your logic to avoid significant skew in the reducers.
>> 
>> Unfortunately this means rethinking your app, but that's the only way around it. It will also help you scale smoothly into the future if you have adjustable parallelism and more balanced data processing.
>> 
>> +Vinod
>> Hortonworks Inc.
>> http://hortonworks.com/
>> 
>> 
>> On Fri, Jan 24, 2014 at 6:47 AM, Tim Potter <te...@yahoo-inc.com> wrote:
>> Hi,
>>   I'm getting the below error while trying to sort a lot of data with Hadoop.
>> 
>> I strongly suspect the node the merge is on is running out of local disk space. Assuming this is the case, is there any way
>> to get around this limitation considering I can't increase the local disk space available on the nodes?  Like specify sort/merge parameters or similar.
>> 
>> Thanks,
>>   Tim.

Re: No space left on device during merge.

Posted by Tim Potter <te...@yahoo-inc.com>.
Thanks for your reply, Vinod. I've been thinking about partitioning 
the data so that multiple reducers each work on a contiguous 
part of the sort space. The problem is that the keys are a combination of 
URLs and RDF BNodes, and I can't see a way, without analysing the 
data beforehand, of partitioning the URLs evenly across the sort space. 
I'm completely open to suggestions, though. I guess I could analyse a 
sample of the data, build a partition function that works well on that, 
and then apply it to the full data set.

I was hoping there was a way of tuning how Hadoop sorts.

Regards,
   Tim.

On 1/24/14, 7:29 PM, Vinod Kumar Vavilapalli wrote:
> That's a lot of data to process for a single reducer. You should try 
> increasing the number of reducers to achieve more parallelism and also 
> try modifying your logic to avoid significant skew in the reducers.
>
> Unfortunately this means rethinking your app, but that's the 
> only way around it. It will also help you scale smoothly into the 
> future if you have adjustable parallelism and more balanced data 
> processing.
>
> +Vinod
> Hortonworks Inc.
> http://hortonworks.com/
>
>
> On Fri, Jan 24, 2014 at 6:47 AM, Tim Potter <tep@yahoo-inc.com> wrote:
>
>     Hi,
>        I'm getting the below error while trying to sort a lot of data with Hadoop.
>
>     I strongly suspect the node the merge is on is running out of local disk space. Assuming this is the case, is there any way
>     to get around this limitation considering I can't increase the local disk space available on the nodes?  Like specify sort/merge parameters or similar.
>
>     Thanks,
>        Tim.


Re: No space left on device during merge.

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.
That's a lot of data to process for a single reducer. You should try
increasing the number of reducers to achieve more parallelism and also try
modifying your logic to avoid significant skew in the reducers.

Unfortunately this means rethinking your app, but that's the only way
around it. It will also help you scale smoothly into the future if you have
adjustable parallelism and more balanced data processing.
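
Concretely, the parallelism knob is one call on the standard Job API (a sketch; the count is illustrative, so size it to your cluster and data):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  public class ParallelismSketch {
    public static void main(String[] args) throws Exception {
      Job job = Job.getInstance(new Configuration(), "sort");
      // Illustrative only: raise reduce-side parallelism so each
      // reducer merges a smaller share of the data on local disk.
      job.setNumReduceTasks(500);
    }
  }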

+Vinod
Hortonworks Inc.
http://hortonworks.com/


On Fri, Jan 24, 2014 at 6:47 AM, Tim Potter <te...@yahoo-inc.com> wrote:

>  Hi,
>   I'm getting the below error while trying to sort a lot of data with Hadoop.
>
> I strongly suspect the node the merge is on is running out of local disk space. Assuming this is the case, is there any way
> to get around this limitation considering I can't increase the local disk space available on the nodes?  Like specify sort/merge parameters or similar.
>
> Thanks,
>   Tim.
