Posted to common-user@hadoop.apache.org by Jun Young Kim <ju...@gmail.com> on 2011/02/21 02:17:54 UTC

how many output files can be supported by MultipleOutputs?

hi,

in an application, I read many files from many directories.
additionally, using the MultipleOutputs class, I try to write thousands
of output files into many directories.
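
to give an idea of the write pattern, here is a simplified sketch only (the
class name and key-to-path scheme are made up for illustration, assuming the
new-API org.apache.hadoop.mapreduce.lib.output.MultipleOutputs; this is not
my exact code):

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Each record is routed to a per-key base path, so a single reduce task
// can end up holding writers for thousands of files in many directories.
public class DemuxReducer extends Reducer<Text, Text, Text, Text> {
  private MultipleOutputs<Text, Text> mos;

  @Override
  protected void setup(Context context) {
    mos = new MultipleOutputs<Text, Text>(context);
  }

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    for (Text value : values) {
      // e.g. key "dirA/fileB" -> <job output dir>/dirA/fileB-r-00000
      mos.write(key, value, key.toString());
    }
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    mos.close();  // flushes and closes every underlying writer
  }
}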

during reduce processing (the reduce task count is 1),
almost all of my jobs (about 20 jobs run in parallel on average) fail.

most of the errors look like the following:

java.io.IOException: Bad connect ack with firstBadLink as 10.25.241.101:50010
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:889)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:820)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:427)

java.io.EOFException
	at java.io.DataInputStream.readShort(DataInputStream.java:298)
	at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Status.read(DataTransferProtocol.java:113)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:881)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:820)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:427)

org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: Error while doing final merge
	at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:159)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:362)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
	at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/map_869.out
	at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:351)
	at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:132)
	at org.apache.hadoop.mapred.MapOutputFile.getInputFileForWrite(MapOutputFile.java:182)
	at org.apache.hadoop.mapreduce.task.reduce.MergeMa


currently, I suspect this is caused by a limit on the number of output file
descriptors Hadoop can keep open.
(I am running this job on a Linux server; its system-wide file descriptor limit is:

$> cat /proc/sys/fs/file-max
327680
)

-- 
Junyoung Kim (juneng603@gmail.com)


Re: how many output files can be supported by MultipleOutputs?

Posted by Harsh J <qw...@gmail.com>.
Hey,

On Mon, Feb 21, 2011 at 8:27 AM, Jun Young Kim <ju...@gmail.com> wrote:
> Hi, Harsh
>
> I thought all the configuration used to run a Hadoop job was listed in the
> job configuration.
>
> even if the user doesn't set a property explicitly, Hadoop sets it to a
> default value.
>
> that means all properties should be listed in the job configuration.
>
> isn't that right?

Oh yes. I believe I wasn't clear. What I meant was that setting that
property in code (JobConf.set, -D, etc.) will not affect the xceiver
limit that the DataNode was started with.

All properties loaded by the Configuration class from its sources are
listed in the job.xml, yes (but since the xceiver property isn't present in
any of the *-default.xml files, it won't be listed unless explicitly added
to a *-site.xml).
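
To illustrate (a throwaway sketch, nothing you need to run; the class name is
made up): a plain Configuration in a client JVM only reports a value for this
key if some *-site.xml on its classpath defines it, and even a non-null value
there says nothing about the running DataNode:

import org.apache.hadoop.conf.Configuration;

public class XcieverCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Pull in hdfs-site.xml from the classpath explicitly; there is no
    // *-default.xml entry for this key, so without a site file it is unset.
    conf.addResource("hdfs-site.xml");
    // Prints null unless some *-site.xml on the classpath sets it -- and a
    // non-null value here still does not change the DataNode's own limit.
    System.out.println(conf.get("dfs.datanode.max.xcievers"));
  }
}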

-- 
Harsh J
www.harshj.com

Re: how many output files can be supported by MultipleOutputs?

Posted by Jun Young Kim <ju...@gmail.com>.
Hi, Harsh

I thought all the configuration used to run a Hadoop job was listed in the
job configuration.

even if the user doesn't set a property explicitly, Hadoop sets it to a
default value.

that means all properties should be listed in the job configuration.

isn't that right?

Junyoung Kim (juneng603@gmail.com)


On 02/21/2011 11:40 AM, Harsh J wrote:
> Hello,
>
> On Mon, Feb 21, 2011 at 8:01 AM, Jun Young Kim<ju...@gmail.com> wrote:
>> now I am using Hadoop version 0.20.0.
>>
>> I have one more question about this configuration.
>>
>> before setting "dfs.datanode.max.xcievers", I couldn't find this property
>> in job.xml.
> That is because the property does not exist in the hdfs-default.xml
> file shipped in Hadoop's jars. I don't know the reason behind that
> (it is not present as a default in 0.21 either).
>
> Also, it is a DataNode property, not a job-specific one (it cannot be
> changed per job). Setting it in hdfs-site.xml should be sufficient.
>

Re: how many output files can be supported by MultipleOutputs?

Posted by Harsh J <qw...@gmail.com>.
Hello,

On Mon, Feb 21, 2011 at 8:01 AM, Jun Young Kim <ju...@gmail.com> wrote:
> now I am using Hadoop version 0.20.0.
>
> I have one more question about this configuration.
>
> before setting "dfs.datanode.max.xcievers", I couldn't find this property
> in job.xml.

That is because the property does not exist in the hdfs-default.xml
file shipped in Hadoop's jars. I don't know the reason behind that
(it is not present as a default in 0.21 either).

Also, it is a DataNode property, not a job-specific one (it cannot be
changed per job). Setting it in hdfs-site.xml should be sufficient.

-- 
Harsh J
www.harshj.com

Re: how many output files can be supported by MultipleOutputs?

Posted by Jun Young Kim <ju...@gmail.com>.
now I am using Hadoop version 0.20.0.

I have one more question about this configuration.

before setting "dfs.datanode.max.xcievers", I couldn't find this property
in job.xml.

is this a hidden configuration?
why couldn't I find it in my job.xml?

thanks.

Junyoung Kim (juneng603@gmail.com)


On 02/21/2011 10:47 AM, Yifeng Jiang wrote:
> We were using 0.20.2 when the issue occurred; we set dfs.datanode.max.xcievers
> to 2048, and the failures were fixed.
> Now we are using 0.20-append (HBase requires it), and it works well too.
>
> On 2011/02/21 10:35, Jun Young Kim wrote:
>> hi, Yifeng.
>>
>> Could you tell me which version of Hadoop you are using?
>>
>> thanks for your response.
>>
>> Junyoung Kim (juneng603@gmail.com)
>>
>>
>> On 02/21/2011 10:28 AM, Yifeng Jiang wrote:
>>> Hi,
>>>
>>> We have met the same issue.
>>> It seems that this error occurs when the number of threads connected to the
>>> DataNode reaches the maximum number of server threads, defined by
>>> "dfs.datanode.max.xcievers" in hdfs-site.xml.
>>> Our solution was to increase it from the default value (256) to a
>>> bigger value, such as 2048.

Re: how many output files can be supported by MultipleOutputs?

Posted by Jun Young Kim <ju...@gmail.com>.
hi,

I think the third error pattern is not caused by the xcievers setting.

org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#5
	at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:124)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:362)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
	at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:58)
	at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:45)
	at org.apache.hadoop.mapreduce.task.reduce.MapOutput.<init>(MapOutput.java:104)
	at org.apache.hadoop.mapreduce.task.reduce.MergeManager.unconditionalReserve(MergeManager.java:267)
	at org.apache.hadoop.mapreduce.task.re


according to Google, this is caused by a wrong IP entry for one of the
cluster nodes, but I've checked several times and the IP addresses of my
cluster nodes are normal.

my cluster size is 9 (1 master, 8 slaves).

this is my mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>mapreduce.job.tracker</name>
<value>thadpm01.scast:54311</value>
<description>The host and port that the MapReduce job tracker runs
   at.  If "local", then jobs are run in-process as a single map
   and reduce task.
</description>
</property>
<property>
<name>mapreduce.jobtracker.taskscheduler</name>
<value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<property>
<name>mapreduce.child.java.opts</name>
<value>-Xmx1024m</value>
<final>true</final>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1024m</value>
<final>true</final>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx1024m</value>
<final>true</final>
</property>
<property>
<name>mapreduce.tasktracker.map.tasks.maximum</name>
<value>83</value>
<final>true</final>
</property>
<property>
<name>mapreduce.tasktracker.reduce.tasks.maximum</name>
<value>11</value>
<final>true</final>
</property>
<property>
<name>mapreduce.jobtracker.handler.count</name>
<value>20</value>
<final>true</final>
</property>
<property>
<name>mapreduce.reduce.shuffle.parallelcopies</name>
<value>10</value>
<final>true</final>
</property>
<property>
<name>mapreduce.task.io.sort.factor</name>
<value>100</value>
<final>true</final>
</property>
<property>
<name>mapreduce.task.io.sort.mb</name>
<value>400</value>
<final>true</final>
</property>
</configuration>

error log on stdout:
attempt_201102181827_0113_r_000000_1: 2011-02-22 10:24:28[WARN ][Child.java]main()(234) : Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#8
attempt_201102181827_0113_r_000000_1:   at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:124)
attempt_201102181827_0113_r_000000_1:   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:362)
attempt_201102181827_0113_r_000000_1:   at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
attempt_201102181827_0113_r_000000_1:   at java.security.AccessController.doPrivileged(Native Method)
attempt_201102181827_0113_r_000000_1:   at javax.security.auth.Subject.doAs(Subject.java:396)
attempt_201102181827_0113_r_000000_1:   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
attempt_201102181827_0113_r_000000_1:   at org.apache.hadoop.mapred.Child.main(Child.java:211)
attempt_201102181827_0113_r_000000_1: Caused by: java.lang.OutOfMemoryError: Java heap space
attempt_201102181827_0113_r_000000_1:   at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:58)
attempt_201102181827_0113_r_000000_1:   at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:45)
attempt_201102181827_0113_r_000000_1:   at org.apache.hadoop.mapreduce.task.reduce.MapOutput.<init>(MapOutput.java:104)
attempt_201102181827_0113_r_000000_1:   at org.apache.hadoop.mapreduce.task.reduce.MergeManager.unconditionalReserve(MergeManager.java:267)
attempt_201102181827_0113_r_000000_1:   at org.apache.hadoop.mapreduce.task.reduce.MergeManager.reserve(MergeManager.java:257)
attempt_201102181827_0113_r_000000_1:   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:305)
attempt_201102181827_0113_r_000000_1:   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:251)
attempt_201102181827_0113_r_000000_1:   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:149)
attempt_201102181827_0113_r_000000_1: 2011-02-22 10:24:28[INFO ][Task.java]taskCleanup()(996) : Runnning cleanup for the task
11/02/22 10:24:44 INFO mapreduce.Job:  map 21% reduce 0%
11/02/22 10:24:54 INFO mapreduce.Job:  map 22% reduce 0%


thanks.

Junyoung Kim (juneng603@gmail.com)


On 02/21/2011 10:47 AM, Yifeng Jiang wrote:
> We were using 0.20.2 when the issue occurred; we set dfs.datanode.max.xcievers
> to 2048, and the failures were fixed.
> Now we are using 0.20-append (HBase requires it), and it works well too.
>
> On 2011/02/21 10:35, Jun Young Kim wrote:
>> hi, Yifeng.
>>
>> Could you tell me which version of Hadoop you are using?
>>
>> thanks for your response.
>>
>> Junyoung Kim (juneng603@gmail.com)
>>
>>
>> On 02/21/2011 10:28 AM, Yifeng Jiang wrote:
>>> Hi,
>>>
>>> We have met the same issue.
>>> It seems that this error occurs when the number of threads connected to the
>>> DataNode reaches the maximum number of server threads, defined by
>>> "dfs.datanode.max.xcievers" in hdfs-site.xml.
>>> Our solution was to increase it from the default value (256) to a
>>> bigger value, such as 2048.

Re: how many output files can be supported by MultipleOutputs?

Posted by Yifeng Jiang <yi...@mail.rakuten.co.jp>.
We were using 0.20.2 when the issue occurred; we set dfs.datanode.max.xcievers
to 2048, and the failures were fixed.
Now we are using 0.20-append (HBase requires it), and it works well too.

On 2011/02/21 10:35, Jun Young Kim wrote:
> hi, Yifeng.
>
> Could you tell me which version of Hadoop you are using?
>
> thanks for your response.
>
> Junyoung Kim (juneng603@gmail.com)
>
>
> On 02/21/2011 10:28 AM, Yifeng Jiang wrote:
>> Hi,
>>
>> We have met the same issue.
>> It seems that this error occurs when the number of threads connected to the
>> DataNode reaches the maximum number of server threads, defined by
>> "dfs.datanode.max.xcievers" in hdfs-site.xml.
>> Our solution was to increase it from the default value (256) to a
>> bigger value, such as 2048.


-- 
Yifeng Jiang


Re: how many output files can be supported by MultipleOutputs?

Posted by Jun Young Kim <ju...@gmail.com>.
hi, Yifeng.

Could you tell me which version of Hadoop you are using?

thanks for your response.

Junyoung Kim (juneng603@gmail.com)


On 02/21/2011 10:28 AM, Yifeng Jiang wrote:
> Hi,
>
> We have met the same issue.
> It seems that this error occurs when the number of threads connected to the
> DataNode reaches the maximum number of server threads, defined by
> "dfs.datanode.max.xcievers" in hdfs-site.xml.
> Our solution was to increase it from the default value (256) to a
> bigger value, such as 2048.

Re: how many output files can be supported by MultipleOutputs?

Posted by Yifeng Jiang <yi...@mail.rakuten.co.jp>.
Hi,

We have met the same issue.
It seems that this error occurs when the number of threads connected to the
DataNode reaches the maximum number of server threads, defined by
"dfs.datanode.max.xcievers" in hdfs-site.xml.
Our solution was to increase it from the default value (256) to a
bigger value, such as 2048.
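
Concretely, that amounts to something like the following inside the
<configuration> element of hdfs-site.xml on each DataNode (illustrative
snippet; the DataNodes need to be restarted to pick it up):

<property>
<name>dfs.datanode.max.xcievers</name>
<value>2048</value>
</property>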



-- 
Yifeng Jiang