Posted to common-user@hadoop.apache.org by ma qiang <ma...@gmail.com> on 2009/02/25 03:58:23 UTC

Why is this error printed when using MultipleOutputFormat?

Hi all,
   I have a class that extends MultipleOutputFormat, shown below:

      import java.io.IOException;

      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.mapred.JobConf;
      import org.apache.hadoop.mapred.RecordWriter;
      import org.apache.hadoop.mapred.TextOutputFormat;
      import org.apache.hadoop.mapred.lib.MultipleOutputFormat;
      import org.apache.hadoop.util.Progressable;

      public class MyMultipleTextOutputFormat<K, V> extends
              MultipleOutputFormat<K, V> {
          // The underlying format that does the actual writing, created lazily.
          private TextOutputFormat<K, V> theTextOutputFormat = null;

          @Override
          protected RecordWriter<K, V> getBaseRecordWriter(FileSystem fs,
                  JobConf job, String name, Progressable progress) throws IOException {
              if (theTextOutputFormat == null) {
                  theTextOutputFormat = new TextOutputFormat<K, V>();
              }
              return theTextOutputFormat.getRecordWriter(fs, job, name, progress);
          }

          @Override
          protected String generateFileNameForKeyValue(K key, V value, String name) {
              // One output file per distinct key: "<partName>_<key>".
              return name + "_" + key.toString();
          }
      }


I also set conf.setOutputFormat(MyMultipleTextOutputFormat.class) in my job
configuration, but when the program runs, the following errors are printed:

09/02/25 10:22:32 INFO mapred.JobClient: Task Id :
attempt_200902250959_0002_r_000001_0, Status : FAILED
java.io.IOException: Could not read from stream
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:119)
	at java.io.DataInputStream.readByte(DataInputStream.java:248)
	at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:325)
	at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:346)
	at org.apache.hadoop.io.Text.readString(Text.java:400)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2779)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2704)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)

09/02/25 10:22:42 INFO mapred.JobClient:  map 100% reduce 69%
09/02/25 10:22:55 INFO mapred.JobClient:  map 100% reduce 0%
09/02/25 10:22:55 INFO mapred.JobClient: Task Id :
attempt_200902250959_0002_r_000000_1, Status : FAILED
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/user/qiang/output/_temporary/_attempt_200902250959_0002_r_000000_1/part-00000_t0x5y3
could only be replicated to 0 nodes, instead of 1
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1270)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
	at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)
	at org.apache.hadoop.ipc.Client.call(Client.java:696)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
	at $Proxy1.addBlock(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
	at $Proxy1.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2815)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2697)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)


Of course, the program runs successfully without MyMultipleTextOutputFormat.
Can anyone help me solve this problem?
Thanks.

yours,    Qiang

Re: Why is this error printed when using MultipleOutputFormat?

Posted by jason hadoop <ja...@gmail.com>.
Search the list archives for /etc/security/limits.conf; those posts include
instructions for checking the file descriptor limits on Linux machines.
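
For example, something like this (a sketch; "hadoop" stands in for whichever
user runs the Hadoop daemons and tasks, and the values are only illustrative):

    # check the current per-process open-file limit for that user
    $ ulimit -n
    1024

    # /etc/security/limits.conf -- raise the soft and hard limits
    hadoop  soft  nofile  16384
    hadoop  hard  nofile  16384

A change to limits.conf only takes effect for new login sessions.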


On Wed, Feb 25, 2009 at 4:28 AM, Rasit OZDAS <ra...@gmail.com> wrote:

> Qiang,
> I can't find which one right now, but there is a JIRA issue about
> MultipleTextOutputFormat (especially when the number of reducers is 0).
> If you have no reducers, try running with one or two to see whether
> your problem is related to that issue.
>
> Cheers,
> Rasit
> --
> M. Raşit ÖZDAŞ
>

Re: Why is this error printed when using MultipleOutputFormat?

Posted by Rasit OZDAS <ra...@gmail.com>.
Qiang,
I can't find which one right now, but there is a JIRA issue about
MultipleTextOutputFormat (especially when the number of reducers is 0).
If you have no reducers, try running with one or two to see whether
your problem is related to that issue.
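
For example, with the JobConf API you are already using (two is just an
arbitrary small number to test with):

    // run the same job with a couple of reduce tasks instead of zero
    conf.setNumReduceTasks(2);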

Cheers,
Rasit

2009/2/25 ma qiang <ma...@gmail.com>

> Thanks for your reply.
> If I increase the number of computers, will that solve this problem of
> running out of file descriptors?



-- 
M. Raşit ÖZDAŞ

Re: Why is this error printed when using MultipleOutputFormat?

Posted by ma qiang <ma...@gmail.com>.
Thanks for your reply.
If I increase the number of computers, will that solve this problem of
running out of file descriptors?




On Wed, Feb 25, 2009 at 11:07 AM, jason hadoop <ja...@gmail.com> wrote:
> My first guess is that your application is running out of file
> descriptors, possibly because your MultipleOutputFormat instance is opening
> more output files than you expect.
> Opening lots of files in HDFS is generally a quick route to bad job
> performance, if not job failure.

Re: Why is this error printed when using MultipleOutputFormat?

Posted by jason hadoop <ja...@gmail.com>.
My first guess is that your application is running out of file
descriptors, possibly because your MultipleOutputFormat instance is opening
more output files than you expect.
Opening lots of files in HDFS is generally a quick route to bad job
performance, if not job failure.
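
If that is what's happening, one workaround is to bound how many distinct
files each reducer opens, e.g. by bucketing keys instead of creating one file
per distinct key. A sketch against the class you posted (the bucket count of
16 is arbitrary):

    @Override
    protected String generateFileNameForKeyValue(K key, V value, String name) {
        // one output file per bucket, rather than one per distinct key
        int bucket = (key.hashCode() & Integer.MAX_VALUE) % 16;
        return name + "_" + bucket;
    }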
