Posted to user@hadoop.apache.org by Jason Yang <li...@gmail.com> on 2012/09/24 15:25:46 UTC
Not able to place enough replicas in Reduce
Hi, all
I have been stuck on a weird problem for a long time, so I was wondering
if anyone could give me some advice?
I have a MapReduce job, in which:
1. the mapper reads a whole file as a single split (by overriding
FileInputFormat.isSplitable to return false) and translates it into a set
of points, then clusters these points into 4 groups randomly. The output
of the mapper is <splitID_GroupID, point>, so if I have 20 input files,
there will be 20*4=80 unique keys in total.
2. I have implemented a KeyasNameMSFOutputFormat which returns the output
file name according to the key:
> public class KeyasNameMSFOutputFormat
>     extends MultipleSequenceFileOutputFormat<Text, Point>
> {
>     @Override
>     protected String generateFileNameForKeyValue(Text key,
>             Point value, String name)
>     {
>         return key.toString();
>     }
> }
3. I use the IdentityReducer, and the jobConf looks like:
> // set path for input and output
> ...
>
> // set format for input and output
> jobConf.setInputFormat(WholeFileInputFormat.class);
> jobConf.setOutputFormat(KeyasNameMSFOutputFormat.class);
>
> // set class of output key and value
> jobConf.setOutputKeyClass(Text.class);
> jobConf.setOutputValueClass(Point.class);
>
> // set mapper and reducer
> jobConf.setMapperClass(LocalClusteringMapper.class);
> jobConf.setReducerClass(IdentityReducer.class);
>
> // run the job
> JobClient.runJob(jobConf);
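As a sanity check on the numbers in step 1, the key scheme can be modeled in plain Java. This is only an illustrative sketch: the class name, the `makeKey` helper, and the `split-%03d` ID format are invented here; the real keys come from LocalClusteringMapper and use the actual split names.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class KeySchemeSketch {
    // Build a key the way step 1 describes: split ID and group ID joined
    // into one string, which KeyasNameMSFOutputFormat later uses verbatim
    // as the output file name.
    static String makeKey(String splitId, int groupId) {
        return splitId + "-" + groupId;
    }

    public static void main(String[] args) {
        Set<String> keys = new LinkedHashSet<>();
        // 20 input files, each clustered into 4 groups -> 80 unique keys,
        // i.e. 80 distinct output files created by the single reducer.
        for (int file = 0; file < 20; file++) {
            for (int group = 0; group < 4; group++) {
                keys.add(makeKey(String.format("split-%03d", file), group));
            }
        }
        System.out.println(keys.size()); // 80
    }
}
```

So the one reducer ends up writing 80 separate output files, versus only 20 when the job is run with 5 input files.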
On the pseudo-distributed Hadoop cluster, this job runs fine with a small
number of input files (e.g. 5), but it always fails in the reduce phase
when there are more than 20 input files.
I have checked the logs and found some strange entries:
*----JobTracker Log----*
2012-09-24 19:48:59,490 INFO org.apache.hadoop.mapred.JobInProgress: Task
'attempt_201209241945_0001_m_000019_0' has completed
task_201209241945_0001_m_000019 successfully.
2012-09-24 19:48:59,490 INFO org.apache.hadoop.mapred.ResourceEstimator:
completedMapsUpdates:20 completedMapsInputSize:28649108
completedMapsOutputSize:41385220
2012-09-24 19:49:14,668 INFO org.apache.hadoop.mapred.TaskInProgress: Error
from attempt_201209241945_0001_r_000000_0:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
could only be replicated to 0 nodes, instead of 1
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
at org.apache.hadoop.ipc.Client.call(Client.java:740)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy1.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.addBlock(Unknown Source)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2937)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2819)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
*----DataNode log----*
2012-09-24 19:49:08,156 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_-8681085577211416125_16056 src: /127.0.0.1:39500 dest: /127.0.0.1:50010
2012-09-24 19:49:08,223 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_5641670145666195897_16057 src: /127.0.0.1:39501 dest: /127.0.0.1:50010
2012-09-24 19:49:08,324 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_2433781882701878319_16058 src: /127.0.0.1:39504 dest: /127.0.0.1:50010
2012-09-24 19:49:08,389 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_7063241539808048732_16059 src: /127.0.0.1:39505 dest: /127.0.0.1:50010
2012-09-24 19:49:08,465 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_-8805810864714094811_16060 src: /127.0.0.1:39506 dest: /127.0.0.1:50010
2012-09-24 19:49:09,850 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
127.0.0.1:39463, dest: /127.0.0.1:50010, bytes: 513321, op: HDFS_WRITE,
cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID:
DS-373612585-127.0.1.1-50010-1347885793411, blockid:
blk_5143138923881025306_16019
2012-09-24 19:49:09,851 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
PacketResponder 0 for block blk_5143138923881025306_16019 terminating
2012-09-24 19:49:09,941 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
127.0.0.1:39464, dest: /127.0.0.1:50010, bytes: 483928, op: HDFS_WRITE,
cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID:
DS-373612585-127.0.1.1-50010-1347885793411, blockid:
blk_1512576479762656406_16020
2012-09-24 19:49:09,944 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for
block blk_1512576479762656406_16020 terminating
2012-09-24 19:49:09,975 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
127.0.0.1:39465, dest: /127.0.0.1:50010, bytes: 496747, op: HDFS_WRITE,
cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID:
DS-373612585-127.0.1.1-50010-1347885793411, blockid:
blk_8640569083674549899_16021
*----NameNode Log----*
2012-09-24 19:49:09,836 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit:
ugi=yanglin,yanglin,adm,dialout,cdrom,plugdev,lpadmin,admin,sambashare ip=/
127.0.0.1 cmd=create
src=/work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
dst=null perm=yanglin:supergroup:rw-r--r--
2012-09-24 19:49:09,845 WARN
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place
enough replicas, still in need of 1
2012-09-24 19:49:09,845 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 8 on 8020, call
addBlock(/work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2,
DFSClient_attempt_201209241945_0001_r_000000_0) from
127.0.0.1:37972:error: java.io.IOException: File
/work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
could only be replicated to 0 nodes, instead of 1
java.io.IOException: File
/work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
could only be replicated to 0 nodes, instead of 1
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
2012-09-24 19:49:09,851 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to
blk_5143138923881025306_16019 size 513321
2012-09-24 19:49:09,852 INFO org.apache.hadoop.hdfs.StateChange: DIR*
NameSystem.completeFile: file
/work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-000000-0
is closed by DFSClient_attempt_201209241945_0001_r_000000_0
*----taskTracker log----*
2012-09-24 19:49:00,813 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54
MB/s) >
2012-09-24 19:49:02,451 INFO org.apache.hadoop.mapred.TaskTracker: Sent out
1873109 bytes for reduce: 0 from map: attempt_201209241945_0001_m_000018_0
given 1873109/1873105
2012-09-24 19:49:02,452 INFO
org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.1.1:50060,
dest: 127.0.0.1:48717, bytes: 1873109, op: MAPRED_SHUFFLE, cliID:
attempt_201209241945_0001_m_000018_0
2012-09-24 19:49:02,468 INFO org.apache.hadoop.mapred.TaskTracker: Sent out
1328490 bytes for reduce: 0 from map: attempt_201209241945_0001_m_000019_0
given 1328490/1328486
2012-09-24 19:49:02,469 INFO
org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.1.1:50060,
dest: 127.0.0.1:48717, bytes: 1328490, op: MAPRED_SHUFFLE, cliID:
attempt_201209241945_0001_m_000019_0
2012-09-24 19:49:03,054 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54
MB/s) >
2012-09-24 19:49:03,641 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54
MB/s) >
2012-09-24 19:49:06,828 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201209241945_0001_r_000000_0 0.7520079% reduce > reduce
2012-09-24 19:49:09,830 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201209241945_0001_r_000000_0 0.85404265% reduce > reduce
2012-09-24 19:49:12,445 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201209241945_0001_r_000000_0 0.85404265% reduce > reduce
2012-09-24 19:49:12,538 INFO org.apache.hadoop.mapred.JvmManager: JVM :
jvm_201209241945_0001_r_1599614869 exited. Number of tasks it ran: 0
2012-09-24 19:49:15,538 INFO org.apache.hadoop.mapred.TaskRunner:
attempt_201209241945_0001_r_000000_0 done; removing files.
2012-09-24 19:49:15,558 INFO org.apache.hadoop.mapred.TaskTracker:
addFreeSlot : current free slots : 2
2012-09-24 19:49:17,677 INFO org.apache.hadoop.mapred.TaskTracker:
LaunchTaskAction (registerTask): attempt_201209241945_0001_r_000000_1
task's state:UNASSIGNED
2012-09-24 19:49:17,678 INFO org.apache.hadoop.mapred.TaskTracker: Trying
to launch : attempt_201209241945_0001_r_000000_1
In case it's relevant, here is my development environment:
Hadoop Version: 0.20.2
OS: Ubuntu 12.04 LTS
yanglin@ubuntu:~$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 32059
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 10240
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 20480
cpu time (seconds, -t) unlimited
max user processes (-u) 32059
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
--
YANG, Lin
Re: Not able to place enough replicas in Reduce
Posted by Jason Yang <li...@gmail.com>.
Hey, Bertrand
here is the situation of my pseudo-distributed cluster:
-----
NameNode 'localhost:8020'
Started: Mon Sep 24 03:11:05 CST 2012 Version: 0.20.2, r911707 Compiled: Fri
Feb 19 08:07:34 UTC 2010 by chrisdo Upgrades: There are no upgrades in
progress.
------------------------------
Cluster Summary: 169 files and directories, 292 blocks = 461 total.
Heap Size is 54.5 MB / 888.94 MB (6%)
Configured Capacity: 18.98 GB
DFS Used: 56.7 MB
Non DFS Used: 14.61 GB
DFS Remaining: 4.32 GB
DFS Used%: 0.29 %
DFS Remaining%: 22.75 %
Live Nodes: 1
Dead Nodes: 0
------------------------------
NameNode Storage:
Storage Directory: /home/yanglin/Programs/hadoop_tmp_dir/hadoop-yanglin/dfs/name
Type: IMAGE_AND_EDITS
State: Active
and here is the configuration:
----
<!-- core-site.xml -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/yanglin/Programs/hadoop_tmp_dir/hadoop-${user.name}
</value>
</property>
</configuration>
<!-- hdfs-site.xml -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
<!-- mapred-site.xml -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:8021</value>
</property>
</configuration>
yanglin@ubuntu:~$ cat /home/yanglin/Programs/hadoop-0.20.2/conf/masters
localhost
yanglin@ubuntu:~$ cat /home/yanglin/Programs/hadoop-0.20.2/conf/slaves
localhost
-----
2012/9/24 Bertrand Dechoux <de...@gmail.com>
> And do you have any remaining space in your HDFS? (Or do you have quota?
> But the message should be different, I guess.)
> What are the metrics you get from the namenode? Are all datanodes (you
> have only one?) live?
>
> http://localhost:50070/dfshealth.jsp
>
> As long as you consume (map) you don't need much space in HDFS but when
> you produce (reduce) you definitely need some.
> As Ted pointed out, your error is a standard one when hadoop is unable to
> replicate a block. It should not be related to the reduce itself and even
> less related about your specific logic.
>
> Regards
>
> Bertrand
>
>
> On Mon, Sep 24, 2012 at 5:41 PM, Jason Yang <li...@gmail.com>wrote:
>
>> Hi, Ted
>>
>> here is the result of jps:
>> yanglin@ubuntu:~$ jps
>> 3286 TaskTracker
>> 14053 Jps
>> 2623 DataNode
>> 2996 JobTracker
>> 2329 NameNode
>> 2925 SecondaryNameNode
>> ---
>> It seems that the DN is working.
>>
>> And it does not fail immediately when entering the reduce phase; actually it
>> always fails after processing some data
>>
>>
>> 2012/9/24 Steve Loughran <st...@hortonworks.com>
>>
>>>
>>>
>>> On 24 September 2012 15:47, Ted Reynolds <te...@hortonworks.com> wrote:
>>>
>>>> Jason,
>>>>
>>>> The line in the JobTracker log - "Could only be replicated to 0 nodes,
>>>> instead of 1" points to a problem with your data node. It generally means
>>>> that your DataNode is either down or not functioning correctly. What is
>>>> the output of the "jps" command? ("jps" is found in <JAVA_HOME>/bin).
>>>>
>>>>
>>>
>>> see also: http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo
>>>
>>> -steve
>>>
>>
>>
>>
>> --
>> YANG, Lin
>>
>>
>
>
> --
> Bertrand Dechoux
>
--
YANG, Lin
Re: Not able to place enough replicas in Reduce
Posted by Bertrand Dechoux <de...@gmail.com>.
And do you have any remaining space in your HDFS? (Or do you have quota?
But the message should be different, I guess.)
What are the metrics you get from the namenode? Are all datanodes (you have
only one?) live?
http://localhost:50070/dfshealth.jsp
As long as you consume (map) you don't need much space in HDFS but when you
produce (reduce) you definitely need some.
As Ted pointed out, your error is a standard one when hadoop is unable to
replicate a block. It should not be related to the reduce itself and even
less related about your specific logic.
Regards
Bertrand
On Mon, Sep 24, 2012 at 5:41 PM, Jason Yang <li...@gmail.com>wrote:
> Hi, Ted
>
> here is the result of jps:
> yanglin@ubuntu:~$ jps
> 3286 TaskTracker
> 14053 Jps
> 2623 DataNode
> 2996 JobTracker
> 2329 NameNode
> 2925 SecondaryNameNode
> ---
> It seems that the DN is working.
>
> And it does not fail immediately when entering the reduce phase; actually it
> always fails after processing some data
>
>
> 2012/9/24 Steve Loughran <st...@hortonworks.com>
>
>>
>>
>> On 24 September 2012 15:47, Ted Reynolds <te...@hortonworks.com> wrote:
>>
>>> Jason,
>>>
>>> The line in the JobTracker log - "Could only be replicated to 0 nodes,
>>> instead of 1" points to a problem with your data node. It generally means
>>> that your DataNode is either down or not functioning correctly. What is
>>> the output of the "jps" command? ("jps" is found in <JAVA_HOME>/bin).
>>>
>>>
>>
>> see also: http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo
>>
>> -steve
>>
>
>
>
> --
> YANG, Lin
>
>
--
Bertrand Dechoux
Re: Not able to place enough replicas in Reduce
Posted by Jason Yang <li...@gmail.com>.
Hi, Ted
here is the result of jps:
yanglin@ubuntu:~$ jps
3286 TaskTracker
14053 Jps
2623 DataNode
2996 JobTracker
2329 NameNode
2925 SecondaryNameNode
---
It seems that the DN is working.
And it does not fail immediately when entering the reduce phase; actually it
always fails after processing some data.
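Worth noting that jps only shows the JVM is alive, not that the DataNode is actually serving blocks. A sketch of two further checks, assuming the 0.20 default NameNode HTTP port and a guessed log directory (both assumptions; adjust to your setup):

```shell
#!/bin/sh
# 1. Ask the NameNode web UI how many live datanodes it sees
#    (50070 is the default NameNode HTTP port in 0.20):
curl -s http://localhost:50070/dfshealth.jsp 2>/dev/null \
    | grep -i "Live Nodes" || echo "NameNode UI not reachable"

# 2. Scan the DataNode log for recent errors
#    (the log directory is an assumption -- check HADOOP_LOG_DIR):
LOG_DIR=${HADOOP_LOG_DIR:-/var/log/hadoop}
MATCHES=$(grep -iE "ERROR|Exception" "$LOG_DIR"/hadoop-*-datanode-*.log 2>/dev/null | tail -n 20)
[ -n "$MATCHES" ] && printf '%s\n' "$MATCHES" \
    || echo "no errors found (or no datanode log) under $LOG_DIR"
```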
2012/9/24 Steve Loughran <st...@hortonworks.com>
>
>
> On 24 September 2012 15:47, Ted Reynolds <te...@hortonworks.com> wrote:
>
>> Jason,
>>
>> The line in the JobTracker log - "Could only be replicated to 0 nodes,
>> instead of 1" points to a problem with your data node. It generally means
>> that your DataNode is either down or not functioning correctly. What is
>> the output of the "jps" command? ("jps" is found in <JAVA_HOME>/bin).
>>
>>
>
> see also: http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo
>
> -steve
>
--
YANG, Lin
Re: Not able to place enough replicas in Reduce
Posted by Steve Loughran <st...@hortonworks.com>.
On 24 September 2012 15:47, Ted Reynolds <te...@hortonworks.com> wrote:
> Jason,
>
> The line in the JobTracker log - "Could only be replicated to 0 nodes,
> instead of 1" points to a problem with your data node. It generally means
> that your DataNode is either down or not functioning correctly. What is
> the output of the "jps" command? ("jps" is found in <JAVA_HOME>/bin).
>
>
see also: http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo
-steve
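For completeness: one 0.20-era condition consistent with this job's pattern (a single-node cluster that fails only once the reduce holds many output files open at once) is the DataNode hitting its cap on concurrent transfer threads. A hedged sketch of the usual adjustment in hdfs-site.xml; the value 4096 is illustrative, not something from this thread:

```xml
<!-- hdfs-site.xml: raise the cap on concurrent DataNode transfer threads.
     The property name really is spelled "xcievers" in 0.20 (default 256). -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```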
Re: Not able to place enough replicas in Reduce
Posted by Ted Reynolds <te...@hortonworks.com>.
Jason,
The line in the JobTracker log - "Could only be replicated to 0 nodes,
instead of 1" points to a problem with your data node. It generally means
that your DataNode is either down or not functioning correctly. What is
the output of the "jps" command? ("jps" is found in <JAVA_HOME>/bin).
Ted Reynolds
Technical Support Engineer
Hortonworks
Work Phone: 408-645-7079
<http://hortonworks.com/download/>
On Mon, Sep 24, 2012 at 6:25 AM, Jason Yang <li...@gmail.com> wrote:
> Hi, all
>
> I have been stuck on a weird problem for a long time, so I was wondering
> if anyone could give me some advice.
>
> I have a MapReduce job, in which:
>
> 1. The mapper reads a whole file as a split (by overriding
> FileInputFormat.isSplitable) and translates it into a set of points, then
> clusters these points into 4 groups randomly. The output of the mapper is
> <splitID_GroupID, point>, so if I have 20 input files, there will be
> 20*4 = 80 unique keys in total.
>
> 2. I have implemented a KeyasNameMSFOutputFormat which returns the file
> name according to the key:
>
>> public class KeyasNameMSFOutputFormat
>>     extends MultipleSequenceFileOutputFormat<Text, Point>
>> {
>>     @Override
>>     protected String generateFileNameForKeyValue(Text key,
>>                                                  Point value, String name)
>>     {
>>         return key.toString();
>>     }
>> }
>
>
> 3. Use the IdentityReducer, and the jobConf is like:
>
>> // set path for input and output
>> ...
>>
>> // set format for input and output
>> jobConf.setInputFormat(WholeFileInputFormat.class);
>> jobConf.setOutputFormat(KeyasNameMSFOutputFormat.class);
>>
>> // set class of output key and value
>> jobConf.setOutputKeyClass(Text.class);
>> jobConf.setOutputValueClass(Point.class);
>>
>> // set mapper and reducer
>> jobConf.setMapperClass(LocalClusteringMapper.class);
>> jobConf.setReducerClass(IdentityReducer.class);
>>
>> // run the job
>> JobClient.runJob(jobConf);
>
>
> On the pseudo-distributed Hadoop cluster, this job runs fine with a small
> number of input files (e.g. 5), but it always fails in the reduce phase
> if there are more than 20 input files.
>
> I have checked the logs and found some strange entries:
>
> *----JobTracker Log----*
> 2012-09-24 19:48:59,490 INFO org.apache.hadoop.mapred.JobInProgress: Task
> 'attempt_201209241945_0001_m_000019_0' has completed
> task_201209241945_0001_m_000019 successfully.
> 2012-09-24 19:48:59,490 INFO org.apache.hadoop.mapred.ResourceEstimator:
> completedMapsUpdates:20 completedMapsInputSize:28649108
> completedMapsOutputSize:41385220
> 2012-09-24 19:49:14,668 INFO org.apache.hadoop.mapred.TaskInProgress:
> Error from attempt_201209241945_0001_r_000000_0:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
> could only be replicated to 0 nodes, instead of 1
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:740)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
> at $Proxy1.addBlock(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> at $Proxy1.addBlock(Unknown Source)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2937)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2819)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
>
>
> *----DataNode log----*
> 2012-09-24 19:49:08,156 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_-8681085577211416125_16056 src: /127.0.0.1:39500 dest: /
> 127.0.0.1:50010
> 2012-09-24 19:49:08,223 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_5641670145666195897_16057 src: /127.0.0.1:39501 dest: /127.0.0.1:50010
> 2012-09-24 19:49:08,324 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_2433781882701878319_16058 src: /127.0.0.1:39504 dest: /127.0.0.1:50010
> 2012-09-24 19:49:08,389 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_7063241539808048732_16059 src: /127.0.0.1:39505 dest: /127.0.0.1:50010
> 2012-09-24 19:49:08,465 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_-8805810864714094811_16060 src: /127.0.0.1:39506 dest: /
> 127.0.0.1:50010
> 2012-09-24 19:49:09,850 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 127.0.0.1:39463, dest: /127.0.0.1:50010, bytes: 513321, op: HDFS_WRITE,
> cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID:
> DS-373612585-127.0.1.1-50010-1347885793411, blockid:
> blk_5143138923881025306_16019
> 2012-09-24 19:49:09,851 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
> PacketResponder 0 for block blk_5143138923881025306_16019 terminating
> 2012-09-24 19:49:09,941 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 127.0.0.1:39464, dest: /127.0.0.1:50010, bytes: 483928, op: HDFS_WRITE,
> cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID:
> DS-373612585-127.0.1.1-50010-1347885793411, blockid:
> blk_1512576479762656406_16020
> 2012-09-24 19:49:09,944 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for
> block blk_1512576479762656406_16020 terminating
> 2012-09-24 19:49:09,975 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 127.0.0.1:39465, dest: /127.0.0.1:50010, bytes: 496747, op: HDFS_WRITE,
> cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID:
> DS-373612585-127.0.1.1-50010-1347885793411, blockid:
> blk_8640569083674549899_16021
>
> *----NameNode Log----*
> 2012-09-24 19:49:09,836 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit:
> ugi=yanglin,yanglin,adm,dialout,cdrom,plugdev,lpadmin,admin,sambashare
> ip=/127.0.0.1 cmd=create
> src=/work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
> dst=null perm=yanglin:supergroup:rw-r--r--
> 2012-09-24 19:49:09,845 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place
> enough replicas, still in need of 1
> 2012-09-24 19:49:09,845 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 8 on 8020, call
> addBlock(/work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2,
> DFSClient_attempt_201209241945_0001_r_000000_0) from 127.0.0.1:37972: error: java.io.IOException: File
> /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
> could only be replicated to 0 nodes, instead of 1
> java.io.IOException: File
> /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
> could only be replicated to 0 nodes, instead of 1
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> 2012-09-24 19:49:09,851 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to
> blk_5143138923881025306_16019 size 513321
> 2012-09-24 19:49:09,852 INFO org.apache.hadoop.hdfs.StateChange: DIR*
> NameSystem.completeFile: file
> /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-000000-0
> is closed by DFSClient_attempt_201209241945_0001_r_000000_0
>
>
> *----taskTracker log----*
> 2012-09-24 19:49:00,813 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54
> MB/s) >
> 2012-09-24 19:49:02,451 INFO org.apache.hadoop.mapred.TaskTracker: Sent
> out 1873109 bytes for reduce: 0 from map:
> attempt_201209241945_0001_m_000018_0 given 1873109/1873105
> 2012-09-24 19:49:02,452 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.1.1:50060,
> dest: 127.0.0.1:48717, bytes: 1873109, op: MAPRED_SHUFFLE, cliID:
> attempt_201209241945_0001_m_000018_0
> 2012-09-24 19:49:02,468 INFO org.apache.hadoop.mapred.TaskTracker: Sent
> out 1328490 bytes for reduce: 0 from map:
> attempt_201209241945_0001_m_000019_0 given 1328490/1328486
> 2012-09-24 19:49:02,469 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.1.1:50060,
> dest: 127.0.0.1:48717, bytes: 1328490, op: MAPRED_SHUFFLE, cliID:
> attempt_201209241945_0001_m_000019_0
> 2012-09-24 19:49:03,054 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54
> MB/s) >
> 2012-09-24 19:49:03,641 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54
> MB/s) >
> 2012-09-24 19:49:06,828 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.7520079% reduce > reduce
> 2012-09-24 19:49:09,830 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.85404265% reduce > reduce
> 2012-09-24 19:49:12,445 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.85404265% reduce > reduce
> 2012-09-24 19:49:12,538 INFO org.apache.hadoop.mapred.JvmManager: JVM :
> jvm_201209241945_0001_r_1599614869 exited. Number of tasks it ran: 0
> 2012-09-24 19:49:15,538 INFO org.apache.hadoop.mapred.TaskRunner:
> attempt_201209241945_0001_r_000000_0 done; removing files.
> 2012-09-24 19:49:15,558 INFO org.apache.hadoop.mapred.TaskTracker:
> addFreeSlot : current free slots : 2
> 2012-09-24 19:49:17,677 INFO org.apache.hadoop.mapred.TaskTracker:
> LaunchTaskAction (registerTask): attempt_201209241945_0001_r_000000_1
> task's state:UNASSIGNED
> 2012-09-24 19:49:17,678 INFO org.apache.hadoop.mapred.TaskTracker: Trying
> to launch : attempt_201209241945_0001_r_000000_1
>
> In case it is relevant, here is my development environment:
> Hadoop Version: 0.20.2
> OS: Ubuntu 12.04 LTS
>
> *yanglin@ubuntu:~$ ulimit -a*
> core file size (blocks, -c) 0
> data seg size (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size (blocks, -f) unlimited
> pending signals (-i) 32059
> max locked memory (kbytes, -l) 64
> max memory size (kbytes, -m) unlimited
> open files (-n) 10240
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority (-r) 0
> stack size (kbytes, -s) 20480
> cpu time (seconds, -t) unlimited
> max user processes (-u) 32059
> virtual memory (kbytes, -v) unlimited
> file locks (-x) unlimited
>
> --
> YANG, Lin
>
>
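On the ulimit output quoted above: what matters most here is the DataNode's open descriptor count, which can be inspected like this on Linux, finding the pid via jps as shown earlier in the thread (a sketch, not something from the thread):

```shell
#!/bin/sh
# Count open file descriptors of the DataNode process (Linux /proc).
# jps prints "PID ClassName", as in the output shown earlier in the thread.
DN_PID=$(jps 2>/dev/null | awk '$2 == "DataNode" {print $1}')
if [ -n "$DN_PID" ]; then
    echo "DataNode fds: $(ls /proc/"$DN_PID"/fd 2>/dev/null | wc -l) (limit: $(ulimit -n))"
else
    echo "no DataNode process found via jps"
fi
```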
Re: Not able to place enough replicas in Reduce
Posted by Ted Reynolds <te...@hortonworks.com>.
Jason,
The line in the JobTracker log - "Could only be replicated to 0 nodes,
instead of 1" points to a problem with your data node. I generally means
that your DataNode is either down or not functioning correctly. What is
the output of the "jps" command? ("jps" is found in <JAVA_HOME>/bin).
Ted Reynolds
Technical Support Engineer
Hortonworks
Work Phone: 408-645-7079
<http://hortonworks.com/download/>
On Mon, Sep 24, 2012 at 6:25 AM, Jason Yang <li...@gmail.com>wrote:
> Hi, all
>
> I have been stuck by a weird problem for a long time. so I was wondering
> could anyone give me some advise?
>
> I have a MapReduce Job , in which:
>
> 1. the mapper would read a whole file as a split (by overriding
> FileInputFormat.isSplitable ) and translate it into a set of points, and
> then cluster these points into 4 groups randomly. The output of mapper is
> <splitID_GroupID, point>. so If I have 20 input files, then there will be
> 20*4=80 unique keys in total.
>
> 2. I have implemented a KeyasNameMSOutputFormat which return file name
> according to the key:
>
>> public class KeyasNameMSFOutputFormat
>> extends MultipleSequenceFileOutputFormat<Text, point>
>> {
>> protected String generateFileNameForKeyValue(Text key,
>> Point value, String name)
>> {
>> return key.toString();
>> }
>> }
>
>
> 3. Use the IdentityReducer. and the jobConf is like:
>
>> // set path for input and output
>> ...
>>
>> // set format for input and output
>> jobConf.setInputFormat(WholeFileInputFormat.class);
>> jobConf.setOutputFormat(KeyasNameMSFOutputFormat.class);
>>
>> // set class of output key and value
>> jobConf.setOutputKeyClass(Text.class);
>> jobConf.setOutputValueClass(Point.class);
>>
>> // set mapper and reducer
>> jobConf.setMapperClass(LocalClusteringMapper.class);
>> jobConf.setReducerClass(IdentityReducer.class);
>>
>> // run the job
>> JobClient.runJob(jobConf);
>
>
> On the pseudo-distributed hadoop cluster, this job runs fine with small
> number of input files(e.g. 5 input files), but it would always failed in
> the reduce phase if there are more than 20 input files.
>
> I have checked the log, and find some strange logs:
>
> *----JobTracker Log----*
> 2012-09-24 19:48:59,490 INFO org.apache.hadoop.mapred.JobInProgress: Task
> 'attempt_201209241945_0001_m_000019_0' has completed
> task_201209241945_0001_m_000019 successfully.
> 2012-09-24 19:48:59,490 INFO org.apache.hadoop.mapred.ResourceEstimator:
> completedMapsUpdates:20 completedMapsInputSize:28649108
> completedMapsOutputSize:41385220
> 2012-09-24 19:49:14,668 INFO org.apache.hadoop.mapred.TaskInProgress:
> Error from attempt_201209241945_0001_r_000000_0:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
> could only be replicated to 0 nodes, instead of 1
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:740)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
> at $Proxy1.addBlock(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> at $Proxy1.addBlock(Unknown Source)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2937)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2819)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
>
>
> *----DataNode log----*
> 2012-09-24 19:49:08,156 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_-8681085577211416125_16056 src: /127.0.0.1:39500 dest: /
> 127.0.0.1:50010
> 2012-09-24 19:49:08,223 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_5641670145666195897_16057 src: /127.0.0.1:39501 dest: /127.0.0.1:50010
> 2012-09-24 19:49:08,324 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_2433781882701878319_16058 src: /127.0.0.1:39504 dest: /127.0.0.1:50010
> 2012-09-24 19:49:08,389 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_7063241539808048732_16059 src: /127.0.0.1:39505 dest: /127.0.0.1:50010
> 2012-09-24 19:49:08,465 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_-8805810864714094811_16060 src: /127.0.0.1:39506 dest: /
> 127.0.0.1:50010
> 2012-09-24 19:49:09,850 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 127.0.0.1:39463, dest: /127.0.0.1:50010, bytes: 513321, op: HDFS_WRITE,
> cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID:
> DS-373612585-127.0.1.1-50010-1347885793411, blockid:
> blk_5143138923881025306_16019
> 2012-09-24 19:49:09,851 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
> PacketResponder 0 for block blk_5143138923881025306_16019 terminating
> 2012-09-24 19:49:09,941 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 127.0.0.1:39464, dest: /127.0.0.1:50010, bytes: 483928, op: HDFS_WRITE,
> cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID:
> DS-373612585-127.0.1.1-50010-1347885793411, blockid:
> blk_1512576479762656406_16020
> 2012-09-24 19:49:09,944 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for
> block blk_1512576479762656406_16020 terminating
> 2012-09-24 19:49:09,975 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 127.0.0.1:39465, dest: /127.0.0.1:50010, bytes: 496747, op: HDFS_WRITE,
> cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID:
> DS-373612585-127.0.1.1-50010-1347885793411, blockid:
> blk_8640569083674549899_16021
>
> *----NameNode Log----*
> 2012-09-24 19:49:09,836 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit:
> ugi=yanglin,yanglin,adm,dialout,cdrom,plugdev,lpadmin,admin,sambashare
> ip=/127.0.0.1 cmd=create
> src=/work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
> dst=null perm=yanglin:supergroup:rw-r--r--
> 2012-09-24 19:49:09,845 WARNorg.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place
> enough replicas, still in need of 1
> 2012-09-24 19:49:09,845 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 8 on 8020, call
> addBlock(/work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2,
> DFSClient_attempt_201209241945_0001_r_000000_0) from 127.0.0.1:37972:error: java.io.IOException: File
> /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
> could only be replicated to 0 nodes, instead of 1
> java.io.IOException: File
> /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
> could only be replicated to 0 nodes, instead of 1
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> 2012-09-24 19:49:09,851 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to
> blk_5143138923881025306_16019 size 513321
> 2012-09-24 19:49:09,852 INFO org.apache.hadoop.hdfs.StateChange: DIR*
> NameSystem.completeFile: file
> /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-000000-0
> is closed by DFSClient_attempt_201209241945_0001_r_000000_0
>
>
> *----taskTracker log----*
> 2012-09-24 19:49:00,813 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54
> MB/s) >
> 2012-09-24 19:49:02,451 INFO org.apache.hadoop.mapred.TaskTracker: Sent
> out 1873109 bytes for reduce: 0 from map:
> attempt_201209241945_0001_m_000018_0 given 1873109/1873105
> 2012-09-24 19:49:02,452 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.1.1:50060,
> dest: 127.0.0.1:48717, bytes: 1873109, op: MAPRED_SHUFFLE, cliID:
> attempt_201209241945_0001_m_000018_0
> 2012-09-24 19:49:02,468 INFO org.apache.hadoop.mapred.TaskTracker: Sent
> out 1328490 bytes for reduce: 0 from map:
> attempt_201209241945_0001_m_000019_0 given 1328490/1328486
> 2012-09-24 19:49:02,469 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.1.1:50060,
> dest: 127.0.0.1:48717, bytes: 1328490, op: MAPRED_SHUFFLE, cliID:
> attempt_201209241945_0001_m_000019_0
> 2012-09-24 19:49:03,054 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54
> MB/s) >
> 2012-09-24 19:49:03,641 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54
> MB/s) >
> 2012-09-24 19:49:06,828 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.7520079% reduce > reduce
> 2012-09-24 19:49:09,830 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.85404265% reduce > reduce
> 2012-09-24 19:49:12,445 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.85404265% reduce > reduce
> 2012-09-24 19:49:12,538 INFO org.apache.hadoop.mapred.JvmManager: JVM :
> jvm_201209241945_0001_r_1599614869 exited. Number of tasks it ran: 0
> 2012-09-24 19:49:15,538 INFO org.apache.hadoop.mapred.TaskRunner:
> attempt_201209241945_0001_r_000000_0 done; removing files.
> 2012-09-24 19:49:15,558 INFO org.apache.hadoop.mapred.TaskTracker:
> addFreeSlot : current free slots : 2
> 2012-09-24 19:49:17,677 INFO org.apache.hadoop.mapred.TaskTracker:
> LaunchTaskAction (registerTask): attempt_201209241945_0001_r_000000_1
> task's state:UNASSIGNED
> 2012-09-24 19:49:17,678 INFO org.apache.hadoop.mapred.TaskTracker: Trying
> to launch : attempt_201209241945_0001_r_000000_1
>
> In case it concern, here is my developing environment:
> Hadoop Version: 0.20.2
> OS: Ubuntu 12.04 LTS
>
> *yanglin@ubuntu:~$ ulimit -a*
> core file size (blocks, -c) 0
> data seg size (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size (blocks, -f) unlimited
> pending signals (-i) 32059
> max locked memory (kbytes, -l) 64
> max memory size (kbytes, -m) unlimited
> open files (-n) 10240
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority (-r) 0
> stack size (kbytes, -s) 20480
> cpu time (seconds, -t) unlimited
> max user processes (-u) 32059
> virtual memory (kbytes, -v) unlimited
> file locks (-x) unlimited
>
> --
> YANG, Lin
>
>
Re: Not able to place enough replicas in Reduce
Posted by Ted Reynolds <te...@hortonworks.com>.
Jason,
The line in the JobTracker log - "Could only be replicated to 0 nodes,
instead of 1" points to a problem with your data node. I generally means
that your DataNode is either down or not functioning correctly. What is
the output of the "jps" command? ("jps" is found in <JAVA_HOME>/bin).
Ted Reynolds
Technical Support Engineer
Hortonworks
Work Phone: 408-645-7079
<http://hortonworks.com/download/>
On Mon, Sep 24, 2012 at 6:25 AM, Jason Yang <li...@gmail.com>wrote:
> Hi, all
>
> I have been stuck by a weird problem for a long time. so I was wondering
> could anyone give me some advise?
>
> I have a MapReduce Job , in which:
>
> 1. the mapper would read a whole file as a split (by overriding
> FileInputFormat.isSplitable ) and translate it into a set of points, and
> then cluster these points into 4 groups randomly. The output of mapper is
> <splitID_GroupID, point>. so If I have 20 input files, then there will be
> 20*4=80 unique keys in total.
>
> 2. I have implemented a KeyasNameMSOutputFormat which return file name
> according to the key:
>
>> public class KeyasNameMSFOutputFormat
>>     extends MultipleSequenceFileOutputFormat<Text, Point>
>> {
>>     @Override
>>     protected String generateFileNameForKeyValue(Text key,
>>                                                  Point value, String name)
>>     {
>>         return key.toString();
>>     }
>> }
>
>
> 3. Use the IdentityReducer; the jobConf is set up like this:
>
>> // set path for input and output
>> ...
>>
>> // set format for input and output
>> jobConf.setInputFormat(WholeFileInputFormat.class);
>> jobConf.setOutputFormat(KeyasNameMSFOutputFormat.class);
>>
>> // set class of output key and value
>> jobConf.setOutputKeyClass(Text.class);
>> jobConf.setOutputValueClass(Point.class);
>>
>> // set mapper and reducer
>> jobConf.setMapperClass(LocalClusteringMapper.class);
>> jobConf.setReducerClass(IdentityReducer.class);
>>
>> // run the job
>> JobClient.runJob(jobConf);
>
>
> On the pseudo-distributed Hadoop cluster, this job runs fine with a small
> number of input files (e.g. 5), but it always fails in the reduce phase
> when there are more than 20 input files.
>
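A side observation (an assumption worth checking, not a confirmed diagnosis): since every distinct key becomes its own output file, the single reducer holds one open HDFS output stream per key, so the write load on the lone pseudo-distributed DataNode scales with the number of inputs. A hypothetical sketch of that arithmetic:

```java
class WriterLoad {
    // Each distinct key maps to its own output file, so the lone reducer keeps
    // one SequenceFile writer (one DFS output stream) open per key until the
    // task commits.
    static int openWriters(int inputFiles, int groupsPerFile) {
        return inputFiles * groupsPerFile;
    }

    public static void main(String[] args) {
        System.out.println(openWriters(5, 4));   // 5 inputs  -> 20 concurrent streams
        System.out.println(openWriters(20, 4));  // 20 inputs -> 80 concurrent streams
    }
}
```

At 20 inputs that is 80 simultaneous block writes against a single DataNode, which is the regime where the failures start to appear.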
> I have checked the logs and found some strange entries:
>
> *----JobTracker Log----*
> 2012-09-24 19:48:59,490 INFO org.apache.hadoop.mapred.JobInProgress: Task
> 'attempt_201209241945_0001_m_000019_0' has completed
> task_201209241945_0001_m_000019 successfully.
> 2012-09-24 19:48:59,490 INFO org.apache.hadoop.mapred.ResourceEstimator:
> completedMapsUpdates:20 completedMapsInputSize:28649108
> completedMapsOutputSize:41385220
> 2012-09-24 19:49:14,668 INFO org.apache.hadoop.mapred.TaskInProgress:
> Error from attempt_201209241945_0001_r_000000_0:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
> could only be replicated to 0 nodes, instead of 1
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:740)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
> at $Proxy1.addBlock(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> at $Proxy1.addBlock(Unknown Source)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2937)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2819)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
>
>
> *----DataNode log----*
> 2012-09-24 19:49:08,156 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_-8681085577211416125_16056 src: /127.0.0.1:39500 dest: /
> 127.0.0.1:50010
> 2012-09-24 19:49:08,223 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_5641670145666195897_16057 src: /127.0.0.1:39501 dest: /127.0.0.1:50010
> 2012-09-24 19:49:08,324 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_2433781882701878319_16058 src: /127.0.0.1:39504 dest: /127.0.0.1:50010
> 2012-09-24 19:49:08,389 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_7063241539808048732_16059 src: /127.0.0.1:39505 dest: /127.0.0.1:50010
> 2012-09-24 19:49:08,465 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_-8805810864714094811_16060 src: /127.0.0.1:39506 dest: /
> 127.0.0.1:50010
> 2012-09-24 19:49:09,850 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 127.0.0.1:39463, dest: /127.0.0.1:50010, bytes: 513321, op: HDFS_WRITE,
> cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID:
> DS-373612585-127.0.1.1-50010-1347885793411, blockid:
> blk_5143138923881025306_16019
> 2012-09-24 19:49:09,851 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
> PacketResponder 0 for block blk_5143138923881025306_16019 terminating
> 2012-09-24 19:49:09,941 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 127.0.0.1:39464, dest: /127.0.0.1:50010, bytes: 483928, op: HDFS_WRITE,
> cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID:
> DS-373612585-127.0.1.1-50010-1347885793411, blockid:
> blk_1512576479762656406_16020
> 2012-09-24 19:49:09,944 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for
> block blk_1512576479762656406_16020 terminating
> 2012-09-24 19:49:09,975 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 127.0.0.1:39465, dest: /127.0.0.1:50010, bytes: 496747, op: HDFS_WRITE,
> cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID:
> DS-373612585-127.0.1.1-50010-1347885793411, blockid:
> blk_8640569083674549899_16021
>
> *----NameNode Log----*
> 2012-09-24 19:49:09,836 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit:
> ugi=yanglin,yanglin,adm,dialout,cdrom,plugdev,lpadmin,admin,sambashare
> ip=/127.0.0.1 cmd=create
> src=/work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
> dst=null perm=yanglin:supergroup:rw-r--r--
> 2012-09-24 19:49:09,845 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place
> enough replicas, still in need of 1
> 2012-09-24 19:49:09,845 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 8 on 8020, call
> addBlock(/work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2,
> DFSClient_attempt_201209241945_0001_r_000000_0) from 127.0.0.1:37972: error: java.io.IOException: File
> /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
> could only be replicated to 0 nodes, instead of 1
> java.io.IOException: File
> /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
> could only be replicated to 0 nodes, instead of 1
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> 2012-09-24 19:49:09,851 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to
> blk_5143138923881025306_16019 size 513321
> 2012-09-24 19:49:09,852 INFO org.apache.hadoop.hdfs.StateChange: DIR*
> NameSystem.completeFile: file
> /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-000000-0
> is closed by DFSClient_attempt_201209241945_0001_r_000000_0
>
>
> *----taskTracker log----*
> 2012-09-24 19:49:00,813 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54
> MB/s) >
> 2012-09-24 19:49:02,451 INFO org.apache.hadoop.mapred.TaskTracker: Sent
> out 1873109 bytes for reduce: 0 from map:
> attempt_201209241945_0001_m_000018_0 given 1873109/1873105
> 2012-09-24 19:49:02,452 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.1.1:50060,
> dest: 127.0.0.1:48717, bytes: 1873109, op: MAPRED_SHUFFLE, cliID:
> attempt_201209241945_0001_m_000018_0
> 2012-09-24 19:49:02,468 INFO org.apache.hadoop.mapred.TaskTracker: Sent
> out 1328490 bytes for reduce: 0 from map:
> attempt_201209241945_0001_m_000019_0 given 1328490/1328486
> 2012-09-24 19:49:02,469 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.1.1:50060,
> dest: 127.0.0.1:48717, bytes: 1328490, op: MAPRED_SHUFFLE, cliID:
> attempt_201209241945_0001_m_000019_0
> 2012-09-24 19:49:03,054 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54
> MB/s) >
> 2012-09-24 19:49:03,641 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54
> MB/s) >
> 2012-09-24 19:49:06,828 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.7520079% reduce > reduce
> 2012-09-24 19:49:09,830 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.85404265% reduce > reduce
> 2012-09-24 19:49:12,445 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.85404265% reduce > reduce
> 2012-09-24 19:49:12,538 INFO org.apache.hadoop.mapred.JvmManager: JVM :
> jvm_201209241945_0001_r_1599614869 exited. Number of tasks it ran: 0
> 2012-09-24 19:49:15,538 INFO org.apache.hadoop.mapred.TaskRunner:
> attempt_201209241945_0001_r_000000_0 done; removing files.
> 2012-09-24 19:49:15,558 INFO org.apache.hadoop.mapred.TaskTracker:
> addFreeSlot : current free slots : 2
> 2012-09-24 19:49:17,677 INFO org.apache.hadoop.mapred.TaskTracker:
> LaunchTaskAction (registerTask): attempt_201209241945_0001_r_000000_1
> task's state:UNASSIGNED
> 2012-09-24 19:49:17,678 INFO org.apache.hadoop.mapred.TaskTracker: Trying
> to launch : attempt_201209241945_0001_r_000000_1
>
> In case it's relevant, here is my development environment:
> Hadoop Version: 0.20.2
> OS: Ubuntu 12.04 LTS
>
> *yanglin@ubuntu:~$ ulimit -a*
> core file size (blocks, -c) 0
> data seg size (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size (blocks, -f) unlimited
> pending signals (-i) 32059
> max locked memory (kbytes, -l) 64
> max memory size (kbytes, -m) unlimited
> open files (-n) 10240
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority (-r) 0
> stack size (kbytes, -s) 20480
> cpu time (seconds, -t) unlimited
> max user processes (-u) 32059
> virtual memory (kbytes, -v) unlimited
> file locks (-x) unlimited
>
> --
> YANG, Lin
>
>