Posted to common-user@hadoop.apache.org by yaotian <ya...@gmail.com> on 2013/01/18 04:17:44 UTC
Help. Strange thing. It's blocked me for a week....
Hi,
*=>My machine environment:*
1 master: 1 CPU core, 2 GHz, 1 GB memory
2 slaves (datanodes): 1 CPU core, 2 GHz, 4 GB memory
hadoop: hadoop-0.20.205.0
*=> My data:*
User GPS trace analysis. Each user has many GPS location records, and we
want to analyze them.
*=>My question:*
1. We have 2 datanodes, but Hadoop used only 1 of the servers. Isn't that
ineffective?
2. When I run 200 MB of data, the job succeeds. But when I run 30 GB of data,
it always reports "Task attempt_201301171429_0013_r_000000_0 failed to
report status for 600 seconds. Killing!"
*=>My map-reduce config:*
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>50</value>
  </property>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.job.shuffle.merge.percent</name>
    <value>0.75</value>
  </property>
  <property>
    <name>mapred.job.tracker.http.address</name>
    <value>0.0.0.0:9003</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>4000</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2000m</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx2000m</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>AutoReduce</value>
  </property>
  <property>
    <name>io.sort.factor</name>
    <value>12</value>
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>300</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>65536</value>
  </property>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>8</value>
  </property>
</configuration>
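[Editor's note] One likely problem in the config above: `mapred.reduce.tasks` must be an integer; "AutoReduce" is not a valid value. The 600-second kill also matches the default `mapred.task.timeout`. A minimal corrected sketch follows; the numbers are illustrative assumptions for a small two-slave cluster, not tuned recommendations:

```xml
<!-- Sketch only; values are assumptions, not recommendations -->
<property>
  <name>mapred.reduce.tasks</name>
  <!-- must be an integer; a common rule of thumb is roughly one per
       reduce slot available in the cluster -->
  <value>2</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <!-- keep well under the ~4 GB physical RAM of each slave -->
  <value>1536</value>
</property>
<property>
  <name>mapred.task.timeout</name>
  <!-- the "failed to report status for 600 seconds" kill comes from the
       600000 ms default; raising it is a workaround, but the real fix is
       for long-running reduce code to report progress periodically -->
  <value>1800000</value>
</property>
```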
Re: Help. Strange thing. It's blocked me for a week....
Posted by yaotian <ya...@gmail.com>.
I missed a key piece of information: the servers are *Amazon EC2 M1 Medium
Instances*.
2013/1/18 yaotian <ya...@gmail.com>
Re: Help. Strange thing. It's blocked me for a week....
Posted by Harsh J <ha...@cloudera.com>.
What are your map and reduce slot counts configured to per node? I also
noticed you seem to be requesting 4 GB of memory for reducers when your
slaves' total RAM barely exceeds that; the result will not be good and can
certainly cause slowdowns (due to swapping, etc.).
On Fri, Jan 18, 2013 at 8:47 AM, yaotian <ya...@gmail.com> wrote:
--
Harsh J
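[Editor's note] The slot counts Harsh asks about are controlled by the TaskTracker properties below in `mapred-site.xml`. The values shown are an assumption of what might suit a single-core, 4 GB node, not settings taken from the thread:

```xml
<!-- Per-node task slots; example values assumed for a 1-core, 4 GB slave -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>
```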