Posted to mapreduce-user@hadoop.apache.org by norbi <no...@rocknob.de> on 2014/08/19 10:09:11 UTC

Hadoop HDFS slow after upgrade from 0.20 to 2.0

Hi List,

we have upgraded Hadoop from our very old version 0.20 to Cloudera 4.7
(Hadoop 2.0). We are only using HDFS.
After the upgrade (no configuration changes), HDFS seems to be very
slow. It takes more than 2h to copy 40GB (47 files) out of HDFS;
before the upgrade it took about 1h.

We are using 52 DataNodes with 10 disks, all connected at 1 Gbit/s.

How can we speed up HDFS, or where could the bottleneck be?

Norbert
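
As a rough check on the figures in the post, the average copy rate can be compared with the link speed. This is only a sketch: it assumes decimal units (1 GB = 1000 MB), reads "1gig" as 1 Gbit/s, and treats "about 1h" and "more than 2h" as exactly 1 and 2 hours. The function and variable names are made up for illustration.

```python
# Back-of-the-envelope throughput from the figures in the post.
# Assumptions: decimal units, "1gig" = 1 Gbit/s, durations of exactly 1 h and 2 h.

def throughput_mb_per_s(size_gb: float, hours: float) -> float:
    """Average transfer rate in MB/s for size_gb copied in the given hours."""
    return size_gb * 1000 / (hours * 3600)

LINK_MB_PER_S = 1000 / 8  # 1 Gbit/s expressed in MB/s

before = throughput_mb_per_s(40, 1)  # pre-upgrade: about 1 h
after = throughput_mb_per_s(40, 2)   # post-upgrade: more than 2 h

for label, rate in (("before", before), ("after", after)):
    share = rate / LINK_MB_PER_S
    print(f"{label}: {rate:.1f} MB/s ({share:.0%} of one 1 Gbit/s link)")
```

Under these assumptions both rates (roughly 11 MB/s before, under 6 MB/s after) are well below what a single 1 Gbit/s link can carry, so raw link bandwidth alone would not explain the slowdown.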

Re: Hadoop HDFS slow after upgrade from 0.20 to 2.0

Posted by Harsh J <ha...@cloudera.com>.
Hi Norbert,

Can you check the DN daemon's GC activity (in 4.7 you can also look for
the JvmPauseMonitor messages in the logs)? Has it increased since the
upgrade, or does it generally seem high to you? What is your current DN
heap size?

One of the major changes between 0.20 and 2.0 is the BlockPool ID
concept for paths to blocks, which adds memory usage.

In addition, what OS (type and version) are you currently running?

On Tue, Aug 19, 2014 at 1:39 PM, norbi <no...@rocknob.de> wrote:
> Hi List,
>
> we have upgraded Hadoop from our very old version 0.20 to Cloudera 4.7
> (Hadoop 2.0). We are only using HDFS.
> After the upgrade (no configuration changes), HDFS seems to be very slow.
> It takes more than 2h to copy 40GB (47 files) out of HDFS; before the
> upgrade it took about 1h.
>
> We are using 52 DataNodes with 10 disks, all connected at 1 Gbit/s.
>
> How can we speed up HDFS, or where could the bottleneck be?
>
> Norbert



-- 
Harsh J
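
The JvmPauseMonitor check suggested in the reply above could be scripted roughly as follows. This is a minimal sketch, not a Hadoop tool: it assumes the pause messages contain the text "pause of approximately <N>ms", which should be verified against your own DataNode logs, and the sample log lines are invented for illustration.

```python
import re
from typing import Iterable, List

# Assumed message wording; verify it against your own DataNode logs.
PAUSE_RE = re.compile(r"pause of approximately (\d+)ms")

def pause_durations_ms(lines: Iterable[str]) -> List[int]:
    """Return the pause length in ms for each JvmPauseMonitor line."""
    durations = []
    for line in lines:
        if "JvmPauseMonitor" not in line:
            continue  # skip unrelated log lines
        match = PAUSE_RE.search(line)
        if match:
            durations.append(int(match.group(1)))
    return durations

def summarize(durations: List[int]) -> dict:
    """Count, total and longest pause; zeros when no pauses were found."""
    return {
        "count": len(durations),
        "total_ms": sum(durations),
        "max_ms": max(durations, default=0),
    }

# Hypothetical sample lines, invented for illustration only.
sample = [
    "2014-08-19 10:00:01,123 WARN org.apache.hadoop.util.JvmPauseMonitor: "
    "Detected pause in JVM or host machine (eg GC): pause of approximately 2500ms",
    "2014-08-19 10:00:05,456 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: "
    "unrelated message",
    "2014-08-19 10:03:10,789 INFO org.apache.hadoop.util.JvmPauseMonitor: "
    "Detected pause in JVM or host machine (eg GC): pause of approximately 1200ms",
]

print(summarize(pause_durations_ms(sample)))
```

To use it on a real log, pass an open file object to pause_durations_ms instead of the sample list. Comparing the summary across several DataNodes gives a first impression of whether long pauses are frequent.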
