Posted to common-user@hadoop.apache.org by Xavier Stevens <Xa...@fox.com> on 2009/02/13 17:38:44 UTC

Hadoop Write Performance

Does anyone have an expected or experienced write speed to HDFS outside
of Map/Reduce?  Any recommendations on properties to tweak in
hadoop-site.xml?
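(For reference: hadoop-site.xml overrides the defaults shipped in hadoop-default.xml. A write-tuning fragment of the kind people usually experiment with might look like the following; the property names are from the 0.17/0.18 line, the values are purely illustrative, and nothing here is a recommendation from this thread — check your version's hadoop-default.xml.)

```xml
<?xml version="1.0"?>
<configuration>
  <!-- More RPC handler threads on the namenode for many concurrent
       clients (illustrative value; the 0.18-era default was 10). -->
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>20</value>
  </property>
  <!-- Upper bound on concurrent transfer threads per datanode; the low
       default is a classic cause of write failures under heavy load.
       Note the historical misspelling "xcievers". -->
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>1024</value>
  </property>
</configuration>
```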
 
Currently I have a multi-threaded writer where each thread is writing to
a different file.  But after a while I get this:
 
java.io.IOException: Could not get block locations. Aborting...
 at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2081)
 at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1300(DFSClient.java:1702)
 at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1818)
 
Which is perhaps indicating that the namenode is overwhelmed?
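(For readers hitting the same thing: the pattern described above — one open output stream per thread, each on its own file — looks roughly like the sketch below. It uses plain java.io against the local filesystem as a stand-in for HDFS, since with the 0.18-era API you would open each stream via FileSystem.create() and get an FSDataOutputStream instead of a FileOutputStream; the threading structure is the same either way.)

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class MultiFileWriter {

    // Each task owns exactly one output stream, so threads never share a file.
    static Runnable writerFor(final File out, final byte[] payload, final int records) {
        return new Runnable() {
            public void run() {
                try {
                    OutputStream os = new BufferedOutputStream(new FileOutputStream(out));
                    try {
                        for (int i = 0; i < records; i++) {
                            os.write(payload); // on HDFS this would be an FSDataOutputStream
                        }
                    } finally {
                        os.close(); // close() flushes the final buffered data
                    }
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            }
        };
    }

    public static void main(String[] args) throws Exception {
        int threads = 4;
        File dir = new File(System.getProperty("java.io.tmpdir"), "multiwriter-demo");
        dir.mkdirs();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(writerFor(new File(dir, "part-" + t), "record\n".getBytes(), 1000));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        for (int t = 0; t < threads; t++) {
            System.out.println("part-" + t + " " + new File(dir, "part-" + t).length());
        }
    }
}
```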
 
 
Thanks,
 
-Xavier

RE: Hadoop Write Performance

Posted by Xavier Stevens <Xa...@fox.com>.
Raghu,

I was using 0.17.2.1, but I installed 0.18.3 a couple of days ago.  I
also separated out my secondarynamenode and jobtracker to another
machine.  In addition my network operations people had misconfigured
some switches which ended up being my bottleneck.

After all of that, my writer and Hadoop are working great.


-Xavier
 

-----Original Message-----
From: Raghu Angadi [mailto:rangadi@yahoo-inc.com] 
Sent: Wednesday, February 18, 2009 11:49 AM
To: core-user@hadoop.apache.org
Subject: Re: Hadoop Write Performance


what is the hadoop version?

You could check log on a datanode around that time. You could post any
suspicious errors. For e.g. you can trace a particular block in client
and datanode logs.

Most likely it is not a NameNode issue, but you can check the NameNode log
as well.

Raghu.

Xavier Stevens wrote:
> Does anyone have an expected or experienced write speed to HDFS 
> outside of Map/Reduce?  Any recommendations on properties to tweak in 
> hadoop-site.xml?
>  
> Currently I have a multi-threaded writer where each thread is writing 
> to a different file.  But after a while I get this:
>  
> java.io.IOException: Could not get block locations. Aborting...
>  at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2081)
>  at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1300(DFSClient.java:1702)
>  at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1818)
>  
> Which is perhaps indicating that the namenode is overwhelmed?
>  
>  
> Thanks,
>  
> -Xavier
> 




Re: Hadoop Write Performance

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
what is the hadoop version?

You could check log on a datanode around that time. You could post any 
suspicious errors. For e.g. you can trace a particular block in client 
and datanode logs.

Most likely it is not a NameNode issue, but you can check the NameNode log as well.

Raghu.

Xavier Stevens wrote:
> Does anyone have an expected or experienced write speed to HDFS outside
> of Map/Reduce?  Any recommendations on properties to tweak in
> hadoop-site.xml?
>  
> Currently I have a multi-threaded writer where each thread is writing to
> a different file.  But after a while I get this:
>  
> java.io.IOException: Could not get block locations. Aborting...
>  at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2081)
>  at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1300(DFSClient.java:1702)
>  at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1818)
>  
> Which is perhaps indicating that the namenode is overwhelmed?
>  
>  
> Thanks,
>  
> -Xavier
> 


Re: Disabling Reporter Output?

Posted by jason hadoop <ja...@gmail.com>.
There is a moderate amount of setup and tear-down in any Hadoop job. It may
be that your 10 seconds are primarily that.


On Wed, Feb 18, 2009 at 11:29 AM, Philipp Dobrigkeit <PD...@gmx.de> wrote:

> I am currently trying Map/Reduce in Eclipse. The input comes from an hbase
> table. The performance of my jobs is terrible. Even when only done on a
> single row it takes around 10 seconds to complete the job. My current guess
> is that the reporting done to the Eclipse console might play a role here.
>
> I am looking for a way to disable the printing of status to the console.
>
> Or of course any other ideas what is going wrong here.
>
> This is a single node cluster, pretty common desktop hardware, and writing
> to HBase is a breeze.
>
> Thanks
> Philipp
>

Disabling Reporter Output?

Posted by Philipp Dobrigkeit <PD...@gmx.de>.
I am currently trying Map/Reduce in Eclipse. The input comes from an HBase table. The performance of my jobs is terrible. Even when run on only a single row, it takes around 10 seconds to complete the job. My current guess is that the reporting done to the Eclipse console might play a role here.

I am looking for a way to disable the printing of status to the console.

Or, of course, any other ideas about what is going wrong here.
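(One thing worth trying: the per-job progress lines are emitted through Log4j by the classic JobClient reporter, so — assuming the stock log4j.properties that ships with Hadoop — a fragment like this should quiet them. Illustrative only, not advice from this thread:)

```properties
# Silence the "map 0% reduce 0%" progress lines printed by JobClient.runJob()
log4j.logger.org.apache.hadoop.mapred.JobClient=WARN

# Leave everything else at the default level
log4j.rootLogger=INFO, console
```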

This is a single node cluster, pretty common desktop hardware, and writing to HBase is a breeze.

Thanks
Philipp