Posted to common-user@hadoop.apache.org by onur ascigil <on...@hotmail.com> on 2009/11/24 06:52:11 UTC

Hadoop Performance

I am running Hadoop on a single machine and have some questions about its performance.
I have a simple Java program that runs breadth-first search on a graph
with 5 nodes. It involves several MapReduce iterations.
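
For readers following along, here is a minimal sketch of how breadth-first search can be expressed as repeated map and reduce rounds. It is plain Java with no Hadoop dependency, and it is not the poster's program: the class name BfsIterationSketch, the methods mapStep, reduceStep, bfs and sampleGraph, and the 5-node sample graph are all hypothetical. It assumes every node appears as a key in the adjacency map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BfsIterationSketch {
    // Marker for "not reached yet".
    static final int INF = Integer.MAX_VALUE;

    // Number of map+reduce rounds used by the last call to bfs().
    static int rounds = 0;

    // Map step: every node re-emits its own distance; a node that has been
    // reached also proposes (distance + 1) to each of its neighbours.
    static Map<Integer, List<Integer>> mapStep(Map<Integer, List<Integer>> graph,
                                               Map<Integer, Integer> dist) {
        Map<Integer, List<Integer>> emitted = new HashMap<>();
        for (Map.Entry<Integer, List<Integer>> entry : graph.entrySet()) {
            int node = entry.getKey();
            int d = dist.get(node);
            emitted.computeIfAbsent(node, k -> new ArrayList<>()).add(d);
            if (d != INF) {
                for (int neighbour : entry.getValue()) {
                    emitted.computeIfAbsent(neighbour, k -> new ArrayList<>()).add(d + 1);
                }
            }
        }
        return emitted;
    }

    // Reduce step: keep the smallest distance proposed for each node.
    static Map<Integer, Integer> reduceStep(Map<Integer, List<Integer>> emitted) {
        Map<Integer, Integer> next = new HashMap<>();
        for (Map.Entry<Integer, List<Integer>> entry : emitted.entrySet()) {
            next.put(entry.getKey(), Collections.min(entry.getValue()));
        }
        return next;
    }

    // Driver: repeat map+reduce until no distance changes.
    static Map<Integer, Integer> bfs(Map<Integer, List<Integer>> graph, int source) {
        Map<Integer, Integer> dist = new HashMap<>();
        for (int node : graph.keySet()) {
            dist.put(node, node == source ? 0 : INF);
        }
        rounds = 0;
        while (true) {
            Map<Integer, Integer> next = reduceStep(mapStep(graph, dist));
            rounds++;
            if (next.equals(dist)) {
                return dist;
            }
            dist = next;
        }
    }

    // Hypothetical 5-node graph: 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4, 4 -> 5.
    static Map<Integer, List<Integer>> sampleGraph() {
        Map<Integer, List<Integer>> g = new HashMap<>();
        g.put(1, Arrays.asList(2, 3));
        g.put(2, Arrays.asList(4));
        g.put(3, Arrays.asList(4));
        g.put(4, Arrays.asList(5));
        g.put(5, new ArrayList<Integer>());
        return g;
    }

    public static void main(String[] args) {
        System.out.println(bfs(sampleGraph(), 1) + " after " + rounds + " rounds");
    }
}
```

On the sample graph this takes four rounds, the last of which only confirms that nothing changed. If each round is submitted to Hadoop as a separate job, which is an assumption about how the program is structured, any fixed per-job cost is paid once per round.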

I observed that Hadoop takes too long to produce
results for such a simple job, so I attached a Java profiler to my MapReduce job
(RunJar) to see what is going on. The profiler reported several IPC
connections to ports 54310 and 54311. Each of these IPCs to the JobTracker and
HDFS takes around 10 seconds!
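
One way to narrow this down might be to time a bare TCP connection to the same ports, outside of Hadoop. The sketch below is only an illustration: the class ConnectTimer and the method timeConnectMillis are hypothetical names, and the host "localhost" is an assumption based on the single-machine setup. It measures host name resolution plus TCP connection setup only, not a Hadoop RPC.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ConnectTimer {
    // Returns the milliseconds taken to resolve the host and open a TCP
    // connection to host:port. The socket is closed again immediately.
    static long timeConnectMillis(String host, int port, int timeoutMillis)
            throws IOException {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Ports taken from the profiler output described above.
        int[] ports = {54310, 54311};
        for (int port : ports) {
            try {
                long ms = timeConnectMillis("localhost", port, 15000);
                System.out.println("port " + port + ": connected in " + ms + " ms");
            } catch (IOException e) {
                System.out.println("port " + port + ": connect failed: " + e.getMessage());
            }
        }
    }
}
```

If the bare connection is fast, the time is presumably being spent elsewhere in the job rather than in connection setup.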

First of all, why do these IPCs take this long?
I am also wondering if there is any way to improve
the performance of these IPC calls. Does Hadoop
have such a large fixed cost?

I would really appreciate any comments or suggestions.
Thanks in advance,
Onur

Re: Hadoop Performance

Posted by Amogh Vasekar <am...@yahoo-inc.com>.
Hi,
For "near" real-time performance you may want to try HBase. I read about Streamy doing this, and their Hadoop World NYC presentation (ppt) is available on their blog:
http://devblog.streamy.com/2009/07/24/streamy-hadoop-summit-hbase-goes-realtime/

Amogh




RE: Hadoop Performance

Posted by onur ascigil <on...@hotmail.com>.
Thanks for your reply! I am running 0.20.1 in pseudo-distributed mode
on Ubuntu.
I want to run interactive jobs with Hadoop and am trying to see
whether Hadoop is suitable for that purpose. I wonder
if anybody is using Hadoop for interactive jobs where,
given a query, output is returned within an acceptable
amount of time. Is Hadoop meant to be used only for batch processing?


Re: Hadoop Performance

Posted by Rekha Joshi <re...@yahoo-inc.com>.
Hi,

I'm not sure about your Hadoop version, and I haven't done much with a single-machine setup myself. However, there is an IPC improvement issue filed at https://issues.apache.org/jira/browse/HADOOP-2864. Thanks!
