Posted to common-user@hadoop.apache.org by Merto Mertek <ma...@gmail.com> on 2012/01/11 15:42:55 UTC

Similar frameworks like hadoop and taxonomy of distributed computing

Hi,

I was wondering if anyone knows of any paper discussing and comparing
the mentioned topics. I am a little bit confused about the
classification of Hadoop: is it a cluster, a computing grid, or a mix
of them? And what is Hadoop in relation to a cloud? Probably just a
technology that enables cloud services?

 Can it be compared to cluster middleware like Beowulf, OSCAR, Condor,
Sector/Sphere, HPCC, Dryad, etc.? Why or why not? From what I could
read, Hadoop's main field is text processing for problems that are
embarrassingly parallel, but I cannot pin down the cases where one
would decide to use the other cluster technologies instead. There are
probably a lot of similarities between them; any comparison would be
helpful.

It would be a big help to clarify in which field each of those
technologies belongs and what they are most suitable for...

Thank you

Re: Similar frameworks like hadoop and taxonomy of distributed computing

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Here are some links to it:

Long Version: http://csse.usc.edu/csse/TECHRPTS/2008/usc-csse-2008-820/usc-csse-2008-820.pdf
Shorter Version (published in WICSA): http://wwwp.dnsalias.org/w/images/3/3f/AnatomyPhysiologyGridRevisited66.pdf

Cheers,
Chris

On Jan 11, 2012, at 4:02 PM, Mattmann, Chris A (388J) wrote:

> Also check out my paper, "The Anatomy and Physiology of the Grid Revisited" (just Google for it), where we also tried to look at this very issue.
> 
> Cheers,
> Chris 

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: Similar frameworks like hadoop and taxonomy of distributed computing

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Also check out my paper, "The Anatomy and Physiology of the Grid Revisited" (just Google for it), where we also tried to look at this very issue.

Cheers,
Chris 

Sent from my iPhone


Re: Similar frameworks like hadoop and taxonomy of distributed computing

Posted by Brian Bockelman <bb...@cse.unl.edu>.
On Jan 11, 2012, at 10:15 AM, George Kousiouris wrote:

> 
> Hi,
> 
> see comments in text
> 
> On 1/11/2012 4:42 PM, Merto Mertek wrote:
>> Hi,
>> 
>> I was wondering if anyone knows of any paper discussing and comparing
>> the mentioned topics. I am a little bit confused about the
>> classification of Hadoop: is it a cluster, a computing grid, or a mix
>> of them?
> I think that a strict definition would be: an implementation of the MapReduce computing paradigm, for cluster usage.
> 
>> What is Hadoop in
>> relation to a cloud? Probably just a technology that enables cloud
>> services?
> It can be used to enable cloud services through a service-oriented framework, like we are doing in
> http://users.ntua.gr/gkousiou/publications/PID2095917.pdf
> 
> in which we are trying to create a cloud service that offers MapReduce clusters as a service, plus distributed storage (through HDFS).
> But this is not the primary usage. This is back-end heavy processing in a cluster-like manner, specifically for parallel jobs that follow the MR logic.
> 
>> 
>>  Can it be compared to cluster middleware like Beowulf, OSCAR, Condor,
>> Sector/Sphere, HPCC, Dryad, etc.? Why or why not?
> I could see some similarities with Condor, mainly in the job submission process; however, I am not really sure how Condor deals with parallel jobs.
> 

Since you asked…

<condor-geek>

Condor has a built-in concept of a set of jobs (called a "job cluster").  On top of its scheduler, there is a product called "DAGMan" (DAG = directed acyclic graph) that can manage a large number of jobs with interrelated dependencies (providing a partial ordering between jobs).  Condor with DAGMan is somewhat comparable to Hadoop tasks plus Oozie workflows (although the data aspects are very different - don't try to stretch it too far).
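A quick sketch of the "partial ordering" idea: DAGMan runs a job only after all of its parents have finished, which is exactly a topological ordering of the dependency graph. The job names and dependency map below are made up for illustration; this shows the scheduling concept, not Condor's actual API:

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical four-job DAG: "extract" feeds two transform jobs,
# and "load" may only start once both transforms have finished.
deps = {
    "transform_a": {"extract"},
    "transform_b": {"extract"},
    "load": {"transform_a", "transform_b"},
}

# static_order() yields one linear order consistent with the partial
# order; DAGMan would additionally run independent jobs in parallel.
order = list(TopologicalSorter(deps).static_order())
```

Any schedule DAGMan produces respects the same constraint: "extract" first, "load" last, and the two transforms in either order (or concurrently).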

Condor / PBS / LSF / {OGE,SGE,GE} / SLURM provide the capability to start many identical jobs in parallel for MPI-type computations, but I consider MPI wildly different from the sort of workflows you see with MapReduce.  Specifically, "classic MPI" programming (the style you see in wide use; MPI-2 and later improve on this) mostly requires all processes to start simultaneously, and the job crashes if one process dies.  I think this is why the Top-10 computers tend to measure mean time between failures in tens of hours.
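The scaling argument behind that last sentence can be sketched with back-of-the-envelope arithmetic (the per-node figure below is an assumption, purely for illustration): if a classic-MPI job dies whenever any one of its N processes dies, and node failures are independent, the job's effective MTBF is roughly the per-node MTBF divided by N:

```python
# Assumed per-node MTBF of 5 years, expressed in hours (illustrative).
NODE_MTBF_HOURS = 5 * 365 * 24  # 43,800 hours

def job_mtbf(n_processes, node_mtbf=NODE_MTBF_HOURS):
    # A job that crashes when any single process dies has an
    # effective MTBF that shrinks linearly with process count.
    return node_mtbf / n_processes

# At 1,000 processes the whole job fails about every 44 hours, which
# is why a reschedule-the-lost-task model (MapReduce) pays off at scale.
mtbf_1000 = job_mtbf(1000)
```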

Unlike Hadoop, Condor jobs can flow between pools (they call this "flocking") and pools can naturally cover multiple data centers.  The largest demonstration I'm aware of is 100,000 cores across the US; the largest production pool I'm aware of is about 20-30k cores across 100 universities/labs on multiple continents.  This is not a criticism of Hadoop - Condor doesn't really have the same level of data integration as Hadoop does, so it tackles a much simpler problem (i.e., bring-your-own-data-management!).

</condor-geek>

Brian


Re: Similar frameworks like hadoop and taxonomy of distributed computing

Posted by George Kousiouris <gk...@mail.ntua.gr>.
Hi,

see comments in text

On 1/11/2012 4:42 PM, Merto Mertek wrote:
> Hi,
>
> I was wondering if anyone knows of any paper discussing and comparing
> the mentioned topics. I am a little bit confused about the
> classification of Hadoop: is it a cluster, a computing grid, or a mix
> of them?
I think that a strict definition would be: an implementation of the
MapReduce computing paradigm, for cluster usage.

> What is Hadoop in
> relation to a cloud? Probably just a technology that enables cloud
> services?
It can be used to enable cloud services through a service-oriented
framework, like we are doing in
http://users.ntua.gr/gkousiou/publications/PID2095917.pdf

in which we are trying to create a cloud service that offers MapReduce
clusters as a service, plus distributed storage (through HDFS).
But this is not the primary usage. This is back-end heavy processing
in a cluster-like manner, specifically for parallel jobs that follow
the MR logic.

>
>   Can it be compared to cluster middleware like Beowulf, OSCAR, Condor,
> Sector/Sphere, HPCC, Dryad, etc.? Why or why not?
I could see some similarities with Condor, mainly in the job submission
process; however, I am not really sure how Condor deals with parallel
jobs.

> From what I could read, Hadoop's main
> field is text processing for problems that are embarrassingly parallel,
> but I cannot pin down the cases where one would decide to use the other
> cluster technologies instead. There are probably a lot of similarities
> between them; any comparison would be helpful.

Theoretically, you could write the program as an MPI implementation,
which is more flexible and is not limited by the MR paradigm. However,
if you can find a way to convert your problem to an MR job, then the
implementation would be much easier (I guess) as a Hadoop job, since
you will only have to write the Mapper and the Reducer. In MPI you
would probably need to write all the communication framework too.
Furthermore, Hadoop also has HDFS, which enables shared storage
between the various Hadoop components/threads, etc. In other clusters
you would need to set this up separately, through NFS or something
similar (again, I guess).
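George's "only the Mapper and the Reducer" point can be made concrete with a tiny word-count sketch in plain Python. The run_job() helper below stands in for the framework's shuffle/group-by-key phase; the names are illustrative and this is not Hadoop's actual API, just the shape of the contract:

```python
from collections import defaultdict

def mapper(line):
    # Emit a (word, 1) pair for every word in the line, like a
    # Hadoop Streaming mapper printing "word\t1" to stdout.
    for word in line.split():
        yield word.lower(), 1

def reducer(word, counts):
    # Sum all counts seen for one key, like a Streaming reducer.
    yield word, sum(counts)

def run_job(lines):
    # Stand-in for the framework's shuffle/sort phase: group the
    # mapper output by key, then hand each group to the reducer.
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    result = {}
    for key in sorted(groups):
        for out_key, out_value in reducer(key, groups[key]):
            result[out_key] = out_value
    return result
```

The user writes only mapper() and reducer(); everything run_job() does here (plus distribution, retries, and data locality) is what the framework provides.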

My two cents,
George

>
> It would be a big help to clarify in which field each of those
> technologies belongs and what they are most suitable for...
>
> Thank you
>


-- 

---------------------------

George Kousiouris
Electrical and Computer Engineer
Division of Communications,
Electronics and Information Engineering
School of Electrical and Computer Engineering
Tel: +30 210 772 2546
Mobile: +30 6939354121
Fax: +30 210 772 2569
Email: gkousiou@mail.ntua.gr
Site: http://users.ntua.gr/gkousiou/

National Technical University of Athens
9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece


Re: Similar frameworks like hadoop and taxonomy of distributed computing

Posted by "W.P. McNeill" <bi...@gmail.com>.
I don't know of an academic paper, though this blog post has a nice survey:
http://srinathsview.blogspot.com/2011/10/list-of-known-scalable-architecture.html