Posted to dev@oodt.apache.org by Tom Barber <to...@meteorite.bi> on 2013/11/01 09:09:07 UTC

Hadoop Similarities

Morning,

Chris will remember me asking on IRC a couple of years ago how OODT 
differs from Hadoop in terms of features and functionality, to which 
he gave a great page-long explanation of what the differences were. 
I vowed to copy that information and save it somewhere useful, and of 
course never did; I then asked Sean, who also couldn't dig it up.

So, fine folks of the OODT community, for a novice like me who would be 
interested in "selling" OODT to users if the correct use case came along, 
when someone says "Isn't OODT just a different type of Hadoop?" what do 
I answer?

I'd like to document this type of comparison stuff on the Wiki as well, 
as I think it's useful for people to know and understand.

Cheers

Tom

-- 
*Tom Barber* | Technical Director

meteorite bi
*T:* +44 20 8133 3730
*W:* www.meteorite.bi | *Skype:* meteorite.consulting
*A:* Surrey Technology Centre, Surrey Research Park, Guildford, GU2 7YG, UK

Re: Hadoop Similarities

Posted by Tom Barber <to...@meteorite.bi>.

Thanks for that Lewis, very useful. Indeed, my question was never 
designed to be a pros vs cons comparison; I'm just interested to know 
where people see the differences, as Hadoop clearly rules the roost in 
"Big Data" stuff.

My background is in Business Intelligence, so I come into contact 
with plenty of Hadoop + MapReduce PR daily and you end up swamped with 
that stuff (not that I've found much Hadoop in the wild, just press 
fodder). I'm interested because people clearly see a hole in the Hadoop 
ecosystem that leaves a gap in the market for the OODT setup, and 
should that use case arise I'd like to make sure I'm choosing the 
correct tool for the job.

Cheers

Tom



On 03/11/13 14:27, Lewis John Mcgibbney wrote:
> Hi Tom,
>
> On Fri, Nov 1, 2013 at 8:09 AM, Tom Barber <tom.barber@meteorite.bi 
> <ma...@meteorite.bi>> wrote:
>
>     Morning,
>
>     Chris will remember a couple of years ago me asking on IRC about
>     how OODT differs from Hadoop in terms of features and
>     functionality, which he then gave a great page long explanation as
>     to what the differences were. I vowed to copy that information off
>     and save it somewhere useful, and of course never did, then I
>     asked Sean who also couldn't dig it up.
>
>
> What a shame. Would have been great to at least see this, if not get it 
> documented as you mention. Oh well. Community lists are as good as 
> it gets IMHO, so here we go.
>
>
>     So, fine folks of the OODT community, for a novice like me who
>     would be interested in "selling" OODT to users if the correct
>     usecase came along, when someone says "Isn't OODT just a different
>     type of Hadoop?" what do I answer?
>
>
> I am relatively new to OODT. My opinion here is pretty abstract; 
> however, I have been using Hadoop much longer and therefore hope that 
> some of what I'm saying contributes to our shared understanding.
>
> OODT
> =====
> I was attracted to OODT due to the modular, component-oriented design 
> of the project as a whole. It is down to the system designer (the 
> initial person/team who pick up OODT) to review and select which 
> parts of the overall project they need to satisfy and accommodate 
> their data work-flow(s). Due to the modular nature of the project, 
> components can be substituted as the nature and/or characteristics 
> of the data work-flow change over time. A beautiful aspect of OODT 
> is that many tools and instruments have been built to accommodate 
> the above-mentioned requirements for data work-flows.
>
> Hadoop
> ======
> For me, Hadoop (something which I consider a blanket term for what is 
> essentially an OS) is an operating system, as opposed to OODT, which 
> I've described as a modularized data workflow platform. It provides a 
> filesystem (HDFS), a data processing platform (MapReduce), and an API 
> through which we can submit and execute jobs. Additionally, we all 
> know about the bolt-ons such as workflow monitoring, security and so 
> forth. In this respect it is down to the engineer to build the data 
> workflow around/on top of Hadoop given the available components 
> provided. One thing which I think characterizes Hadoop here as well is 
> the fact that, generally speaking, data follows a 'write-once, 
> read-many' logic, whereas this is not necessarily the case with OODT.
>
>
>     I'd like to document this type of comparison stuff on the Wiki as
>     well as I think its useful for people to know and understand.
>
>
> I'm sure that the above is obvious to many and that I'm merely 
> mentioning material from the immediate surroundings; however, this is 
> my experience so far using OODT and the comparisons I can draw myself.
>
> When I started responding, it was not my aim to engage in a pros vs 
> cons of each piece of software, so I hope the brief reply above 
> can act as a contribution to the conversation and we can take this 
> onwards.
>
> Thanks
> Lewis



Re: Hadoop Similarities

Posted by Lewis John Mcgibbney <le...@gmail.com>.

Hi Tom,

On Fri, Nov 1, 2013 at 8:09 AM, Tom Barber <to...@meteorite.bi> wrote:

>  Morning,
>
> Chris will remember a couple of years ago me asking on IRC about how OODT
> differs from Hadoop in terms of features and functionality, which he then
> gave a great page long explanation as to what the differences were. I vowed
> to copy that information off and save it somewhere useful, and of course
> never did, then I asked Sean who also couldn't dig it up.
>

What a shame. Would have been great to at least see this, if not get it
documented as you mention. Oh well. Community lists are as good as it gets
IMHO, so here we go.

>
> So, fine folks of the OODT community, for a novice like me who would be
> interested in "selling" OODT to users if the correct usecase came along,
> when someone says "Isn't OODT just a different type of Hadoop?" what do I
> answer?
>

I am relatively new to OODT. My opinion here is pretty abstract; however, I
have been using Hadoop much longer and therefore hope that some of what I'm
saying contributes to our shared understanding.

OODT
=====
I was attracted to OODT due to the modular, component-oriented design of
the project as a whole. It is down to the system designer (the initial
person/team who pick up OODT) to review and select which parts of the
overall project they need to satisfy and accommodate their data
work-flow(s). Due to the modular nature of the project, components can be
substituted as the nature and/or characteristics of the data work-flow
change over time. A beautiful aspect of OODT is that many tools and
instruments have been built to accommodate the above-mentioned requirements
for data work-flows.
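
To make the point about substitutable components concrete, here is a minimal
Python sketch. It is not OODT code: the names Catalog, DictCatalog,
ListCatalog and ingest are hypothetical and chosen only for illustration. It
shows a workflow step that depends on an interface, so the backing component
can be swapped without touching the step itself.

```python
from typing import Dict, List, Protocol, Tuple


class Catalog(Protocol):
    """Hypothetical interface: anything that can store and return metadata."""

    def record(self, product_id: str, metadata: Dict[str, str]) -> None: ...

    def lookup(self, product_id: str) -> Dict[str, str]: ...


class DictCatalog:
    """One possible backend: keeps metadata in a plain dictionary."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, str]] = {}

    def record(self, product_id: str, metadata: Dict[str, str]) -> None:
        self._rows[product_id] = dict(metadata)

    def lookup(self, product_id: str) -> Dict[str, str]:
        return self._rows[product_id]


class ListCatalog:
    """Another backend: an append-only list of (id, metadata) pairs."""

    def __init__(self) -> None:
        self._log: List[Tuple[str, Dict[str, str]]] = []

    def record(self, product_id: str, metadata: Dict[str, str]) -> None:
        self._log.append((product_id, dict(metadata)))

    def lookup(self, product_id: str) -> Dict[str, str]:
        # The most recent entry for an id wins.
        for pid, meta in reversed(self._log):
            if pid == product_id:
                return meta
        raise KeyError(product_id)


def ingest(catalog: Catalog, product_id: str, metadata: Dict[str, str]) -> None:
    """The workflow step only knows the interface, not the backend."""
    catalog.record(product_id, metadata)
```

Calling ingest(DictCatalog(), ...) or ingest(ListCatalog(), ...) behaves the
same from the caller's point of view, which is the substitution property
described above.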

Hadoop
======
For me, Hadoop (something which I consider a blanket term for what is
essentially an OS) is an operating system, as opposed to OODT, which I've
described as a modularized data workflow platform. It provides a filesystem
(HDFS), a data processing platform (MapReduce), and an API through which we
can submit and execute jobs. Additionally, we all know about the bolt-ons
such as workflow monitoring, security and so forth. In this respect it is
down to the engineer to build the data workflow around/on top of Hadoop
given the available components provided. One thing which I think
characterizes Hadoop here as well is the fact that, generally speaking, data
follows a 'write-once, read-many' logic, whereas this is not necessarily the
case with OODT.
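
For readers who have not met the MapReduce model mentioned above, here is a
small sketch of the idea in plain Python. It does not use Hadoop or any
Hadoop API; the function names map_phase, shuffle, reduce_phase and
word_count are illustrative assumptions. It only shows the map, group and
reduce stages on an in-memory list of lines.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework would between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
```

The input lines are only read, never modified, which loosely mirrors the
'write-once, read-many' pattern described above.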

>
> I'd like to document this type of comparison stuff on the Wiki as well as
> I think its useful for people to know and understand.
>
>
I'm sure that the above is obvious to many and that I'm merely mentioning
material from the immediate surroundings; however, this is my experience so
far using OODT and the comparisons I can draw myself.

When I started responding, it was not my aim to engage in a pros vs cons
of each piece of software, so I hope the brief reply above can act as a
contribution to the conversation and we can take this onwards.

Thanks
Lewis

Re: Hadoop Similarities

Posted by Tom Barber <to...@meteorite.bi>.

Cheers guys, I'll try to collate this stuff and slap it in a Wiki page so 
other folk new to the project get a decent idea as to how it differs. I 
think where I'm getting confused, coming from a BI background, is that 
people just think of ETL and data storage, and we're easily distracted 
when it comes to the other stuff, unlike the science boffs ;)

That's the problem with all these Hadoop projects with the mega corps 
behind them, they get all the PR :)

Anyway, I'll try and fashion something out of it. I'm also messing around 
with sample data and the OODT stack to gain a better idea, but like any 
of these systems, it's hard when you don't have a real use case for it.

Tom


On 03/11/13 17:11, Lewis John Mcgibbney wrote:
> Yeah exactly... that's what I meant to say ;)
>
>
> On Sun, Nov 3, 2013 at 4:07 PM, Chris Mattmann <mattmann@apache.org 
> <ma...@apache.org>> wrote:
>
>     Hey Guys,
>
>     Lewis's description is pretty spot on.
>
>     Basically Apache Hadoop is a kernel/OS set of capabilities and
>     functionalities for workflow processing (used to only be for M/R
>     but now with YARN for mostly any computational type) and for
>     storage: distributed, highly available and replicated (which is
>     needed on low-cost, unreliable, shared-nothing hardware).
>
>     Apache OODT is a data management toolkit and data processing
>     toolkit that can interoperate with and *leverage* Hadoop as one of
>     the capabilities needed in building data systems. It can store data
>     to HDFS (using the File Manager) in standard ingestion and
>     processing use cases; it can submit jobs to M/R or YARN-style
>     workflows and use that as the heavy lifter for the workflow
>     processor.
>
>     In short, OODT is the code that you normally write over and over
>     again when building data systems that combine Hadoop, Oracle,
>     MySQL, WINGS, THREDDS, Condor, Ganglia, GridFTP or bbFTP, etc. In
>     other words, what you need to build an end-to-end data ingestion,
>     processing and dissemination system. OODT makes that "glue code"
>     very easy to configure and write (via XML and configuration
>     policy/architecture) and provides a repeatable and easily
>     discernible way to build these systems.
>
>     HTH!
>
>     Cheers,
>     Chris
>
> -- 
> /Lewis/



Re: Hadoop Similarities

Posted by Lewis John Mcgibbney <le...@gmail.com>.

Yeah exactly... that's what I meant to say ;)


On Sun, Nov 3, 2013 at 4:07 PM, Chris Mattmann <ma...@apache.org> wrote:

> Hey Guys,
>
> Lewis's description is pretty spot on.
>
> Basically Apache Hadoop is a kernel/OS set of capabilities and
> functionalities for workflow processing (used to only be for M/R but now
> with YARN for mostly any computational type) and for storage: distributed,
> highly available and replicated (which is needed on low-cost, unreliable,
> shared-nothing hardware).
>
> Apache OODT is a data management toolkit and data processing toolkit that
> can interoperate with and *leverage* Hadoop as one of the capabilities
> needed in building data systems. It can store data to HDFS (using the File
> Manager) in standard ingestion and processing use cases; it can submit jobs
> to M/R or YARN-style workflows and use that as the heavy lifter for the
> workflow processor.
>
> In short, OODT is the code that you normally write over and over again when
> building data systems that combine Hadoop, Oracle, MySQL, WINGS, THREDDS,
> Condor, Ganglia, GridFTP or bbFTP, etc. In other words, what you need to
> build an end-to-end data ingestion, processing and dissemination system.
> OODT makes that "glue code" very easy to configure and write (via XML and
> configuration policy/architecture) and provides a repeatable and easily
> discernible way to build these systems.
>
> HTH!
>
> Cheers,
> Chris
>


-- 
*Lewis*

Re: Hadoop Similarities

Posted by Chris Mattmann <ma...@apache.org>.

Hey Guys,

Lewis's description is pretty spot on.

Basically Apache Hadoop is a kernel/OS set of capabilities and
functionalities for workflow processing (used to only be for M/R but now
with YARN for mostly any computational type) and for storage: distributed,
highly available and replicated (which is needed on low-cost, unreliable,
shared-nothing hardware).

Apache OODT is a data management toolkit and data processing toolkit that
can interoperate with and *leverage* Hadoop as one of the capabilities
needed in building data systems. It can store data to HDFS (using the File
Manager) in standard ingestion and processing use cases; it can submit jobs
to M/R or YARN-style workflows and use that as the heavy lifter for the
workflow processor.

In short, OODT is the code that you normally write over and over again when
building data systems that combine Hadoop, Oracle, MySQL, WINGS, THREDDS,
Condor, Ganglia, GridFTP or bbFTP, etc. In other words, what you need to
build an end-to-end data ingestion, processing and dissemination system.
OODT makes that "glue code" very easy to configure and write (via XML and
configuration policy/architecture) and provides a repeatable and easily
discernible way to build these systems.
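
As a rough illustration of what "glue code driven by XML configuration" can
look like, here is a small Python sketch. The XML layout (pipeline/step
elements with a type attribute), the STEPS registry and run_pipeline are all
hypothetical; they are not OODT's actual policy format or API. The sketch
only shows the general pattern of reading a declarative policy and
dispatching to pre-written steps.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# Hypothetical policy file: the element and attribute names are made up.
POLICY = """
<pipeline name="demo">
  <step type="ingest" />
  <step type="process" />
  <step type="publish" />
</pipeline>
"""


def ingest(log: List[str]) -> None:
    log.append("ingested")


def process(log: List[str]) -> None:
    log.append("processed")


def publish(log: List[str]) -> None:
    log.append("published")


# Registry mapping a step type in the policy to the code that implements it.
STEPS: Dict[str, Callable[[List[str]], None]] = {
    "ingest": ingest,
    "process": process,
    "publish": publish,
}


def run_pipeline(policy_xml: str) -> List[str]:
    """Parse the policy and run each declared step in document order."""
    root = ET.fromstring(policy_xml.strip())
    log: List[str] = []
    for step in root.findall("step"):
        step_type = step.get("type")
        if step_type not in STEPS:
            raise ValueError(f"unknown step type: {step_type!r}")
        STEPS[step_type](log)
    return log
```

Reordering or removing a step then means editing the policy text rather than
the Python code, which is the kind of repeatability described above.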

HTH!

Cheers,
Chris




-----Original Message-----
From: Tom Barber <to...@meteorite.bi>
Reply-To: "user@oodt.apache.org" <us...@oodt.apache.org>
Date: Friday, November 1, 2013 1:09 AM
To: "user@oodt.apache.org" <us...@oodt.apache.org>
Subject: Hadoop Similarities

>
>
>
>Morning,
>
>Chris will remember a couple of years ago me asking on IRC about how OODT
>differs from Hadoop in terms of features and functionality, which he then
>gave a great page long explanation as to what the differences were. I
>vowed to copy that information off and save it somewhere useful, and of
>course never did, then I asked Sean who also couldn't dig it up.
>
>So, fine folks of the OODT community, for a novice like me who would be
>interested in "selling" OODT to users if the correct usecase came along,
>when someone says "Isn't OODT just a different type of Hadoop?" what do I
>answer?
>
>I'd like to document this type of comparison stuff on the Wiki as well as
>I think its useful for people to know and understand.
>
>Cheers
>
>Tom
>
>-- 
>Tom Barber | Technical Director
>
>meteorite bi
>T: +44 20 8133 3730
>W: www.meteorite.bi <http://www.meteorite.bi> |
>Skype: meteorite.consulting
>A: Surrey Technology Centre, Surrey Research Park, Guildford, GU2 7YG, UK
