Posted to user@bigtop.apache.org by Steven Núñez <st...@illation.com> on 2013/12/30 17:32:18 UTC

Cluster Management: OpenSource & Vendor Options

Season’s Greetings All,

I’m doing a bit of a write-up on the various Hadoop distributions and would like to understand exactly what packages are installed by the Apache version of Ambari. It’s an exciting place to be working (big data & Hadoop) but the lines are blurred in many ways. The way I see the open source landscape now is something like this (from a management/installation/configuration perspective):

BigTop -> RPM-like packaging for Hadoop
Ambari -> GUI management/monitoring/provisioning

Looking at it from a vendor perspective, we’ve got (I know there are others, this is just for discussion):

BigTop (packaging):
  CDH
  HDP
  Apache Bigtop

Cloudera:
  Cloudera Manager (closed source, commercial)

Hortonworks / Apache:
  Ambari (open source)

The CDH, BigTop and HDP (I assume) base distributions require a lot of manual configuration, so the best way to spin up a cluster with a reasonable set of applications (say HDFS, YARN, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig, Sqoop) is to use CDH + CM or Ambari + HDP.

Is there an equivalent for Apache? If I use the kit found at ambari.apache.org to spin up a cluster, do I get Apache components, or the HDP distribution? I’m trying to define the ‘Apache distribution’ in my mind, if there is one, and understand exactly what its capabilities are. Cluster management is rather fundamental to that, since not many folks have the luxury of spending time climbing the long, steep learning curve of Hadoop ecosystem configuration.

Cheers,
- SteveN



Fwd: Cluster Management: OpenSource & Vendor Options

Posted by Roman Shaposhnik <sh...@gmail.com>.
On Tue, Dec 31, 2013 at 7:10 AM, Steven Núñez <st...@illation.com> wrote:
> Thanks. That issue answers pretty much all the questions. I’d certainly give
> it a +1 if I had a login. That definitely seems like the right direction to
> move in. I don’t know the internals, but if everyone is using BigTop for
> packaging, perhaps there’s some way to read the manifest files (if that’s
> what they’re called) to produce what Ambari needs for management.

From the Bigtop side of things -- I'd love to see better integration between
Ambari and Bigtop.

As a datapoint -- I poked around HDP2 (the distro that Ambari seems to
support best) and it looked very similar to Bigtop
in layout and everything else. Perhaps it won't be that big of a deal to
adapt Ambari to support the Bigtop distro natively. That would have the
added benefit of all the Bigtop-derived distros (Cloudera, Hortonworks,
Intel, Pivotal, WANDisco) getting the baseline support for free.

I'd love to help from the Bigtop side of things, but my Ambari-fu is weak
enough that I'd need somebody from the Ambari team of developers
to help.

So... if there's enough interest, perhaps we can find a way?

Thanks,
Roman.

Re: Cluster Management: OpenSource & Vendor Options

Posted by Steven Núñez <st...@illation.com>.
I can’t comment on the direction of Ambari, nor CM’s internals, but a good
set of Hadoop command-line management tools, based on puppet, chef or
salt, would certainly be a good addition to the community, regardless of
BigTop supporting Ambari. Especially since it allows easy integration into
existing data centre administration & operations processes.

If sufficiently mature, a thin GUI shouldn’t be that hard to add for
common operations. The question is: where does this fit? In BigTop as a
sub-project? A project in and of itself? It’s not a trivial amount of
work, and Ambari might just be the ‘good enough’ path of least resistance;
at least being open source some hooks could be added to support
orchestration tools.

	- SteveN



On 2013-12-31 14:21 , "Konstantin Boudnik" <co...@apache.org> wrote:

> The reason for the lack of activity on that JIRA is that Ambari seems to be
> drifting away from real-life orchestration systems like Puppet toward
> something else. And that's exactly why I prefer to use Puppet or Chef
> orchestration - you have a state machine that works in the same way on every
> supported platform.
>
> In 20 years of doing system and network administration as well as software
> development I've seen time and again how fancy UI applications fail to
> deliver on their promise. The main reason is a shift of focus toward the
> bling instead of the core functionality.
>
> Cloudera's CM is a perfect example of my point, because it is doing totally
> heinous things with standard Linux services, their life-cycle and
> configurations up to the point where any sane Hadoop devops would be helpless
> to do anything without CM. Ambari seems to be a bit better in this respect.
> However, with the development above and the replacement of Puppet I am not
> sure how much longer that will be the case.
>
> Happy New Year everyone!
>   Cos


Re: Cluster Management: OpenSource & Vendor Options

Posted by Konstantin Boudnik <co...@apache.org>.
The reason for the lack of activity on that JIRA is that Ambari seems to be
drifting away from real-life orchestration systems like Puppet toward
something else. And that's exactly why I prefer to use Puppet or Chef
orchestration - you have a state machine that works in the same way on every
supported platform.

In 20 years of doing system and network administration as well as software
development I've seen time and again how fancy UI applications fail to
deliver on their promise. The main reason is a shift of focus toward the
bling instead of the core functionality.

Cloudera's CM is a perfect example of my point, because it is doing totally
heinous things with standard Linux services, their life-cycle and
configurations up to the point where any sane Hadoop devops would be helpless
to do anything without CM. Ambari seems to be a bit better in this respect.
However, with the development above and the replacement of Puppet I am not
sure how much longer that will be the case.

Happy New Year everyone!
  Cos

On Tue, Dec 31, 2013 at 03:10PM, Steven Núñez wrote:
> Thanks. That issue answers pretty much all the questions. I’d certainly give
> it a +1 if I had a login. That definitely seems like the right direction to
> move in. I don’t know the internals, but if everyone is using BigTop for
> packaging, perhaps there’s some way to read the manifest files (if that’s
> what they’re called) to produce what Ambari needs for management.

Re: Cluster Management: OpenSource & Vendor Options

Posted by Steven Núñez <st...@illation.com>.
Thanks. That issue answers pretty much all the questions. I’d certainly give it a +1 if I had a login. That definitely seems like the right direction to move in. I don’t know the internals, but if everyone is using BigTop for packaging, perhaps there’s some way to read the manifest files (if that’s what they’re called) to produce what Ambari needs for management.


On Tuesday, 31 December 2013 7:57, Chris Mildebrandt wrote:
> You may want to watch this: https://issues.apache.org/jira/browse/AMBARI-3524
>
> and include it in your write-up for future considerations. Though there hasn't been much activity on it.


Re: Cluster Management: OpenSource & Vendor Options

Posted by Chris Mildebrandt <ch...@woodenrhino.com>.
You may want to watch this:
https://issues.apache.org/jira/browse/AMBARI-3524

and include it in your write-up for future considerations. Though there
hasn't been much activity on it.

On Mon, Dec 30, 2013 at 8:32 AM, Steven Núñez <st...@illation.com> wrote:

> Seasons Greetings All,
>
> I’m doing a bit of a write-up on the various Hadoop distributions and
> would like to understand exactly what packages are installed by the Apache
> version of Ambari. It’s an exciting place to be working (big data & Hadoop)
> but the lines are blurred in many ways. The way I see the open source
> landscape now is something like this (from a
> management/installation/configuration perspective):
>
> BigTop -> RPM like packaging for Hadoop
> Ambari -> GUI management/monitoring/provisioning
>
> Looking at it from a vendor perspective, we’ve got (I know there are
> others, this is just for discussion):
>
> BigTop (packaging)
> CDH
> HDP
> Apache Bigtop
>
> Cloudera
> Cloudera Manager (closed source, commercial)
>
> Hortonworks / Apache
> Ambari (open source)
>
> The CDH, BigTop and HDP (I assume) base distributions require a lot of
> manual configuration, so the best way to spin up a cluster with a
> reasonable set of applications (say HDFS, YARN, Hive, HCatalog, HBase,
> ZooKeeper, Oozie, Pig, Sqoop) is to use CDH + CM or Ambari + HDP.
>
> Is there an equivalent for Apache? If I use the kit found at
> ambari.apache.org to spin up a cluster, do I get Apache components, or
> the HDP distribution? I’m trying to define the ‘Apache distribution’ in my
> mind, if there is one, and understand exactly what its capabilities are,
> and cluster management is rather fundamental, since not many folks have the
> luxury of spending time climbing the long, steep learning curve of Hadoop
> ecosystem configuration.
>
> Cheers,
> - SteveN

Re: Cluster Management: OpenSource & Vendor Options

Posted by Bruno Mahé <bm...@apache.org>.
On 12/30/2013 08:32 AM, Steven Núñez wrote:
>
> The CDH, BigTop and HDP (I assume) base distributions require a lot of
> manual configuration, so the best way to spin up a cluster with a
> reasonable set of applications (say HDFS, YARN, Hive, HCatalog, HBase,
> ZooKeeper, Oozie, Pig, Sqoop) is to use CDH + CM or Ambari + HDP.
>

Some people have also automated this through tools such as Puppet, Chef 
or Ansible.


Thanks,
Bruno