Posted to user@bigtop.apache.org by Steven Núñez <st...@illation.com> on 2013/11/21 03:32:00 UTC

CentOS Out of Box Install Summary

Gents,

Below is a summary of the results of an out-of-the-box CentOS/EC2 BigTop 0.7.0 install. It lists all the components I need for the project I’m writing about. What would be useful somewhere on the wiki is a list of known issues with links to possible resolutions. This could be as easy as taking this list and adding a third column, ‘workaround’, linking to a page on how to fix each item. It could also serve as a QA page of sorts, on the assumption that all of the components are supposed to work out of the box (some of the init.d scripts don’t look quite right either, judging by the errors below).

Cheers,
- SteveN

Hadoop datanode is running                                 [  OK  ]
Hadoop journalnode is running                              [  OK  ]
Hadoop namenode is running                                 [  OK  ]
Hadoop secondarynamenode is running                        [  OK  ]
Hadoop zkfc is dead and pid file exists                    [FAILED]
Hadoop httpfs is running                                   [  OK  ]
Hadoop historyserver is dead and pid file exists           [FAILED]
Hadoop nodemanager is dead and pid file exists             [FAILED]
Hadoop proxyserver is dead and pid file exists             [FAILED]
Hadoop resourcemanager is running                          [  OK  ]
hald (pid  1041) is running...
HBase master daemon is dead and pid file exists            [FAILED]
hbase-regionserver is not running.
HBase rest daemon is running                               [  OK  ]
HBase thrift daemon is running                             [  OK  ]
HCatalog server is running                                 [  OK  ]
Hive Metastore is dead and pid file exists                 [FAILED]
Hive Server is running                                     [  OK  ]
Hive Server2 is dead and pid file exists                   [FAILED]
not running but /var/run/oozie/oozie.pid exists.
Spark master is not running                                [FAILED]
Spark worker is not running                                [FAILED]
spice-vdagentd is stopped
Sqoop Server is running                                    [  OK  ]
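
For anyone reproducing this: a listing like the one above (including the
unrelated hald and spice-vdagentd lines) is the kind of output you get by
asking every init script for its status, e.g.:

    # Query every script under /etc/init.d for its status; unrelated
    # system daemons such as hald and spice-vdagentd show up because
    # --status-all covers everything installed on the box.
    sudo service --status-all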


Re: CentOS Out of Box Install Summary

Posted by Sean Mackrory <ma...@gmail.com>.
Yes, CDH and Bigtop are very closely related, and in fact every commercial
Hadoop distro that I've looked at includes the "bigtop-utils" package.
Commercial documentation can help you understand common practices and
context, but there are some key differences, and its existence shouldn't
dissuade us from writing some quality documentation for Bigtop anyway.

Oozie, Hue, ZooKeeper and HBase are also available in Bigtop. Snappy is
not a separate component in Bigtop, but I'm pretty sure it always gets
built into our Hadoop packages: we set the flags to do that if Snappy is
installed at build time, at least.
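
One quick way to check on a given box (the library path below is an
assumption; your packages may place it elsewhere):

    # If the native Hadoop library was built with Snappy support it
    # will typically link against libsnappy; no output here suggests
    # it was not.
    ldd /usr/lib/hadoop/lib/native/libhadoop.so | grep -i snappy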

The Puppet code is in bigtop-deploy/puppet and there's a README file in
that directory. I haven't tried it myself.
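
Going only by that README, the rough shape is: describe your cluster in
the config files it documents, then apply the manifests. A sketch, not a
tested recipe; paths and the config mechanism may differ between releases:

    # Hedged sketch based on bigtop-deploy/puppet/README (untested).
    cd bigtop-deploy/puppet
    sudo puppet apply -d --modulepath=modules manifests/site.pp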

As a former BSD user myself, I would love to see more Hadoop support there,
but it would require some commitment from the other projects too (Hadoop
only supports Linux for production and Windows for development). Last time
I used BSD I recall seeing Hadoop and Pig in ports, actually - but they
were old versions and they're not part of Bigtop or well maintained by
anyone as far as I can tell.



On Thu, Nov 21, 2013 at 7:38 PM, Steven Núñez <st...@illation.com> wrote:

>   I think focusing on a single-node installation is probably the best bet
> at the moment. Jay gave some sound advice for practical usage: start small
> and build from there, but given that Hadoop and its ecosystem are still in
> the formative stages, there’s going to be a lot of people that want to kick
> the tires and explore the components.
>
>  Having a few well-tested recipes, the first a single-node set-up, would
> be ideal. It’s probably easier to start with a well-configured single-node
> installation and expand from there than to try to sort out both component
> configuration and the distributed aspect at the same time.
>
>  The Cloudera website has some installation instructions for Installing
> CDH4 on a Single Linux Node in Pseudo-distributed Mode
> <http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Quick-Start/cdh4qs_topic_3.html>
> that might be useful as a guide. At the end there’s a section Components
> That Require Additional Configuration
> <http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Quick-Start/cdh4qs_topic_3_4.html>
> that, while not terribly helpful in the set-up, at least provides some
> pointers so that the reader knows it’s not supposed to work out of the box.
>
>  Sean, in an earlier message you wrote:
>
> Not sure what to tell you here. I regularly set up pseudo-distributed
> Hadoop installations in minutes with little more than "yum install
> hadoop-conf-pseudo", "sudo service hadoop-hdfs-namenode init" and a reboot.
> If you're using a bunch of other services on a fully-distributed cluster
> and you're completely new to this, I would expect it to take hours / days to
> get everything running. Bigtop also maintains puppet code that will
> configure everything with a pretty good default configuration and have your
> cluster working pretty much out-of-the-box. Maybe this is a good option for
> you?
>
>
>  Two questions:
>
>    - Those commands are the same as in the Cloudera documentation. Are
>    those components also in the BigTop repository? I’m aware of some of the
>    yum searching commands (I’m a FreeBSD user myself — where’s my Hadoop
>    distribution? Just kidding.); is there a good way to explore/browse the
>    repository to see what’s in BigTop?
>    - Where would I find the puppet code and how would I run it? If this
>    is a good route, perhaps just documentation is all that’s needed.
>

Re: CentOS Out of Box Install Summary

Posted by Steven Núñez <st...@illation.com>.
I think focusing on a single-node installation is probably the best bet at the moment. Jay gave some sound advice for practical usage: start small and build from there, but given that Hadoop and its ecosystem are still in the formative stages, there’s going to be a lot of people that want to kick the tires and explore the components.

Having a few well-tested recipes, the first a single-node set-up, would be ideal. It’s probably easier to start with a well-configured single-node installation and expand from there than to try to sort out both component configuration and the distributed aspect at the same time.

The Cloudera website has some installation instructions for Installing CDH4 on a Single Linux Node in Pseudo-distributed Mode <http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Quick-Start/cdh4qs_topic_3.html> that might be useful as a guide. At the end there’s a section Components That Require Additional Configuration <http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Quick-Start/cdh4qs_topic_3_4.html> that, while not terribly helpful in the set-up, at least provides some pointers so that the reader knows it’s not supposed to work out of the box.

Sean, in an earlier message you wrote:
Not sure what to tell you here. I regularly set up pseudo-distributed Hadoop installations in minutes with little more than "yum install hadoop-conf-pseudo", "sudo service hadoop-hdfs-namenode init" and a reboot. If you're using a bunch of other services on a fully-distributed cluster and you're completely new to this, I would expect it to take hours / days to get everything running. Bigtop also maintains puppet code that will configure everything with a pretty good default configuration and have your cluster working pretty much out-of-the-box. Maybe this is a good option for you?
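
Spelled out, that bring-up amounts to something like the following sketch.
It assumes the Bigtop yum repository is already configured, and the service
names are taken from Bigtop's Hadoop packaging:

    # Pseudo-distributed bring-up (sketch; verify service names against
    # what the packages actually installed under /etc/init.d).
    sudo yum install -y hadoop-conf-pseudo
    sudo service hadoop-hdfs-namenode init    # formats HDFS on first use
    for svc in hadoop-hdfs-namenode hadoop-hdfs-datanode \
               hadoop-yarn-resourcemanager hadoop-yarn-nodemanager; do
        sudo service "$svc" start
    done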

Two questions:

  *   Those commands are the same as in the Cloudera documentation. Are those components also in the BigTop repository? I’m aware of some of the yum searching commands (I’m a FreeBSD user myself — where’s my Hadoop distribution? Just kidding.); is there a good way to explore/browse the repository to see what’s in BigTop? (One possibility is sketched just after this list.)
  *   Where would I find the puppet code and how would I run it? If this is a good route, perhaps just documentation is all that’s needed.
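
On the first question, one way to browse everything a configured Bigtop
repo offers is to query it in isolation. The repo id "bigtop" is an
assumption; check the id in your .repo file:

    # List every package available from the Bigtop repo only.
    yum --disablerepo='*' --enablerepo=bigtop list available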


Re: CentOS Out of Box Install Summary

Posted by Sean Mackrory <ma...@gmail.com>.
One key point is that the components that are running out of the box are
mostly running in a single-node configuration or with an embedded database
as a backend. Practically all of these systems will require some manual
configuration before they are production-ready. Neither packages nor puppet
can solve that entirely - we would really need something that can
orchestrate the different roles in the cluster in bringing up the services.
Even then, I suspect such a system would require some manual input
regarding what you want, because there are so many different ways you might
want to deploy all this.

- Hadoop zkfc: the ZooKeeper failover controller, used for HDFS
high-availability. I don't know the specifics, but I would not expect this
to be running out-of-the-box.
- I don't have a ton of experience with the other Hadoop daemons but I know
the NodeManager usually works for me. I'd be curious to know what problem
you ran into here.
- We could probably make a "hbase-conf-pseudo" package that installs a
working single-node configuration, but again - it would never be used that
way in most cases. I thought that by default the master operated in
"stand-alone" mode, and that by enabling "distributed mode" in the
configuration you could then run a region server on the same node. See
http://hbase.apache.org/book/standalone_dist.html and the first sketch
after this list.
- The Hive Metastore needs an external RDBMS to be configured. Some
services come with a default "embedded" database, but these are never
suitable for production and usually cause more trouble than they are
worth, IMHO. I love the sound of "everything working out of the box", but
I think this is one case where we need to help the user understand what
external infrastructure is required to make the system work properly; the
second sketch after this list shows where that wiring goes.
- Not familiar with Spark, but I believe we stopped shipping Scala embedded
in Spark and a user would need to have it installed beforehand, just like
with Java? I'm probably wrong here - just a hint.
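
To make the HBase point concrete: per the page linked above, the
standalone/distributed switch is a single property. A sketch, with the
config path and service names assumed from Bigtop's packaging:

    # Set hbase.cluster.distributed to true (and point hbase.rootdir at
    # HDFS) in /etc/hbase/conf/hbase-site.xml, then restart the daemons.
    sudo service hbase-master restart
    sudo service hbase-regionserver start

And for the Hive Metastore, the external-RDBMS wiring lives in
hive-site.xml. The property names are standard Hive; the path and service
name are again assumptions:

    # Point javax.jdo.option.ConnectionURL, ConnectionDriverName,
    # ConnectionUserName and ConnectionPassword in
    # /etc/hive/conf/hive-site.xml at a real database, then:
    sudo service hive-metastore restart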

Thanks for sharing your emails with the list. As Jay Vyas mentioned - a lot
of the contributors can get busy at times but it would be great to start
collecting this information into a better "User Manual".

