Posted to user@chukwa.apache.org by Bill Graham <bi...@gmail.com> on 2009/11/09 19:25:38 UTC

Initial Chukwa questions

Hi,

I'm evaluating Chukwa for a use case we have, where it seems like it should
meet our needs fairly well. After reading the docs I have a few general
questions to ask though, before going too far down the path of installing and
testing Chukwa.

- The admin guide states that Chukwa is installed on "A hadoop cluster
created specifically for Chukwa". I can see how this is the case when you're
using Chukwa to collect Hadoop logs. In my case though, I'm looking to use
Chukwa to collect access logs from our web servers. So do I still need a
Hadoop cluster _specifically_ for Chukwa, or will Chukwa play nicely with my
existing Hadoop cluster that is also used for other things?

- The admin guide also states that MySQL 5.1.30 is required. We're still
running MySQL 5.0.x. Does Chukwa require specific 5.1.30 features, or is
that just the version it's been tested against?

- I'm running Hadoop 0.18.3. Is that ok?

- Where are the Chukwa collectors typically installed? On the name node,
data node(s), or some other server(s) that are not part of the cluster? How
many are typically used for adequate redundancy?

- Which release should I be using, or should I build from the trunk? If the
answer is the trunk, when is the next release scheduled? The
Chukwa Releases URL is broken on the admin guide btw, even after removing
the trailing characters:
http://hadoop.apache.org/chukwa/releases.html)./

Thanks for taking the time to read through these (and hopefully reply :) ).

thanks,
Bill

Re: Initial Chukwa questions

Posted by Bill Graham <bi...@gmail.com>.
Thanks Ari for the quick response!

You helped answer another question I was asking myself, which is what MySQL
is actually used for. This would be useful to clarify in the docs as
well. As I was reading I was wondering which parts of Chukwa were required
for different parts of the system, and whether some could be used without
others.

I'll let you know how the evaluation goes and will report back any other
feedback.

thanks,
Bill

On Mon, Nov 9, 2009 at 11:58 AM, Ariel Rabkin <as...@gmail.com> wrote:

> Howdy.
>
> Thanks for your interest in Chukwa!
>
> One warning I should put up top.  Be careful about using Chukwa on a
> production cluster. You really should make sure that the Chukwa
> control port is blocked by your firewall and that Chukwa isn't run as
> a privileged user.   To date, our biggest priority has been monitoring
> back-end Hadoop clusters, rather than interactive services, and
> therefore security hasn't been as high a priority as perhaps it should
> be. That's likely to be improved in the near future, though.
>
> To answer your questions --
>
> - Chukwa will work fine if it's set to dump data to an existing Hadoop
> cluster. That line in the admin guide is wrong and I will fix it.
> - Chukwa per se doesn't require MySQL at all. HICC, the visualization
> toolset, does require MySQL. I don't know about version requirements.
> I suspect it'll be fine with 5.0.x.
> - Hadoop 0.18.3 should be fine as a data storage cluster.
> - I would stick the collectors on the data nodes. The number you need
> depends on your data volume. Each collector can write at basically
> HDFS's single-writer saturation point. So probably just one is enough
> unless you have a huge volume of data. Add a few more for redundancy.
> - You can use either trunk or the 0.3 release. The release is in
> progress and is now propagating out to the Apache mirror network; look
> at http://www.apache.org/dyn/closer.cgi/hadoop/chukwa/ . Some mirrors
> may not have synced yet, but most of them should have the release.
>
> At the moment, trunk is the same as 0.3 except for a handful of tiny
> feature tweaks and medium-important bug fixes.
>
> --Ari
>
> --
> Ari Rabkin asrabkin@gmail.com
> UC Berkeley Computer Science Department
>

Re: Initial Chukwa questions

Posted by Ariel Rabkin <as...@gmail.com>.
Howdy.

Thanks for your interest in Chukwa!

One warning I should put up top.  Be careful about using Chukwa on a
production cluster. You really should make sure that the Chukwa
control port is blocked by your firewall and that Chukwa isn't run as
a privileged user.   To date, our biggest priority has been monitoring
back-end Hadoop clusters, rather than interactive services, and
therefore security hasn't been as high a priority as perhaps it should
be. That's likely to be improved in the near future, though.
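
The two checks in the warning above could be scripted roughly as follows. This is only a sketch and not part of Chukwa: the function names are made up for illustration, and the host and control port you pass in are whatever your own agent configuration uses. Run the port check from a machine outside the firewall; it should report that the port is not reachable.

```python
import os
import socket


def port_is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper: call it from outside the firewall against the
    Chukwa control port; a correctly blocked port should give False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unroutable: treat all as "not reachable".
        return False


def running_as_root():
    """Return True if the current process has root privileges (POSIX only).

    Only inspects the process that calls it, so it is meant for a startup
    wrapper that refuses to launch the daemon when this returns True.
    """
    return hasattr(os, "geteuid") and os.geteuid() == 0
```

A startup wrapper could exit early when running_as_root() is true, and a periodic external probe could alert when port_is_reachable() unexpectedly becomes true.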

To answer your questions --

- Chukwa will work fine if it's set to dump data to an existing Hadoop
cluster. That line in the admin guide is wrong and I will fix it.
- Chukwa per se doesn't require MySQL at all. HICC, the visualization
toolset, does require MySQL. I don't know about version requirements.
I suspect it'll be fine with 5.0.x.
- Hadoop 0.18.3 should be fine as a data storage cluster.
- I would stick the collectors on the data nodes. The number you need
depends on your data volume. Each collector can write at basically
HDFS's single-writer saturation point. So probably just one is enough
unless you have a huge volume of data. Add a few more for redundancy.
- You can use either trunk or the 0.3 release. The release is in
progress and is now propagating out to the Apache mirror network; look
at http://www.apache.org/dyn/closer.cgi/hadoop/chukwa/ . Some mirrors
may not have synced yet, but most of them should have the release.
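
The collector sizing advice above amounts to simple arithmetic: enough collectors to absorb peak volume, plus a few spares. A minimal sketch, assuming you have measured your own peak log volume and the write rate a single collector sustains on your cluster (the function name and every number below are placeholders, not Chukwa figures):

```python
import math


def collectors_needed(peak_mb_per_s, per_collector_mb_per_s, spares=2):
    """Estimate collector count: capacity for peak load plus redundancy.

    per_collector_mb_per_s is an assumed, measured value (roughly the
    single-writer HDFS rate on your cluster); spares covers failures.
    """
    if per_collector_mb_per_s <= 0:
        raise ValueError("per-collector throughput must be positive")
    # At least one collector is needed even for a tiny data volume.
    base = max(1, math.ceil(peak_mb_per_s / per_collector_mb_per_s))
    return base + spares


# Placeholder numbers: 5 MB/s of access logs, 20 MB/s per collector.
print(collectors_needed(5, 20))  # 1 for capacity + 2 spares = 3
```

For modest volumes the capacity term stays at one, which matches the advice that a single collector is probably enough and the rest are there for redundancy.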

At the moment, trunk is the same as 0.3 except for a handful of tiny
feature tweaks and medium-important bug fixes.

--Ari




-- 
Ari Rabkin asrabkin@gmail.com
UC Berkeley Computer Science Department