Posted to user@eagle.apache.org by "Zhang, Edward (GDI Hadoop)" <yo...@ebay.com> on 2015/12/14 19:58:12 UTC

[Discuss] Hadoop metrics,job,GC monitoring

Hi Eagle devs/users,

As proposed in the Apache Eagle incubator proposal, Eagle will start design/development work to support Hadoop system monitoring in addition to security monitoring; this includes Hadoop native metrics, jobs, GC logs, etc.

The community has also shown interest in Hadoop system monitoring by Eagle when we recently talked about the Eagle product at public conferences, meetups, etc.

Take Hadoop native metrics as an example: first of all, those metrics are quite valuable in determining system health status; secondly, collecting, visualizing, and alerting on a huge volume of metrics is very challenging. We need to think about declarative collection, dynamic aggregation, metric storage, a metric query engine, etc.
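To make "dynamic aggregation" concrete, here is a minimal, hypothetical sketch of per-metric sliding-window aggregation with an alert threshold, in plain Python. The metric name and window size are illustrative only and do not reflect Eagle's actual design.

```python
from collections import defaultdict, deque

class MetricAggregator:
    """Keep a bounded window of recent samples per metric and
    expose the rolling average over that window."""

    def __init__(self, window_size=5):
        self.window_size = window_size
        # One bounded deque per metric name; old samples fall off.
        self.windows = defaultdict(lambda: deque(maxlen=window_size))

    def ingest(self, metric_name, value):
        """Record one sample and return the current window average."""
        w = self.windows[metric_name]
        w.append(value)
        return sum(w) / len(w)

agg = MetricAggregator(window_size=3)
for v in [100, 200, 300]:
    avg = agg.ingest("hadoop.namenode.rpc.avg_time", v)
print(avg)  # average of the last 3 samples -> 200.0
```

A real pipeline would of course aggregate in the streaming layer and persist into the metric store, but the windowing idea is the same.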

Besides the technical design, comprehensive policies/rules are also valuable to share with the community. Those policies/rules represent best practices for managing large Hadoop clusters.

Please suggest anything, whether on the engineering design or on business policies/rules.

Thanks
Edward


Re: [Discuss] Hadoop metrics,job,GC monitoring

Posted by "Zhang, Edward (GDI Hadoop)" <yo...@ebay.com>.
Please review the latest design of monitoring on Hadoop native metrics:

https://cwiki.apache.org/confluence/display/EAG/Hadoop+Native+Metrics+Monitoring


Thanks
Edward

On 12/14/15, 23:48, "Zhang, Edward (GDI Hadoop)" <yo...@ebay.com> wrote:

>started some documentation on
>https://cwiki.apache.org/confluence/display/EAG/Hadoop+Native+Metrics+Monitoring
>
>Thanks Hao, Ralph etc. for offline review and suggestions, I would improve
>that.
>
>In terms of the question "if user adds a new metric to monitor, how
>processing layer would change accordingly"
>
>I think if user adds a new metric, this metric should be added into
>metadata table, and data source layer and processing layer should see
>consistent list of metrics.
>
>But we still need bake this design, please comment whatever is your
>thoughts.
>
>Thanks
>Edward
>
>
>On 12/14/15, 11:04, "Arun Manoharan" <ar...@apache.org> wrote:
>
>>Thanks Edward for starting the thread. I think it is important to have
>>the
>>job monitoring (MR/Spark) workloads for performance of the cluster and
>>availability.
>>
>>But it will be beneficial to have an extensible framework where users can
>>create business rules like "I want an alert when NN is in safemode or RM
>>is
>>flipping etc".
>>
>>Thanks,
>>Arun
>>
>>On Mon, Dec 14, 2015 at 10:58 AM, Zhang, Edward (GDI Hadoop) <
>>yonzhang@ebay.com> wrote:
>>
>>> Hi Eagle devs/users,
>>>
>>> As proposed in apache eagle incubator proposal, Eagle will start
>>> design/dev to support Hadoop system monitoring besides security
>>>monitoring
>>> which includes Hadoop native metrics, job, gclog etc.
>>>
>>> The community is also interested in Hadoop system monitoring by Eagle
>>>when
>>> we recently talked about Eagle product in public conferences, meet up
>>>etc.
>>>
>>> Take Hadoop native metrics as an example, first of all those metrics
>>>are
>>> pretty valuable in determining system health status, secondly
>>>collecting
>>> huge amount metrics, visualizing, and alerting is very challenging.  We
>>> need think of declarative collection, dynamic aggregation, metric
>>>storage,
>>> metric query engine etc.
>>>
>>> Besides technical design, comprehensive policy/rule are also valuable
>>>to
>>> be shared in the community. Those policy/rule represent best practice
>>>in
>>> the world to manage large Hadoop clusters.
>>>
>>> Please suggest whatever is for engineering design or business
>>>policy/rules.
>>>
>>> Thanks
>>> Edward
>>>
>>>
>



Re: [Discuss] Hadoop metrics,job,GC monitoring

Posted by "Zhang, Edward (GDI Hadoop)" <yo...@ebay.com>.
Started some documentation on
https://cwiki.apache.org/confluence/display/EAG/Hadoop+Native+Metrics+Monitoring

Thanks Hao, Ralph, et al. for the offline review and suggestions; I will improve it.

Regarding the question "if a user adds a new metric to monitor, how would the
processing layer change accordingly?":

I think if a user adds a new metric, the metric should be added into the
metadata table, and the data source layer and processing layer should then see
a consistent list of metrics.
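A minimal sketch of that shared metric-metadata table, assuming both layers consult the same registry so a metric registered once becomes visible to each of them. All names (the class, the `jmx` source tag, the sample metric) are illustrative, not Eagle's actual API.

```python
class MetricMetadata:
    """A single registry of known metrics that both the collection
    (data source) layer and the processing layer read from."""

    def __init__(self):
        self._metrics = {}

    def register(self, name, source, aggregation="avg"):
        # Adding a metric here is the one change a user makes.
        self._metrics[name] = {"source": source, "aggregation": aggregation}

    def metric_names(self):
        return sorted(self._metrics)

meta = MetricMetadata()
meta.register("hadoop.namenode.fsnamesystem.CapacityUsed", source="jmx")

# Both layers see the same, consistent list of metrics.
collector_view = meta.metric_names()
processor_view = meta.metric_names()
assert collector_view == processor_view
print(collector_view)
```

In practice the registry would live in a persistent metadata table (e.g. in HBase) rather than in memory, but the consistency property is the same.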

But we still need to bake this design further; please comment with whatever
thoughts you have.

Thanks
Edward


On 12/14/15, 11:04, "Arun Manoharan" <ar...@apache.org> wrote:

>Thanks Edward for starting the thread. I think it is important to have the
>job monitoring (MR/Spark) workloads for performance of the cluster and
>availability.
>
>But it will be beneficial to have an extensible framework where users can
>create business rules like "I want an alert when NN is in safemode or RM
>is
>flipping etc".
>
>Thanks,
>Arun
>
>On Mon, Dec 14, 2015 at 10:58 AM, Zhang, Edward (GDI Hadoop) <
>yonzhang@ebay.com> wrote:
>
>> Hi Eagle devs/users,
>>
>> As proposed in apache eagle incubator proposal, Eagle will start
>> design/dev to support Hadoop system monitoring besides security
>>monitoring
>> which includes Hadoop native metrics, job, gclog etc.
>>
>> The community is also interested in Hadoop system monitoring by Eagle
>>when
>> we recently talked about Eagle product in public conferences, meet up
>>etc.
>>
>> Take Hadoop native metrics as an example, first of all those metrics are
>> pretty valuable in determining system health status, secondly collecting
>> huge amount metrics, visualizing, and alerting is very challenging.  We
>> need think of declarative collection, dynamic aggregation, metric
>>storage,
>> metric query engine etc.
>>
>> Besides technical design, comprehensive policy/rule are also valuable to
>> be shared in the community. Those policy/rule represent best practice in
>> the world to manage large Hadoop clusters.
>>
>> Please suggest whatever is for engineering design or business
>>policy/rules.
>>
>> Thanks
>> Edward
>>
>>


Re: [Discuss] Hadoop metrics,job,GC monitoring

Posted by Arun Manoharan <ar...@apache.org>.
Thanks, Edward, for starting the thread. I think it is important to have job
monitoring for (MR/Spark) workloads, for both cluster performance and
availability.

But it will also be beneficial to have an extensible framework where users can
create business rules like "I want an alert when the NN is in safemode or the
RM is flipping, etc.".
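As a hypothetical illustration of the kind of user-defined business rule described above, here is a tiny rule-evaluation sketch in Python: a predicate per rule, evaluated against an incoming event. The event shape and rule names are assumptions for the example, not Eagle's actual policy API (which is CEP-based).

```python
def nn_safemode_rule(event):
    """Fire when the NameNode reports it is in safemode."""
    return event.get("component") == "namenode" and event.get("safemode") is True

def evaluate(rules, event):
    """Return the names of all rules that fire for this event."""
    return [name for name, rule in rules.items() if rule(event)]

# Users register rules by name; the framework stays extensible
# because adding a rule is just adding another predicate.
rules = {"nn_safemode_alert": nn_safemode_rule}

alerts = evaluate(rules, {"component": "namenode", "safemode": True})
print(alerts)  # -> ['nn_safemode_alert']
```

An RM-flipping rule would additionally need state across events (e.g. counting active/standby transitions within a window), which is where a streaming CEP engine earns its keep.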

Thanks,
Arun

On Mon, Dec 14, 2015 at 10:58 AM, Zhang, Edward (GDI Hadoop) <
yonzhang@ebay.com> wrote:

> Hi Eagle devs/users,
>
> As proposed in apache eagle incubator proposal, Eagle will start
> design/dev to support Hadoop system monitoring besides security monitoring
> which includes Hadoop native metrics, job, gclog etc.
>
> The community is also interested in Hadoop system monitoring by Eagle when
> we recently talked about Eagle product in public conferences, meet up etc.
>
> Take Hadoop native metrics as an example, first of all those metrics are
> pretty valuable in determining system health status, secondly collecting
> huge amount metrics, visualizing, and alerting is very challenging.  We
> need think of declarative collection, dynamic aggregation, metric storage,
> metric query engine etc.
>
> Besides technical design, comprehensive policy/rule are also valuable to
> be shared in the community. Those policy/rule represent best practice in
> the world to manage large Hadoop clusters.
>
> Please suggest whatever is for engineering design or business policy/rules.
>
> Thanks
> Edward
>
>
