Posted to dev@eagle.apache.org by Edward Zhang <yo...@gmail.com> on 2018/12/24 05:53:13 UTC

Re: Architecture improvement discussion

Qingwen,

There is no new architecture yet; this is just a very early discussion :-)

HBase is mainly used for job performance monitoring, where mapreduce
job/task data are stored. As long as Eagle supports customized job
performance monitoring, that storage has to be there, although our data
model is pretty agnostic to the storage implementation.
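To illustrate what a storage-agnostic data model means in practice, here is a minimal sketch of a job-entity store behind an interface, with an in-memory implementation standing in for HBase. The interface, class names, and fields are hypothetical illustrations, not Eagle's actual API.

```python
from abc import ABC, abstractmethod


class JobEntityStore(ABC):
    """Hypothetical storage-agnostic DAO for job/task entities.

    An HBase-backed implementation could be swapped in without
    changing callers, which is the point of keeping the data model
    agnostic to the storage layer.
    """

    @abstractmethod
    def write(self, job_id, entity):
        ...

    @abstractmethod
    def read(self, job_id):
        ...


class InMemoryJobEntityStore(JobEntityStore):
    """Stand-in for an HBase-backed store; handy for tests."""

    def __init__(self):
        self._rows = {}

    def write(self, job_id, entity):
        self._rows[job_id] = dict(entity)

    def read(self, job_id):
        return self._rows.get(job_id)


store = InMemoryJobEntityStore()
store.write("job_1", {"state": "SUCCEEDED", "duration_ms": 52000})
print(store.read("job_1")["state"])  # SUCCEEDED
```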

For metrics, Eagle 0.5 actually does not store them; it only processes
them in streaming mode. My suggestion is that we use mature tools like
Prometheus to store and visualize metrics, while Eagle focuses on policy
evaluation.
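As a concrete illustration of handing metrics off to Prometheus, the sketch below renders gauge samples in the Prometheus text exposition format, which Prometheus documents for scrape endpoints. The metric name and labels here are made-up examples of what an Eagle adaptor might expose, not actual Eagle metric names.

```python
def to_prometheus_text(name, help_text, samples):
    """Render gauge samples in the Prometheus text exposition format.

    samples: list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical namenode metric an adaptor might expose for scraping:
text = to_prometheus_text(
    "hdfs_namenode_capacity_used_bytes",
    "Used HDFS capacity in bytes.",
    [({"cluster": "c1"}, 1500000000000)],
)
print(text)
```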

Thanks
Edward

On Thu, Oct 25, 2018 at 3:00 AM Zhao Qingwen <qi...@gmail.com> wrote:

> Hi Edward,
>
> In the new architecture, is the storage(hbase) taken off?
> How do the adaptors store the data? For example, Hadoop namenode metrics.
>
> Best Regards,
> Qingwen Zhao | 赵晴雯
>
>
> Edward Zhang <yo...@apache.org> wrote on Fri, Oct 12, 2018 at 10:49 AM:
>
> > Hi Eaglers,
> >
> > I would like to start some discussion about architecture improvements
> > for Apache Eagle, based on community experience and feedback. The
> > improvements are targeted at simplifying the installation and
> > development of Apache Eagle.
> >
> > Eagle's main responsibility is to report abnormalities instantly by
> > applying policies to streaming data. Eagle consists of two major
> > components, the Policy Engine and Adaptors. The Policy Engine is a
> > standalone application which provides a REST API to manage the policy
> > lifecycle for different data sources, and a runtime to evaluate policies
> > on streaming data. Adaptors are the applications which fetch and process
> > logs/metrics from outside and send data to the policy engine for
> > alerting purposes.
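The adaptor-to-engine flow described above can be sketched minimally as follows. An in-memory queue stands in for the real transport between adaptor and policy engine, and the event fields and policy are illustrative assumptions, not Eagle's actual schema.

```python
import queue

# In-memory queue standing in for the transport (e.g. a message bus)
# between an adaptor and the policy engine.
events = queue.Queue()


def adaptor_emit(raw_line):
    """Adaptor side: parse an external log line into a structured event."""
    level, message = raw_line.split(" ", 1)
    events.put({"level": level, "message": message})


def engine_poll(policy):
    """Engine side: evaluate a policy (here just a predicate) over events."""
    alerts = []
    while not events.empty():
        event = events.get()
        if policy(event):
            alerts.append(event)
    return alerts


adaptor_emit("ERROR disk full on /data")
adaptor_emit("INFO heartbeat ok")
alerts = engine_poll(lambda e: e["level"] == "ERROR")
print(len(alerts))  # 1
```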
> >
> > But right now the Eagle code base is not clearly focused on these two
> > components. For example, the current source code includes map/reduce
> > job/task log retrieval/cleanup/analysis, which is very useful, but Eagle
> > probably only needs the data retrieval/cleanup part, so that the data
> > can be streamed into the policy engine for alerting purposes. The
> > job/task analysis part can be maintained in another project.
> >
> > First, let me list the main modules the Eagle source code consists of:
> > - eagle core
> >     - policy engine (coordinator, runtime, and web)
> >     - monitor application management
> >     - eagle query framework - for querying time series data from hbase
> > - eagle adaptors
> >     - gc log fetch/processing and alerting
> >     - metric fetch/processing and alerting, including name node, data
> >       node, hbase, etc.
> >     - jpm: job performance management
> >         - hadoop yarn queue statistics fetch/processing
> >         - hadoop mapreduce history job log processing
> >         - hadoop mapreduce running job processing
> >         - spark history job log processing
> >         - spark running job processing
> >         - jpm web application
> >         - hadoop job analyzer
> >     - security monitoring
> >         - hdfs audit log fetch/processing
> >         - hdfs auth log fetch/processing
> >         - hbase audit log fetch/processing
> >         - hive log fetch/processing
> >         - maprfs audit log fetch/processing
> >         - oozie audit log fetch/processing
> >     - hadoop topology stats fetch/processing
> > - eagle server
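As a taste of what one of these adaptors does, the sketch below parses an HDFS audit log line into key/value fields. Real HDFS audit lines follow a tab-separated key=value layout after the log header; the sample line and the parsing helper here are fabricated for illustration.

```python
def parse_hdfs_audit(line):
    """Parse the key=value fields of an HDFS audit log line into a dict.

    HDFS audit lines put tab-separated key=value pairs after the log
    header, starting with 'allowed='.
    """
    header, _, body = line.partition("allowed=")
    fields = dict(
        pair.split("=", 1) for pair in ("allowed=" + body).split("\t")
    )
    fields["timestamp"] = header.split(" INFO")[0].strip()
    return fields


# Fabricated sample following the real layout:
sample = (
    "2018-10-12 10:49:00,123 INFO FSNamesystem.audit: "
    "allowed=true\tugi=bob (auth:SIMPLE)\tip=/10.0.0.1"
    "\tcmd=delete\tsrc=/tmp/data\tdst=null\tperm=null"
)
event = parse_hdfs_audit(sample)
print(event["cmd"], event["src"])  # delete /tmp/data
```

Once parsed into a flat event like this, the adaptor's remaining job is just to stream the event to the policy engine.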
> >
> > It is obvious that it does not scale for the Eagle community to
> > maintain such a large number of monitoring adaptors, especially when
> > Hadoop/Spark versions are evolving so fast.
> >
> > My suggestion is that Eagle focus ONLY on the policy engine and some
> > important default adaptors, and remove or separate the unrelated
> > functionality. For the policy engine, it would be nice if it could run
> > on popular streaming engines besides Apache Storm, so that it can be
> > easily deployed by community users. For the default adaptors, I suggest
> > Eagle keep ONLY the HDFS audit log, Hadoop running job, Spark running
> > job, HDFS namenode metrics, etc. For the unrelated functionality, we can
> > either remove it from the Eagle code base or separate it into standalone
> > executables, if the community still really needs it under the Apache
> > Eagle monitoring umbrella.
> >
> > So the proposed Eagle code base would look like:
> > - policy engine
> >     - coordinator
> >     - runtime
> >     - web
> > - adaptors
> >     - hdfs audit log
> >     - hadoop running job
> >     - spark running job
> >     - hdfs namenode metrics
> >     - hadoop yarn queue metrics
> > - extensions (non-default adaptors contributed by the community)
> > - executables (legacy standalone executables)
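For context on what the policy engine evaluates, policies are CEP queries over named event streams; a rough sketch in SiddhiQL (the CEP query language Eagle's alert engine has built on) might look like the following. The stream and field names are illustrative, not taken from Eagle's actual schemas.

```
from hdfsAuditLogStream[cmd == 'delete' and src == '/tmp/private']
select user, cmd, src
insert into hdfsAuditAlertStream;
```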
> >
> > It would be great if you could provide more feedback on this discussion.
> >
> > (By the way, I have also discussed this topic at length with Hao Chen,
> > Eagle PMC member and core developer, based on his experience engaging
> > Eagle users.)
> >
> > Thanks
> > Edward
> >
>

Re: Architecture improvement discussion

Posted by Sidharth Kumar <si...@gmail.com>.
Hi Edward,

In my current project we are using Prometheus for system/service monitoring
across 200+ servers, and we have run into issues when it comes to scalability.

What about Apache Drill?

Warm Regards

Sidharth Kumar | Mob: +91 8197 555 599

On Mon, 18 Feb 2019 at 3:18 PM, Zhao, Qingwen <qingwzhao@ebay.com.invalid>
wrote:

> Got it. I agree with your idea.
> I have used Prometheus for a while in another project, and it's very easy
> to use and maintain.
>
> Thanks,
> Qingwen
>

Re: Architecture improvement discussion

Posted by "Zhao, Qingwen" <qi...@ebay.com.INVALID>.
Got it. I agree with your idea.
I have used Prometheus for a while in another project, and it's very easy to use and maintain.

Thanks,
Qingwen
