You are viewing a plain text version of this content. The canonical link for it is here.
Posted to general@incubator.apache.org by "Manoharan, Arun" <ar...@ebay.com> on 2015/10/26 18:59:57 UTC

[Result] Accept Eagle into Apache Incubation

Hello Everyone,

Thanks for participating in the vote and discussions about Eagle.

Binding votes = 10
Non-binding votes = 14
Total votes = 24

Thanks,

Arun 


On 10/25/15, 8:37 PM, "Don Bosco Durai" <bo...@apache.org> wrote:

>+1 non binding 
>Bosco 
>
>
>
>    _____________________________
>From: Li Yang <li...@apache.org>
>Sent: Sunday, October 25, 2015 8:13 PM
>Subject: Re: [VOTE] Accept Eagle into Apache Incubation
>To:  <ge...@incubator.apache.org>
>
>
>+1 (non-binding)
>
>On Mon, Oct 26, 2015 at 10:50 AM, hongbin ma <ma...@apache.org> wrote:
>
>> +1 (non binding)
>>
>> On Mon, Oct 26, 2015 at 12:20 AM, Ralph Goers
>><ra...@dslextreme.com>
>> wrote:
>>
>> > +1 (binding)
>> >
>> > Ralph
>> >
>> > > On Oct 23, 2015, at 7:11 AM, Manoharan, Arun <ar...@ebay.com>
>> > wrote:
>> > >
>> > > Hello Everyone,
>> > >
>> > > Thanks for all the feedback on the Eagle Proposal.
>> > >
>> > > I would like to call for a [VOTE] on Eagle joining the ASF as an
>> > incubation project.
>> > >
>> > > The vote is open for 72 hours:
>> > >
>> > > [ ] +1 accept Eagle in the Incubator
>> > > [ ] ±0
>> > > [ ] -1 (please give reason)
>> > >
>> > > Eagle is a Monitoring solution for Hadoop to instantly identify
>>access
>> > to sensitive data, recognize attacks, malicious activities and take
>> actions
>> > in real time. Eagle supports a wide variety of policies on HDFS data
>>and
>> > Hive. Eagle also provides machine learning models for detecting
>>anomalous
>> > user behavior in Hadoop.
>> > >
>> > > The proposal is available on the wiki here:
>> > > https://wiki.apache.org/incubator/EagleProposal
>> > >
>> > > The text of the proposal is also available at the end of this email.
>> > >
>> > > Thanks for your time and help.
>> > >
>> > > Thanks,
>> > > Arun
>> > >
>> > > <COPY of the proposal in text format>
>> > >
>> > > Eagle
>> > >
>> > > Abstract
>> > > Eagle is an Open Source Monitoring solution for Hadoop to instantly
>> > identify access to sensitive data, recognize attacks, malicious
>> activities
>> > in hadoop and take actions.
>> > >
>> > > Proposal
>> > > Eagle audits access to HDFS files, Hive and HBase tables in real
>>time,
>> > enforces policies defined on sensitive data access and alerts or
>>blocks
>> > user’s access to that sensitive data in real time. Eagle also creates
>> user
>> > profiles based on the typical access behaviour for HDFS and Hive and
>> sends
>> > alerts when anomalous behaviour is detected. Eagle can also import
>> > sensitive data information classified by external classification
>>engines
>> to
>> > help define its policies.
>> > >
>> > > Overview of Eagle
>> > > Eagle has 3 main parts.
>> > > 1.Data collection and storage - Eagle collects data from various
>>hadoop
>> > logs in real time using Kafka/Yarn API and uses HDFS and HBase for
>> storage.
>> > > 2.Data processing and policy engine - Eagle allows users to create
>> > policies based on various metadata properties on HDFS, Hive and HBase
>> data.
>> > > 3.Eagle services - Eagle services include policy manager, query
>>service
>> > and the visualization component. Eagle provides intuitive user
>>interface
>> to
>> > administer Eagle and an alert dashboard to respond to real time
>>alerts.
>> > >
>> > > Data Collection and Storage:
>> > > Eagle provides programming API for extending Eagle to integrate any
>> data
>> > source into Eagle policy evaluation framework. For example, Eagle hdfs
>> > audit monitoring collects data from Kafka which is populated from
>> namenode
>> > log4j appender or from logstash agent. Eagle hive monitoring collects
>> hive
>> > query logs from running job through YARN API, which is designed to be
>> > scalable and fault-tolerant. Eagle uses HBase as storage for storing
>> > metadata and metrics data, and also supports relational database
>>through
>> > configuration change.
>> > >
>> > > Data Processing and Policy Engine:
>> > > Processing Engine: Eagle provides stream processing API which is an
>> > abstraction of Apache Storm. It can also be extended to other
>>streaming
>> > engines. This abstraction allows developers to assemble data
>> > transformation, filtering, external data join etc. without physically
>> bound
>> > to a specific streaming platform. Eagle streaming API allows
>>developers
>> to
>> > easily integrate business logic with Eagle policy engine and
>>internally
>> > Eagle framework compiles business logic execution DAG into program
>> > primitives of underlying stream infrastructure e.g. Apache Storm. For
>> > example, Eagle HDFS monitoring transforms audit log from Namenode to
>> object
>> > and joins sensitivity metadata, security zone metadata which are
>> generated
>> > from external programs or configured by user. Eagle hive monitoring
>> filters
>> > running jobs to get hive query string and parses query string into
>>object
>> > and then joins sensitivity metadata.
>> > > Alerting Framework: Eagle Alert Framework includes stream metadata
>>API,
>> > scalable policy engine framework, extensible policy engine framework.
>> > Stream metadata API allows developers to declare event schema
>>including
>> > what attributes constitute an event, what is the type for each
>>attribute,
>> > and how to dynamically resolve attribute value in runtime when user
>> > configures policy. Scalable policy engine framework allows policies
>>to be
>> > executed on different physical nodes in parallel. It is also used to
>> define
>> > your own policy partitioner class. Policy engine framework together
>>with
>> > streaming partitioning capability provided by all streaming platforms
>> will
>> > make sure policies and events can be evaluated in a fully distributed
>> way.
>> > Extensible policy engine framework allows developer to plugin a new
>> policy
>> > engine with a few lines of codes. WSO2 Siddhi CEP engine is the policy
>> > engine which Eagle supports as first-class citizen.
>> > > Machine Learning module: Eagle provides capabilities to define user
>> > activity patterns or user profiles for Hadoop users based on the user
>> > behaviour in the platform. These user profiles are modeled using
>>Machine
>> > Learning algorithms and used for detection of anomalous users
>>activities.
>> > Eagle uses Eigen Value Decomposition, and Density Estimation
>>algorithms
>> for
>> > generating user profile models. The model reads data from HDFS audit
>> logs,
>> > preprocesses and aggregates data, and generates models using Spark
>> > programming APIs. Once models are generated, Eagle uses stream
>>processing
>> > engine for near real-time anomaly detection to determine if any user’s
>> > activities are suspicious or not.
>> > >
>> > > Eagle Services:
>> > > Query Service: Eagle provides SQL-like service API to support
>> > comprehensive computation for huge set of data on the fly, for e.g.
>> > comprehensive filtering, aggregation, histogram, sorting, top,
>> arithmetical
>> > expression, pagination etc. HBase is the data storage which Eagle
>> supports
>> > as first-class citizen, relational database is supported as well. For
>> HBase
>> > storage, Eagle query framework compiles user provided SQL-like query
>>into
>> > HBase native filter objects and execute it through HBase coprocessor
>>on
>> the
>> > fly.
>> > > Policy Manager: Eagle policy manager provides UI and Restful API for
>> > user to define policy with just a few clicks. It includes site
>>management
>> > UI, policy editor, sensitivity metadata import, HDFS or Hive sensitive
>> > resource browsing, alert dashboards etc.
>> > > Background
>> > > Data is one of the most important assets for today’s businesses,
>>which
>> > makes data security one of the top priorities of today’s enterprises.
>> > Hadoop is widely used across different verticals as a big data
>>repository
>> > to store this data in most modern enterprises.
>> > > At eBay we use hadoop platform extensively for our data processing
>> > needs. Our data in Hadoop is becoming bigger and bigger as our user
>>base
>> is
>> > seeing an exponential growth. Today there are variety of data sets
>> > available in Hadoop cluster for our users to consume. eBay has around
>>120
>> > PB of data stored in HDFS across 6 different clusters and around 1800+
>> > active hadoop users consuming data thru Hive, HBase and mapreduce jobs
>> > everyday to build applications using this data. With this astronomical
>> > growth of data there are also challenges in securing sensitive data
>>and
>> > monitoring the access to this sensitive data. Today in large
>> organizations
>> > HDFS is the defacto standard for storing big data. Data sets which
>> includes
>> > and not limited to consumer sentiment, social media data, customer
>> > segmentation, web clicks, sensor data, geo-location and transaction
>>data
>> > get stored in Hadoop for day to day business needs.
>> > > We at eBay want to make sure the sensitive data and data platforms
>>are
>> > completely protected from security breaches. So we partnered very
>>closely
>> > with our Information Security team to understand the requirements for
>> Eagle
>> > to monitor sensitive data access on hadoop:
>> > > 1.Ability to identify and stop security threats in real time
>> > > 2.Scale for big data (Support PB scale and Billions of events)
>> > > 3.Ability to create data access policies
>> > > 4.Support multiple data sources like HDFS, HBase, Hive
>> > > 5.Visualize alerts in real time
>> > > 6.Ability to block malicious access in real time
>> > > We did not find any data access monitoring solution that available
>> today
>> > and can provide the features and functionality that we need to monitor
>> the
>> > data access in the hadoop ecosystem at our scale. Hence with an
>>excellent
>> > team of world class developers and several users, we have been able to
>> > bring Eagle into production as well as open source it.
>> > >
>> > > Rationale
>> > > In today’s world; data is an important asset for any company.
>> Businesses
>> > are using data extensively to create amazing experiences for users.
>>Data
>> > has to be protected and access to data should be secured from security
>> > breaches. Today Hadoop is not only used to store logs but also stores
>> > financial data, sensitive data sets, geographical data, user click
>>stream
>> > data sets etc. which makes it more important to be protected from
>> security
>> > breaches. To secure a data platform there are multiple things that
>>need
>> to
>> > happen. One is having a strong access control mechanism which today is
>> > provided by Apache Ranger and Apache Sentry. These tools provide the
>> > ability to provide fine grain access control mechanism to data sets on
>> > hadoop. But there is a big gap in terms of monitoring all the data
>>access
>> > events and activities in order to securing the hadoop data platform.
>> > Together with strong access control, perimeter security and data
>>access
>> > monitoring in place data in the hadoop clusters can be secured against
>> > breaches. We looked around and found following:
>> > > Existing data activity monitoring products are designed for
>>traditional
>> > databases and data warehouse. Existing monitoring platforms cannot
>>scale
>> > out to support fast growing data and petabyte scale. Few products in
>>the
>> > industry are still very early in terms of supporting HDFS, Hive, HBase
>> data
>> > access monitoring.
>> > > As mentioned in the background, the business requirement and
>>urgency to
>> > secure the data from users with malicious intent drove eBay to invest
>>in
>> > building a real time data access monitoring solution from scratch to
>> offer
>> > real time alerts and remediation features for malicious data access.
>> > > With the power of open source distributed systems like Hadoop, Kafka
>> and
>> > much more we were able to develop a data activity monitoring system
>>that
>> > can scale, identify and stop malicious access in real time.
>> > > Eagle allows admins to create standard access policies and rules for
>> > monitoring HDFS, Hive and HBase data. Eagle also provides out of box
>> > machine learning models for modeling user profiles based on user
>>access
>> > behaviour and use the model to alert on anomalies.
>> > >
>> > > Current Status
>> > >
>> > > Meritocracy
>> > > Eagle has been deployed in production at eBay for monitoring
>>billions
>> of
>> > events per day from HDFS and Hive operations. From the start; the
>>product
>> > has been built with focus on high scalability and application
>> extensibility
>> > in mind and Eagle has demonstrated great performance in responding to
>> > suspicious events instantly and great flexibility in defining policy.
>> > >
>> > > Community
>> > > Eagle seeks to develop the developer and user communities during
>> > incubation.
>> > >
>> > > Core Developers
>> > > Eagle is currently being designed and developed by engineers from
>>eBay
>> > Inc. – Edward Zhang, Hao Chen, Chaitali Gupta, Libin Sun, Jilin Jiang,
>> > Qingwen Zhao, Senthil Kumar, Hemanth Dendukuri, Arun Manoharan. All of
>> > these core developers have deep expertise in developing monitoring
>> products
>> > for the Hadoop ecosystem.
>> > >
>> > > Alignment
>> > > The ASF is a natural host for Eagle given that it is already the
>>home
>> of
>> > Hadoop, HBase, Hive, Storm, Kafka, Spark and other emerging big data
>> > projects. Eagle leverages lot of Apache open-source products. Eagle
>>was
>> > designed to offer real time insights into sensitive data access by
>> actively
>> > monitoring the data access on various data sets in hadoop and an
>> extensible
>> > alerting framework with a powerful policy engine. Eagle compliments
>>the
>> > existing Hadoop platform area by providing a comprehensive monitoring
>>and
>> > alerting solution for detecting sensitive data access threats based on
>> > preset policies and machine learning models for user behaviour
>>analysis.
>> > >
>> > > Known Risks
>> > >
>> > > Orphaned Products
>> > > The core developers of Eagle team work full time on this project.
>>There
>> > is no risk of Eagle getting orphaned since eBay is extensively using
>>it
>> in
>> > their production Hadoop clusters and have plans to go beyond hadoop.
>>For
>> > example, currently there are 7 hadoop clusters and 2 of them are being
>> > monitored using Hadoop Eagle in production. We have plans to extend
>>it to
>> > all hadoop clusters and eventually other data platforms. There are
>>10’s
>> of
>> > policies onboarded and actively monitored with plans to onboard more
>>use
>> > case. We are very confident that every hadoop cluster in the world
>>will
>> be
>> > monitored using Eagle for securing the hadoop ecosystem by actively
>> > monitoring for data access on sensitive data. We plan to extend and
>> > diversify this community further through Apache. We presented Eagle at
>> the
>> > hadoop summit in china and garnered interest from different companies
>>who
>> > use hadoop extensively.
>> > >
>> > > Inexperience with Open Source
>> > > The core developers are all active users and followers of open
>>source.
>> > They are already committers and contributors to the Eagle Github
>>project.
>> > All have been involved with the source code that has been released
>>under
>> an
>> > open source license, and several of them also have experience
>>developing
>> > code in an open source environment. Though the core set of Developers
>>do
>> > not have Apache Open Source experience, there are plans to onboard
>> > individuals with Apache open source experience on to the project.
>>Apache
>> > Kylin PMC members are also in the same ebay organization. We work very
>> > closely with Apache Ranger committers and are looking forward to find
>> > meaningful integrations to improve the security of hadoop platform.
>> > >
>> > > Homogenous Developers
>> > > The core developers are from eBay. Today the problem of monitoring
>>data
>> > activities to find and stop threats is a universal problem faced by
>>all
>> the
>> > businesses. Apache Incubation process encourages an open and diverse
>> > meritocratic community. Eagle intends to make every possible effort to
>> > build a diverse, vibrant and involved community and has already
>>received
>> > substantial interest from various organizations.
>> > >
>> > > Reliance on Salaried Developers
>> > > eBay invested in Eagle as the monitoring solution for Hadoop
>>clusters
>> > and some of its key engineers are working full time on the project. In
>> > addition, since there is a growing need for securing sensitive data
>> access
>> > we need a data activity monitoring solution for Hadoop, we look
>>forward
>> to
>> > other Apache developers and researchers to contribute to the project.
>> > Additional contributors, including Apache committers have plans to
>>join
>> > this effort shortly. Also key to addressing the risk associated with
>> > relying on Salaried developers from a single entity is to increase the
>> > diversity of the contributors and actively lobby for Domain experts in
>> the
>> > security space to contribute. Eagle intends to do this.
>> > >
>> > > Relationships with Other Apache Products
>> > > Eagle has a strong relationship and dependency with Apache Hadoop,
>> > HBase, Spark, Kafka and Storm. Being part of Apache’s Incubation
>> community,
>> > could help with a closer collaboration among these projects and as
>>well
>> as
>> > others. An Excessive Fascination with the Apache Brand Eagle is
>>proposing
>> > to enter incubation at Apache in order to help efforts to diversify
>>the
>> > committer-base, not so much to capitalize on the Apache brand. The
>>Eagle
>> > project is in production use already inside eBay, but is not expected
>>to
>> be
>> > an eBay product for external customers. As such, the Eagle project is
>>not
>> > seeking to use the Apache brand as a marketing tool.
>> > >
>> > > Documentation
>> > > Information about Eagle can be found at
>>https://github.com/eBay/Eagle.
>> > The following link provide more information about Eagle
>> http://goeagle.io<
>> > http://goeagle.io/>.
>> > >
>> > > Initial Source
>> > > Eagle has been under development since 2014 by a team of engineers
>>at
>> > eBay Inc. It is currently hosted on Github.com under an Apache license
>> 2.0
>> > at https://github.com/eBay/Eagle. Once in incubation we will be moving
>> > the code base to apache git library.
>> > >
>> > > External Dependencies
>> > > Eagle has the following external dependencies.
>> > > Basic
>> > > •JDK 1.7+
>> > > •Scala 2.10.4
>> > > •Apache Maven
>> > > •JUnit
>> > > •Log4j
>> > > •Slf4j
>> > > •Apache Commons
>> > > •Apache Commons Math3
>> > > •Jackson
>> > > •Siddhi CEP engine
>> > >
>> > > Hadoop
>> > > •Apache Hadoop
>> > > •Apache HBase
>> > > •Apache Hive
>> > > •Apache Zookeeper
>> > > •Apache Curator
>> > >
>> > > Apache Spark
>> > > •Spark Core Library
>> > >
>> > > REST Service
>> > > •Jersey
>> > >
>> > > Query
>> > > •Antlr
>> > >
>> > > Stream processing
>> > > •Apache Storm
>> > > •Apache Kafka
>> > >
>> > > Web
>> > > •AngularJS
>> > > •jQuery
>> > > •Bootstrap V3
>> > > •Moment JS
>> > > •Admin LTE
>> > > •html5shiv
>> > > •respond
>> > > •Fastclick
>> > > •Date Range Picker
>> > > •Flot JS
>> > >
>> > > Cryptography
>> > > Eagle will eventually support encryption on the wire. This is not
>>one
>> of
>> > the initial goals, and we do not expect Eagle to be a controlled
>>export
>> > item due to the use of encryption. Eagle supports but does not require
>> the
>> > Kerberos authentication mechanism to access secured Hadoop services.
>> > >
>> > > Required Resources
>> > >
>> > > Mailing List
>> > > •eagle-private for private PMC discussions
>> > > •eagle-dev for developers
>> > > •eagle-commits for all commits
>> > > •eagle-users for all eagle users
>> > >
>> > > Subversion Directory
>> > > •Git is the preferred source control system.
>> > >
>> > > Issue Tracking
>> > > •JIRA Eagle (Eagle)
>> > >
>> > > Other Resources
>> > > The existing code already has unit tests so we will make use of
>> existing
>> > Apache continuous testing infrastructure. The resulting load should
>>not
>> be
>> > very large.
>> > >
>> > > Initial Committers
>> > > •Seshu Adunuthula <sadunuthula at ebay dot com>
>> > > •Arun Manoharan <armanoharan at ebay dot com>
>> > > •Edward Zhang <yonzhang at ebay dot com>
>> > > •Hao Chen <hchen9 at ebay dot com>
>> > > •Chaitali Gupta <cgupta at ebay dot com>
>> > > •Libin Sun <libsun at ebay dot com>
>> > > •Jilin Jiang <jiljiang at ebay dot com>
>> > > •Qingwen Zhao <qingwzhao at ebay dot com>
>> > > •Hemanth Dendukuri <hdendukuri at ebay dot com>
>> > > •Senthil Kumar <senthilkumar at ebay dot com>
>> > >
>> > >
>> > > Affiliations
>> > > The initial committers are employees of eBay Inc.
>> > >
>> > > Sponsors
>> > >
>> > > Champion
>> > > •Henry Saputra <hsaputra at apache dot org> - Apache IPMC member
>> > >
>> > > Nominated Mentors
>> > > •Owen O’Malley < omalley at apache dot org > - Apache IPMC member,
>> > Hortonworks
>> > > •Henry Saputra <hsaputra at apache dot org> - Apache IPMC member
>> > > •Julian Hyde <jhyde at hortonworks dot com> - Apache IPMC member,
>> > Hortonworks
>> > > •Amareshwari Sriramdasu <amareshwari at apache dot org> - Apache
>>IPMC
>> > member
>> > > •Taylor Goetz <ptgoetz at apache dot org> - Apache IPMC member,
>> > Hortonworks
>> > >
>> > > Sponsoring Entity
>> > > We are requesting the Incubator to sponsor this project.
>> > >
>> >
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> > For additional commands, e-mail: general-help@incubator.apache.org
>> >
>> >
>>
>>
>> --
>> Regards,
>>
>> *Bin Mahone | 马洪宾*
>> Apache Kylin: http://kylin.io
>> Github: https://github.com/binmahone


Re: [Result] Accept Eagle into Apache Incubation

Posted by Ted Dunning <te...@gmail.com>.
As a clarification, all votes were +1



On Mon, Oct 26, 2015 at 10:59 AM, Manoharan, Arun <ar...@ebay.com>
wrote:

> Hello Everyone,
>
> Thanks for participating in the vote and discussions about Eagle.
>
> Binding votes = 10
> Non-binding votes = 14
> Total votes = 24
>
> Thanks,
>
> Arun
>
>
> On 10/25/15, 8:37 PM, "Don Bosco Durai" <bo...@apache.org> wrote:
>
> >+1 non binding
> >Bosco
> >
> >
> >
> >    _____________________________
> >From: Li Yang <li...@apache.org>
> >Sent: Sunday, October 25, 2015 8:13 PM
> >Subject: Re: [VOTE] Accept Eagle into Apache Incubation
> >To:  <ge...@incubator.apache.org>
> >
> >
> >+1 (non-binding)
> >
> >On Mon, Oct 26, 2015 at 10:50 AM, hongbin ma <ma...@apache.org>
> wrote:
> >
> >> +1 (non binding)
> >>
> >> On Mon, Oct 26, 2015 at 12:20 AM, Ralph Goers
> >><ra...@dslextreme.com>
> >> wrote:
> >>
> >> > +1 (binding)
> >> >
> >> > Ralph
> >> >
> >> > > On Oct 23, 2015, at 7:11 AM, Manoharan, Arun <ar...@ebay.com>
> >> > wrote:
> >> > >
> >> > > Hello Everyone,
> >> > >
> >> > > Thanks for all the feedback on the Eagle Proposal.
> >> > >
> >> > > I would like to call for a [VOTE] on Eagle joining the ASF as an
> >> > incubation project.
> >> > >
> >> > > The vote is open for 72 hours:
> >> > >
> >> > > [ ] +1 accept Eagle in the Incubator
> >> > > [ ] ±0
> >> > > [ ] -1 (please give reason)
> >> > >
> >> > > Eagle is a Monitoring solution for Hadoop to instantly identify
> >>access
> >> > to sensitive data, recognize attacks, malicious activities and take
> >> actions
> >> > in real time. Eagle supports a wide variety of policies on HDFS data
> >>and
> >> > Hive. Eagle also provides machine learning models for detecting
> >>anomalous
> >> > user behavior in Hadoop.
> >> > >
> >> > > The proposal is available on the wiki here:
> >> > > https://wiki.apache.org/incubator/EagleProposal
> >> > >
> >> > > The text of the proposal is also available at the end of this email.
> >> > >
> >> > > Thanks for your time and help.
> >> > >
> >> > > Thanks,
> >> > > Arun
> >> > >
> >> > > <COPY of the proposal in text format>
> >> > >
> >> > > Eagle
> >> > >
> >> > > Abstract
> >> > > Eagle is an Open Source Monitoring solution for Hadoop to instantly
> >> > identify access to sensitive data, recognize attacks, malicious
> >> activities
> >> > in hadoop and take actions.
> >> > >
> >> > > Proposal
> >> > > Eagle audits access to HDFS files, Hive and HBase tables in real
> >>time,
> >> > enforces policies defined on sensitive data access and alerts or
> >>blocks
> >> > user’s access to that sensitive data in real time. Eagle also creates
> >> user
> >> > profiles based on the typical access behaviour for HDFS and Hive and
> >> sends
> >> > alerts when anomalous behaviour is detected. Eagle can also import
> >> > sensitive data information classified by external classification
> >>engines
> >> to
> >> > help define its policies.
> >> > >
> >> > > Overview of Eagle
> >> > > Eagle has 3 main parts.
> >> > > 1.Data collection and storage - Eagle collects data from various
> >>hadoop
> >> > logs in real time using Kafka/Yarn API and uses HDFS and HBase for
> >> storage.
> >> > > 2.Data processing and policy engine - Eagle allows users to create
> >> > policies based on various metadata properties on HDFS, Hive and HBase
> >> data.
> >> > > 3.Eagle services - Eagle services include policy manager, query
> >>service
> >> > and the visualization component. Eagle provides intuitive user
> >>interface
> >> to
> >> > administer Eagle and an alert dashboard to respond to real time
> >>alerts.
> >> > >
> >> > > Data Collection and Storage:
> >> > > Eagle provides programming API for extending Eagle to integrate any
> >> data
> >> > source into Eagle policy evaluation framework. For example, Eagle hdfs
> >> > audit monitoring collects data from Kafka which is populated from
> >> namenode
> >> > log4j appender or from logstash agent. Eagle hive monitoring collects
> >> hive
> >> > query logs from running job through YARN API, which is designed to be
> >> > scalable and fault-tolerant. Eagle uses HBase as storage for storing
> >> > metadata and metrics data, and also supports relational database
> >>through
> >> > configuration change.
> >> > >
> >> > > Data Processing and Policy Engine:
> >> > > Processing Engine: Eagle provides stream processing API which is an
> >> > abstraction of Apache Storm. It can also be extended to other
> >>streaming
> >> > engines. This abstraction allows developers to assemble data
> >> > transformation, filtering, external data join etc. without physically
> >> bound
> >> > to a specific streaming platform. Eagle streaming API allows
> >>developers
> >> to
> >> > easily integrate business logic with Eagle policy engine and
> >>internally
> >> > Eagle framework compiles business logic execution DAG into program
> >> > primitives of underlying stream infrastructure e.g. Apache Storm. For
> >> > example, Eagle HDFS monitoring transforms audit log from Namenode to
> >> object
> >> > and joins sensitivity metadata, security zone metadata which are
> >> generated
> >> > from external programs or configured by user. Eagle hive monitoring
> >> filters
> >> > running jobs to get hive query string and parses query string into
> >>object
> >> > and then joins sensitivity metadata.
> >> > > Alerting Framework: Eagle Alert Framework includes stream metadata
> >>API,
> >> > scalable policy engine framework, extensible policy engine framework.
> >> > Stream metadata API allows developers to declare event schema
> >>including
> >> > what attributes constitute an event, what is the type for each
> >>attribute,
> >> > and how to dynamically resolve attribute value in runtime when user
> >> > configures policy. Scalable policy engine framework allows policies
> >>to be
> >> > executed on different physical nodes in parallel. It is also used to
> >> define
> >> > your own policy partitioner class. Policy engine framework together
> >>with
> >> > streaming partitioning capability provided by all streaming platforms
> >> will
> >> > make sure policies and events can be evaluated in a fully distributed
> >> way.
> >> > Extensible policy engine framework allows developer to plugin a new
> >> policy
> >> > engine with a few lines of codes. WSO2 Siddhi CEP engine is the policy
> >> > engine which Eagle supports as first-class citizen.
> >> > > Machine Learning module: Eagle provides capabilities to define user
> >> > activity patterns or user profiles for Hadoop users based on the user
> >> > behaviour in the platform. These user profiles are modeled using
> >>Machine
> >> > Learning algorithms and used for detection of anomalous users
> >>activities.
> >> > Eagle uses Eigen Value Decomposition, and Density Estimation
> >>algorithms
> >> for
> >> > generating user profile models. The model reads data from HDFS audit
> >> logs,
> >> > preprocesses and aggregates data, and generates models using Spark
> >> > programming APIs. Once models are generated, Eagle uses stream
> >>processing
> >> > engine for near real-time anomaly detection to determine if any user’s
> >> > activities are suspicious or not.
> >> > >
> >> > > Eagle Services:
> >> > > Query Service: Eagle provides SQL-like service API to support
> >> > comprehensive computation for huge set of data on the fly, for e.g.
> >> > comprehensive filtering, aggregation, histogram, sorting, top,
> >> arithmetical
> >> > expression, pagination etc. HBase is the data storage which Eagle
> >> supports
> >> > as first-class citizen, relational database is supported as well. For
> >> HBase
> >> > storage, Eagle query framework compiles user provided SQL-like query
> >>into
> >> > HBase native filter objects and execute it through HBase coprocessor
> >>on
> >> the
> >> > fly.
> >> > > Policy Manager: Eagle policy manager provides UI and Restful API for
> >> > user to define policy with just a few clicks. It includes site
> >>management
> >> > UI, policy editor, sensitivity metadata import, HDFS or Hive sensitive
> >> > resource browsing, alert dashboards etc.
> >> > > Background
> >> > > Data is one of the most important assets for today’s businesses,
> >>which
> >> > makes data security one of the top priorities of today’s enterprises.
> >> > Hadoop is widely used across different verticals as a big data
> >>repository
> >> > to store this data in most modern enterprises.
> >> > > At eBay we use hadoop platform extensively for our data processing
> >> > needs. Our data in Hadoop is becoming bigger and bigger as our user
> >>base
> >> is
> >> > seeing an exponential growth. Today there are variety of data sets
> >> > available in Hadoop cluster for our users to consume. eBay has around
> >>120
> >> > PB of data stored in HDFS across 6 different clusters and around 1800+
> >> > active hadoop users consuming data thru Hive, HBase and mapreduce jobs
> >> > everyday to build applications using this data. With this astronomical
> >> > growth of data there are also challenges in securing sensitive data
> >>and
> >> > monitoring the access to this sensitive data. Today in large
> >> organizations
> >> > HDFS is the defacto standard for storing big data. Data sets which
> >> includes
> >> > and not limited to consumer sentiment, social media data, customer
> >> > segmentation, web clicks, sensor data, geo-location and transaction
> >>data
> >> > get stored in Hadoop for day to day business needs.
> >> > > We at eBay want to make sure the sensitive data and data platforms
> >>are
> >> > completely protected from security breaches. So we partnered very
> >>closely
> >> > with our Information Security team to understand the requirements for
> >> Eagle
> >> > to monitor sensitive data access on hadoop:
> >> > > 1.Ability to identify and stop security threats in real time
> >> > > 2.Scale for big data (Support PB scale and Billions of events)
> >> > > 3.Ability to create data access policies
> >> > > 4.Support multiple data sources like HDFS, HBase, Hive
> >> > > 5.Visualize alerts in real time
> >> > > 6.Ability to block malicious access in real time
> >> > > We did not find any data access monitoring solution that available
> >> today
> >> > and can provide the features and functionality that we need to monitor
> >> the
> >> > data access in the hadoop ecosystem at our scale. Hence with an
> >>excellent
> >> > team of world class developers and several users, we have been able to
> >> > bring Eagle into production as well as open source it.
> >> > >
> >> > > Rationale
> >> > > In today’s world; data is an important asset for any company.
> >> Businesses
> >> > are using data extensively to create amazing experiences for users.
> >>Data
> >> > has to be protected and access to data should be secured from security
> >> > breaches. Today Hadoop is not only used to store logs but also stores
> >> > financial data, sensitive data sets, geographical data, user click
> >>stream
> >> > data sets etc. which makes it more important to be protected from
> >> security
> >> > breaches. To secure a data platform there are multiple things that
> >>need
> >> to
> >> > happen. One is having a strong access control mechanism which today is
> >> > provided by Apache Ranger and Apache Sentry. These tools provide the
> >> > ability to provide fine grain access control mechanism to data sets on
> >> > hadoop. But there is a big gap in terms of monitoring all the data
> >>access
> >> > events and activities in order to securing the hadoop data platform.
> >> > Together with strong access control, perimeter security and data
> >>access
> >> > monitoring in place data in the hadoop clusters can be secured against
> >> > breaches. We looked around and found following:
> >> > > Existing data activity monitoring products are designed for
> >>traditional
> >> > databases and data warehouse. Existing monitoring platforms cannot
> >>scale
> >> > out to support fast growing data and petabyte scale. Few products in
> >>the
> >> > industry are still very early in terms of supporting HDFS, Hive, HBase
> >> data
> >> > access monitoring.
> >> > > As mentioned in the background, the business requirement and
> >>urgency to
> >> > secure the data from users with malicious intent drove eBay to invest
> >>in
> >> > building a real time data access monitoring solution from scratch to
> >> offer
> >> > real time alerts and remediation features for malicious data access.
> >> > > With the power of open source distributed systems like Hadoop, Kafka
> >> and
> >> > much more we were able to develop a data activity monitoring system
> >>that
> >> > can scale, identify and stop malicious access in real time.
> >> > > Eagle allows admins to create standard access policies and rules for
> >> > monitoring HDFS, Hive and HBase data. Eagle also provides out of box
> >> > machine learning models for modeling user profiles based on user
> >>access
> >> > behaviour and use the model to alert on anomalies.
> >> > >
> >> > > Current Status
> >> > >
> >> > > Meritocracy
> >> > > Eagle has been deployed in production at eBay for monitoring
> >>billions
> >> of
> >> > events per day from HDFS and Hive operations. From the start; the
> >>product
> >> > has been built with focus on high scalability and application
> >> extensibility
> >> > in mind and Eagle has demonstrated great performance in responding to
> >> > suspicious events instantly and great flexibility in defining policy.
> >> > >
> >> > > Community
> >> > > Eagle seeks to develop the developer and user communities during
> >> > incubation.
> >> > >
> >> > > Core Developers
> >> > > Eagle is currently being designed and developed by engineers from
> >>eBay
> >> > Inc. – Edward Zhang, Hao Chen, Chaitali Gupta, Libin Sun, Jilin Jiang,
> >> > Qingwen Zhao, Senthil Kumar, Hemanth Dendukuri, Arun Manoharan. All of
> >> > these core developers have deep expertise in developing monitoring
> >> products
> >> > for the Hadoop ecosystem.
> >> > >
> >> > > Alignment
> >> > > The ASF is a natural host for Eagle given that it is already the
> >>home
> >> of
> >> > Hadoop, HBase, Hive, Storm, Kafka, Spark and other emerging big data
> >> > projects. Eagle leverages lot of Apache open-source products. Eagle
> >>was
> >> > designed to offer real time insights into sensitive data access by
> >> actively
> >> > monitoring the data access on various data sets in hadoop and an
> >> extensible
> >> > alerting framework with a powerful policy engine. Eagle compliments
> >>the
> >> > existing Hadoop platform area by providing a comprehensive monitoring
> >>and
> >> > alerting solution for detecting sensitive data access threats based on
> >> > preset policies and machine learning models for user behaviour
> >>analysis.
> >> > >
> >> > > Known Risks
> >> > >
> >> > > Orphaned Products
> >> > > The core developers of Eagle team work full time on this project.
> >>There
> >> > is no risk of Eagle getting orphaned since eBay is extensively using
> >>it
> >> in
> >> > their production Hadoop clusters and have plans to go beyond hadoop.
> >>For
> >> > example, currently there are 7 hadoop clusters and 2 of them are being
> >> > monitored using Hadoop Eagle in production. We have plans to extend
> >>it to
> >> > all hadoop clusters and eventually other data platforms. There are
> >>10’s
> >> of
> >> > policies onboarded and actively monitored with plans to onboard more
> >>use
> >> > case. We are very confident that every hadoop cluster in the world
> >>will
> >> be
> >> > monitored using Eagle for securing the hadoop ecosystem by actively
> >> > monitoring for data access on sensitive data. We plan to extend and
> >> > diversify this community further through Apache. We presented Eagle at
> >> the
> >> > hadoop summit in china and garnered interest from different companies
> >>who
> >> > use hadoop extensively.
> >> > >
> >> > > Inexperience with Open Source
> >> > > The core developers are all active users and followers of open
> >>source.
> >> > They are already committers and contributors to the Eagle Github
> >>project.
> >> > All have been involved with the source code that has been released
> >>under
> >> an
> >> > open source license, and several of them also have experience
> >>developing
> >> > code in an open source environment. Though the core set of Developers
> >>do
> >> > not have Apache Open Source experience, there are plans to onboard
> >> > individuals with Apache open source experience on to the project.
> >>Apache
> >> > Kylin PMC members are also in the same ebay organization. We work very
> >> > closely with Apache Ranger committers and are looking forward to find
> >> > meaningful integrations to improve the security of hadoop platform.
> >> > >
> >> > > Homogenous Developers
> >> > > The core developers are from eBay. Today the problem of monitoring
> >>data
> >> > activities to find and stop threats is a universal problem faced by
> >>all
> >> the
> >> > businesses. Apache Incubation process encourages an open and diverse
> >> > meritocratic community. Eagle intends to make every possible effort to
> >> > build a diverse, vibrant and involved community and has already
> >>received
> >> > substantial interest from various organizations.
> >> > >
> >> > > Reliance on Salaried Developers
> >> > > eBay invested in Eagle as the monitoring solution for Hadoop
> >>clusters
> >> > and some of its key engineers are working full time on the project. In
> >> > addition, since there is a growing need for securing sensitive data
> >> access
> >> > we need a data activity monitoring solution for Hadoop, we look
> >>forward
> >> to
> >> > other Apache developers and researchers to contribute to the project.
> >> > Additional contributors, including Apache committers have plans to
> >>join
> >> > this effort shortly. Also key to addressing the risk associated with
> >> > relying on Salaried developers from a single entity is to increase the
> >> > diversity of the contributors and actively lobby for Domain experts in
> >> the
> >> > security space to contribute. Eagle intends to do this.
> >> > >
> >> > > Relationships with Other Apache Products
> >> > > Eagle has a strong relationship and dependency with Apache Hadoop,
> >> > HBase, Spark, Kafka and Storm. Being part of Apache’s Incubation
> >> community,
> >> > could help with a closer collaboration among these projects and as
> >>well
> >> as
> >> > others. An Excessive Fascination with the Apache Brand Eagle is
> >>proposing
> >> > to enter incubation at Apache in order to help efforts to diversify
> >>the
> >> > committer-base, not so much to capitalize on the Apache brand. The
> >>Eagle
> >> > project is in production use already inside eBay, but is not expected
> >>to
> >> be
> >> > an eBay product for external customers. As such, the Eagle project is
> >>not
> >> > seeking to use the Apache brand as a marketing tool.
> >> > >
> >> > > Documentation
> >> > > Information about Eagle can be found at
> >>https://github.com/eBay/Eagle.
> >> > The following link provide more information about Eagle
> >> http://goeagle.io<
> >> > http://goeagle.io/>.
> >> > >
> >> > > Initial Source
> >> > > Eagle has been under development since 2014 by a team of engineers
> >>at
> >> > eBay Inc. It is currently hosted on Github.com under an Apache license
> >> 2.0
> >> > at https://github.com/eBay/Eagle. Once in incubation we will be
> moving
> >> > the code base to apache git library.
> >> > >
> >> > > External Dependencies
> >> > > Eagle has the following external dependencies.
> >> > > Basic
> >> > > •JDK 1.7+
> >> > > •Scala 2.10.4
> >> > > •Apache Maven
> >> > > •JUnit
> >> > > •Log4j
> >> > > •Slf4j
> >> > > •Apache Commons
> >> > > •Apache Commons Math3
> >> > > •Jackson
> >> > > •Siddhi CEP engine
> >> > >
> >> > > Hadoop
> >> > > •Apache Hadoop
> >> > > •Apache HBase
> >> > > •Apache Hive
> >> > > •Apache Zookeeper
> >> > > •Apache Curator
> >> > >
> >> > > Apache Spark
> >> > > •Spark Core Library
> >> > >
> >> > > REST Service
> >> > > •Jersey
> >> > >
> >> > > Query
> >> > > •Antlr
> >> > >
> >> > > Stream processing
> >> > > •Apache Storm
> >> > > •Apache Kafka
> >> > >
> >> > > Web
> >> > > •AngularJS
> >> > > •jQuery
> >> > > •Bootstrap V3
> >> > > •Moment JS
> >> > > •Admin LTE
> >> > > •html5shiv
> >> > > •respond
> >> > > •Fastclick
> >> > > •Date Range Picker
> >> > > •Flot JS
> >> > >
> >> > > Cryptography
> >> > > Eagle will eventually support encryption on the wire. This is not
> >>one
> >> of
> >> > the initial goals, and we do not expect Eagle to be a controlled
> >>export
> >> > item due to the use of encryption. Eagle supports but does not require
> >> the
> >> > Kerberos authentication mechanism to access secured Hadoop services.
> >> > >
> >> > > Required Resources
> >> > >
> >> > > Mailing List
> >> > > •eagle-private for private PMC discussions
> >> > > •eagle-dev for developers
> >> > > •eagle-commits for all commits
> >> > > •eagle-users for all eagle users
> >> > >
> >> > > Subversion Directory
> >> > > •Git is the preferred source control system.
> >> > >
> >> > > Issue Tracking
> >> > > •JIRA Eagle (Eagle)
> >> > >
> >> > > Other Resources
> >> > > The existing code already has unit tests so we will make use of
> >> existing
> >> > Apache continuous testing infrastructure. The resulting load should
> >>not
> >> be
> >> > very large.
> >> > >
> >> > > Initial Committers
> >> > > •Seshu Adunuthula <sadunuthula at ebay dot com>
> >> > > •Arun Manoharan <armanoharan at ebay dot com>
> >> > > •Edward Zhang <yonzhang at ebay dot com>
> >> > > •Hao Chen <hchen9 at ebay dot com>
> >> > > •Chaitali Gupta <cgupta at ebay dot com>
> >> > > •Libin Sun <libsun at ebay dot com>
> >> > > •Jilin Jiang <jiljiang at ebay dot com>
> >> > > •Qingwen Zhao <qingwzhao at ebay dot com>
> >> > > •Hemanth Dendukuri <hdendukuri at ebay dot com>
> >> > > •Senthil Kumar <senthilkumar at ebay dot com>
> >> > >
> >> > >
> >> > > Affiliations
> >> > > The initial committers are employees of eBay Inc.
> >> > >
> >> > > Sponsors
> >> > >
> >> > > Champion
> >> > > •Henry Saputra <hsaputra at apache dot org> - Apache IPMC member
> >> > >
> >> > > Nominated Mentors
> >> > > •Owen O’Malley < omalley at apache dot org > - Apache IPMC member,
> >> > Hortonworks
> >> > > •Henry Saputra <hsaputra at apache dot org> - Apache IPMC member
> >> > > •Julian Hyde <jhyde at hortonworks dot com> - Apache IPMC member,
> >> > Hortonworks
> >> > > •Amareshwari Sriramdasu <amareshwari at apache dot org> - Apache
> >>IPMC
> >> > member
> >> > > •Taylor Goetz <ptgoetz at apache dot org> - Apache IPMC member,
> >> > Hortonworks
> >> > >
> >> > > Sponsoring Entity
> >> > > We are requesting the Incubator to sponsor this project.
> >> > >
> >> >
> >> >
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> >> > For additional commands, e-mail: general-help@incubator.apache.org
> >> >
> >> >
> >>
> >>
> >> --
> >> Regards,
> >>
> >> *Bin Mahone | 马洪宾*
> >> Apache Kylin: http://kylin.io
> >> Github: https://github.com/binmahone
>
>