Posted to commits@eagle.apache.org by ha...@apache.org on 2017/04/03 11:29:31 UTC

svn commit: r1789961 [2/5] - in /eagle/site/docs: ./ latest/ latest/include/images/ latest/mkdocs/ latest/mkdocs/js/ v0.5.0/ v0.5.0/include/images/ v0.5.0/mkdocs/ v0.5.0/mkdocs/js/

Added: eagle/site/docs/latest/mkdocs/search_index.json
URL: http://svn.apache.org/viewvc/eagle/site/docs/latest/mkdocs/search_index.json?rev=1789961&view=auto
==============================================================================
--- eagle/site/docs/latest/mkdocs/search_index.json (added)
+++ eagle/site/docs/latest/mkdocs/search_index.json Mon Apr  3 11:29:31 2017
@@ -0,0 +1,739 @@
+{
+    "docs": [
+        {
+            "location": "/", 
+            "text": "What is Eagle\n\n\n Apache Eagle \n (incubating) is a highly extensible, scalable monitoring and alerting platform, designed with its flexible application framework and proven big data technologies, such as Kafka, Spark and Storm. It ships a rich set of applications for big data platform monitoring, e.g. HDFS/HBase/YARN service health check, JMX metrics, daemon logs, audit logs and yarn applications. External Eagle developers can define applications to monitoring their NoSQLs or Web Servers, and publish to Eagle application repository at your own discretion. It also provides the state-of-art alert engine to report security breaches, service failures, and application anomalies, highly customizable by the alert policy definition. \n\n\n\n\nTerminology\n\n\nSite\n\n\n\n\nA virtual concept in Apache Eagle. You can use it to manage a group of application instances, and distinguish the applications if you have a certain application installed for multiple times.\n\n\n\
 n\nApplication\n\n\n\n\nApplication(or Monitoring Application) is the first-class citizen in Apache Eagle, it stands for an end-to-end monitoring/alerting solution, which usually contains the monitoring source onboarding, source schema specification, alerting policy and dashboard definition.\n\n\n\n\nStream\n\n\n\n\nStream is the input for Alert Engine, each Application should have its own stream to be defined by the developer. Usually, it will have a POJO-like structure included in the stream definition. Once it's defined, Application should have the logic to write data into Kafka.\n\n\n\n\nData Activity Monitoring\n\n\n\n\nA built-in monitoring application to monitor HDFS/HBase/Hive operations, and allow users to define certain policies to detect sensitive data access and malicious data operations in real-time.\n\n\n\n\nAlert Engine\n\n\n\n\nA specific built-in application shared for all other monitoring applications, it reads data from Kafka, and processes the data by applying th
 e policy in real-time manner, and generates alert notification. So we call this application as the Alert Engine.\n\n\n\n\nPolicy\n\n\n\n\nA rule used by Alert Engine to match the data input from Kafka. Policy is defined in \nSiddhiQL\n format.\n\n\n\n\nAlert\n\n\n\n\nIf any data input to Alert Engine meets the policy, the Alert Engine will generate a message and publish it through alert publisher. We call such messages as the alerts.\n\n\n\n\nAlert Publisher\n\n\n\n\nIt will publish the alert to external channels which can be the SMTP channel, the Kafka channel, Slack channel or other storage systems.\n\n\n\n\nKey Qualities\n\n\nExtensible\n\n\n\n\nApache Eagle built its core framework around the application concept, application itself includes the logic for monitoring source data collection, pre-processing and normalization. Developer can easily develop his own out-of-box monitoring applications using Eagle's application framework, and deploy into Eagle.\n\n\n\n\nScalable\n\n\n\n\n
 The Eagle core team has chosen the proven big data technologies to build its fundamental runtime, and apply a scalable core to make it adaptive according to the throughput of data stream as well as the number of monitored applications.\n\n\n\n\nReal-time\n\n\n\n\nStorm or Spark Streaming based computing engine allow us to apply the policy to data stream and generate alerts in real-time manner.\n\n\n\n\nDynamic\n\n\n\n\nThe user can freely enable or disable a monitoring application without restarting the service. Eagle user can dynamically add/delet/change their alert policies without any impact to the underlying runtime.\n\n\n\n\nEasy-of-Use\n\n\n\n\nUser can enable the monitoring for a service within minutes effort by just choosing the corresponding monitoring application and configuring few parameters for the service.\n\n\n\n\nNon-Invasive\n\n\n\n\nApache Eagle uses the out-of-box applications to monitor services, you don't need any change to your existing services.\n\n\n\n\n\n\nU
 se Case Examples\n\n\nData Activity Monitoring\n\n\n\n\n\n\nData activity represents how user explores data provided by big data platforms. Analyzing data activity and alerting for insecure access are fundamental requirements for securing enterprise data. As data volume is increasing exponentially with Hadoop, Hive, Spark technology, understanding data activities for every user becomes extremely hard, let alone to alert for a single malicious event in real time among petabytes streaming data per day.\n\n\n\n\n\n\nSecuring enterprise data starts from understanding data activities for every user. Apache Eagle (incubating, called Eagle in the following) has integrated with many popular big data platforms e.g. Hadoop, Hive, Spark, Cassandra etc. With Eagle user can browse data hierarchy, mark sensitive data and then create comprehensive policy to alert for insecure data access.\n\n\n\n\n\n\nJob Performance Analysis\n\n\n\n\n\n\nRunning map/reduce job is the most popular way people use t
 o analyze data in Hadoop system. Analyzing job performance and providing tuning suggestions are critical for Hadoop system stability, job SLA and resource usage etc.\n\n\n\n\n\n\nEagle analyzes job performance with two complementing approaches. First Eagle periodically takes snapshots for all running jobs with YARN API, secondly Eagle continuously reads job lifecycle events immediately after the job is completed. With the two approaches, Eagle can analyze single job's trend, data skew problem, failure reasons etc. More interestingly, Eagle can analyze whole Hadoop cluster's performance by taking into account all jobs.\n\n\n\n\n\n\nCluster Performance Analytics\n\n\n\n\n\n\nIt is critical to understand why a cluster performs bad. Is that because of some crazy jobs recently on-boarded, or huge amount of tiny files, or namenode performance degrading?\n\n\n\n\n\n\nEagle in realtime calculates resource usage per minute out of individual jobs, e.g. CPU, memory, HDFS IO bytes, HDFS IO numO
 ps etc. and also collects namenode JMX metrics. Correlating them together will easily help system administrator find root cause for cluster slowness.\n\n\n\n\n\n\n\n\nDisclaimer\n\n\n\n\nApache Eagle now is being incubated, and therefore, across the whole documentation site, all appearances of case-insensitive word \neagle\n and \napache eagle\n represent \nApache Eagle (incubating)\n. This could be seen as a part of disclaimer.", 
+            "title": "Home"
+        }, 
+        {
+            "location": "/#what-is-eagle", 
+            "text": "Apache Eagle   (incubating) is a highly extensible, scalable monitoring and alerting platform, designed with its flexible application framework and proven big data technologies, such as Kafka, Spark and Storm. It ships a rich set of applications for big data platform monitoring, e.g. HDFS/HBase/YARN service health check, JMX metrics, daemon logs, audit logs and yarn applications. External Eagle developers can define applications to monitoring their NoSQLs or Web Servers, and publish to Eagle application repository at your own discretion. It also provides the state-of-art alert engine to report security breaches, service failures, and application anomalies, highly customizable by the alert policy definition.", 
+            "title": "What is Eagle"
+        }, 
+        {
+            "location": "/#terminology", 
+            "text": "", 
+            "title": "Terminology"
+        }, 
+        {
+            "location": "/#site", 
+            "text": "A virtual concept in Apache Eagle. You can use it to manage a group of application instances, and distinguish the applications if you have a certain application installed for multiple times.", 
+            "title": "Site"
+        }, 
+        {
+            "location": "/#application", 
+            "text": "Application(or Monitoring Application) is the first-class citizen in Apache Eagle, it stands for an end-to-end monitoring/alerting solution, which usually contains the monitoring source onboarding, source schema specification, alerting policy and dashboard definition.", 
+            "title": "Application"
+        }, 
+        {
+            "location": "/#stream", 
+            "text": "Stream is the input for Alert Engine, each Application should have its own stream to be defined by the developer. Usually, it will have a POJO-like structure included in the stream definition. Once it's defined, Application should have the logic to write data into Kafka.", 
+            "title": "Stream"
+        }, 
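
The Stream entry above expects the application itself to write its stream data into Kafka. A minimal, hedged sketch of that step in Java, assuming a broker at localhost:9092 and a hypothetical topic and event layout (use the topic and fields your own stream definition is actually bound to):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class StreamWriterSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // assumption: local Kafka broker
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            // Hypothetical topic name; replace with the topic your stream is wired to.
            String topic = "example_monitoring_stream";

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Stream events are flat, POJO-like records; JSON is one common encoding.
                String event = "{\"site\":\"sandbox\",\"metric\":\"example.metric\",\"value\":0.95,\"timestamp\":"
                        + System.currentTimeMillis() + "}";
                producer.send(new ProducerRecord<>(topic, "sandbox", event));
            }
        }
    }
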
+        {
+            "location": "/#data-activity-monitoring", 
+            "text": "A built-in monitoring application to monitor HDFS/HBase/Hive operations, and allow users to define certain policies to detect sensitive data access and malicious data operations in real-time.", 
+            "title": "Data Activity Monitoring"
+        }, 
+        {
+            "location": "/#alert-engine", 
+            "text": "A specific built-in application shared for all other monitoring applications, it reads data from Kafka, and processes the data by applying the policy in real-time manner, and generates alert notification. So we call this application as the Alert Engine.", 
+            "title": "Alert Engine"
+        }, 
+        {
+            "location": "/#policy", 
+            "text": "A rule used by Alert Engine to match the data input from Kafka. Policy is defined in  SiddhiQL  format.", 
+            "title": "Policy"
+        }, 
+        {
+            "location": "/#alert", 
+            "text": "If any data input to Alert Engine meets the policy, the Alert Engine will generate a message and publish it through alert publisher. We call such messages as the alerts.", 
+            "title": "Alert"
+        }, 
+        {
+            "location": "/#alert-publisher", 
+            "text": "It will publish the alert to external channels which can be the SMTP channel, the Kafka channel, Slack channel or other storage systems.", 
+            "title": "Alert Publisher"
+        }, 
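
When the Kafka channel is chosen as the publisher, downstream systems can read the published alerts with a plain Kafka consumer. A sketch assuming the Kafka 0.9+ consumer API, a broker at localhost:9092, and a hypothetical alert topic name configured on the Kafka publisher:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class AlertTopicReaderSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // assumption
            props.put("group.id", "alert-readers");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // Hypothetical topic; use the topic configured on the Kafka alert publisher.
                consumer.subscribe(Collections.singletonList("eagle_alerts"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(1000);
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.println("alert: " + record.value());   // forward to your own channel or storage
                    }
                }
            }
        }
    }
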
+        {
+            "location": "/#key-qualities", 
+            "text": "", 
+            "title": "Key Qualities"
+        }, 
+        {
+            "location": "/#extensible", 
+            "text": "Apache Eagle built its core framework around the application concept, application itself includes the logic for monitoring source data collection, pre-processing and normalization. Developer can easily develop his own out-of-box monitoring applications using Eagle's application framework, and deploy into Eagle.", 
+            "title": "Extensible"
+        }, 
+        {
+            "location": "/#scalable", 
+            "text": "The Eagle core team has chosen the proven big data technologies to build its fundamental runtime, and apply a scalable core to make it adaptive according to the throughput of data stream as well as the number of monitored applications.", 
+            "title": "Scalable"
+        }, 
+        {
+            "location": "/#real-time", 
+            "text": "Storm or Spark Streaming based computing engine allow us to apply the policy to data stream and generate alerts in real-time manner.", 
+            "title": "Real-time"
+        }, 
+        {
+            "location": "/#dynamic", 
+            "text": "The user can freely enable or disable a monitoring application without restarting the service. Eagle user can dynamically add/delet/change their alert policies without any impact to the underlying runtime.", 
+            "title": "Dynamic"
+        }, 
+        {
+            "location": "/#easy-of-use", 
+            "text": "User can enable the monitoring for a service within minutes effort by just choosing the corresponding monitoring application and configuring few parameters for the service.", 
+            "title": "Easy-of-Use"
+        }, 
+        {
+            "location": "/#non-invasive", 
+            "text": "Apache Eagle uses the out-of-box applications to monitor services, you don't need any change to your existing services.", 
+            "title": "Non-Invasive"
+        }, 
+        {
+            "location": "/#use-case-examples", 
+            "text": "", 
+            "title": "Use Case Examples"
+        }, 
+        {
+            "location": "/#data-activity-monitoring_1", 
+            "text": "Data activity represents how user explores data provided by big data platforms. Analyzing data activity and alerting for insecure access are fundamental requirements for securing enterprise data. As data volume is increasing exponentially with Hadoop, Hive, Spark technology, understanding data activities for every user becomes extremely hard, let alone to alert for a single malicious event in real time among petabytes streaming data per day.    Securing enterprise data starts from understanding data activities for every user. Apache Eagle (incubating, called Eagle in the following) has integrated with many popular big data platforms e.g. Hadoop, Hive, Spark, Cassandra etc. With Eagle user can browse data hierarchy, mark sensitive data and then create comprehensive policy to alert for insecure data access.", 
+            "title": "Data Activity Monitoring"
+        }, 
+        {
+            "location": "/#job-performance-analysis", 
+            "text": "Running map/reduce job is the most popular way people use to analyze data in Hadoop system. Analyzing job performance and providing tuning suggestions are critical for Hadoop system stability, job SLA and resource usage etc.    Eagle analyzes job performance with two complementing approaches. First Eagle periodically takes snapshots for all running jobs with YARN API, secondly Eagle continuously reads job lifecycle events immediately after the job is completed. With the two approaches, Eagle can analyze single job's trend, data skew problem, failure reasons etc. More interestingly, Eagle can analyze whole Hadoop cluster's performance by taking into account all jobs.", 
+            "title": "Job Performance Analysis"
+        }, 
+        {
+            "location": "/#cluster-performance-analytics", 
+            "text": "It is critical to understand why a cluster performs bad. Is that because of some crazy jobs recently on-boarded, or huge amount of tiny files, or namenode performance degrading?    Eagle in realtime calculates resource usage per minute out of individual jobs, e.g. CPU, memory, HDFS IO bytes, HDFS IO numOps etc. and also collects namenode JMX metrics. Correlating them together will easily help system administrator find root cause for cluster slowness.", 
+            "title": "Cluster Performance Analytics"
+        }, 
+        {
+            "location": "/#disclaimer", 
+            "text": "Apache Eagle now is being incubated, and therefore, across the whole documentation site, all appearances of case-insensitive word  eagle  and  apache eagle  represent  Apache Eagle (incubating) . This could be seen as a part of disclaimer.", 
+            "title": "Disclaimer"
+        }, 
+        {
+            "location": "/getting-started/", 
+            "text": "Architecture\n\n\n\n\nEagle Apps\n\n\n\n\nSecurity\n\n\nHadoop\n\n\nOperational Intelligence\n\n\n\n\nFor more applications, see \nApplications\n.\n\n\nEagle Interface\n\n\n\n\nREST Service\n\n\nManagement UI\n\n\nCustomizable Analytics Visualization\n\n\n\n\nEagle Integration\n\n\n\n\nApache Ambari\n\n\nDocker\n\n\nApache Ranger\n\n\nDataguise\n\n\n\n\nEagle Framework\n\n\nEagle has multiple distributed real-time frameworks for efficiently developing highly scalable monitoring applications.\n\n\nAlert Engine\n\n\n\n\n\n\nReal-time: Apache Storm (Execution Engine) + Kafka (Message Bus)\n\n\n\n\nDeclarative Policy: SQL (CEP) on Streaming\n        from hadoopJmxMetricEventStream\n        [metric == \"hadoop.namenode.fsnamesystemstate.capacityused\" and value \n 0.9] \n        select metric, host, value, timestamp, component, site \n        insert into alertStream;\n\n\n\n\n\n\nDynamical onboarding \n correlation\n\n\n\n\nNo downtime migration and upgrading\n\n\n\n
 \nStorage Engine\n\n\n\n\n\n\n\n\nLight-weight ORM Framework for HBase/RDMBS\n\n\n@Table(\"HbaseTableName\")\n@ColumnFamily(\"ColumnFamily\")\n@Prefix(\"RowkeyPrefix\")\n@Service(\"UniqueEntitytServiceName\")\n@JsonIgnoreProperties(ignoreUnknown = true)\n@TimeSeries(false)\n@Indexes({\n    @Index(name=\"Index_1_alertExecutorId\", columns = { \"alertExecutorID\" }, unique = true)})\npublic class AlertDefinitionAPIEntity extends TaggedLogAPIEntity{\n@Column(\"a\")\nprivate String desc;\n\n\n\n\n\n\n\nFull-function SQL-Like REST Query \n\n\nQuery=UniqueEntitytServiceName[@site=\"sandbox\"]{*}\n\n\n\n\n\n\n\nOptimized Rowkey design for time-series data, optimized for metric/entity/log, etc. different storage types\n\n\nRowkey ::= Prefix | Partition Keys | timestamp | tagName | tagValue | \u2026\n\n\n\n\n\n\n\nSecondary Index Support\n        @Indexes(, unique = true/false)})\n\n\n\n\n\n\nNative HBase Coprocessor\n        org.apache.eagle.storage.hbase.query.coprocessor.AggregateProtocol
 EndPoint\n\n\n\n\n\n\nUI Framework\n\n\nEagle UI is consist of following parts:\n\n\n\n\nEagle Main UI\n\n\nEagle App Portal/Dashboard/Widgets\n\n\nEagle Customized Dashboard \n\n\n\n\nApplication Framework\n\n\nApplication\n\n\nAn \"Application\" or \"App\" is composed of data integration, policies and insights for one data source.\n\n\nApplication Descriptor\n\n\nAn \"Application Descriptor\" is a static packaged metadata information consist of basic information like type, name, version, description, and application process, configuration, streams, docs, policies and so on. \n\n\nHere is an example ApplicationDesc of \nJPM_WEB_APP\n\n\n    {\n    type: \"JPM_WEB_APP\",\n    name: \"Job Performance Monitoring Web \",\n    version: \"0.5.0-incubating\",\n    description: null,\n    appClass: \"org.apache.eagle.app.StaticApplication\",\n    jarPath: \"/opt/eagle/0.5.0-incubating-SNAPSHOT-build-20161103T0332/eagle-0.5.0-incubating-SNAPSHOT/lib/eagle-topology-0.5.0-incubating-SNAPSHOT-
 hadoop-2.4.1-11-assembly.jar\",\n    viewPath: \"/apps/jpm\",\n    providerClass: \"org.apache.eagle.app.jpm.JPMWebApplicationProvider\",\n    configuration: {\n        properties: [{\n            name: \"service.host\",\n            displayName: \"Eagle Service Host\",\n            value: \"localhost\",\n            description: \"Eagle Service Host, default: localhost\",\n            required: false\n        }, {\n            name: \"service.port\",\n            displayName: \"Eagle Service Port\",\n            value: \"8080\",\n            description: \"Eagle Service Port, default: 8080\",\n            required: false\n        }]\n    },\n    streams: null,\n    docs: null,\n    executable: false,\n    dependencies: [{\n        type: \"MR_RUNNING_JOB_APP\",\n        version: \"0.5.0-incubating\",\n        required: true\n    }, {\n        type: \"MR_HISTORY_JOB_APP\",\n        version: \"0.5.0-incubating\",\n        required: true\n    }]\n    }\n\n\n\nApplication Provider\n\n\n
 Appilcation Provider is a package management and loading mechanism leveraging \nJava SPI\n.\n\n\nFor example, in file \nMETA-INF/services/org.apache.eagle.app.spi.ApplicationProvider\n, place the full class name of an application provider:\n\n\norg.apache.eagle.app.jpm.JPMWebApplicationProvider\n\n\n\n\n\nConcepts\n\n\n\n\nHere are some terms we are using in Apache Eagle (incubating, called Eagle in the following), please check them for your reference. They are basic knowledge of Eagle which also will help to well understand Eagle.\n\n\n\n\nSite\n\n\n\n\nA site can be considered as a physical data center. Big data platform e.g. Hadoop may be deployed to multiple data centers in an enterprise.\n\n\n\n\nApplication\n\n\n\n\nAn \"Application\" or \"App\" is composed of data integration, policies and insights for one data source.\n\n\n\n\nPolicy\n\n\n\n\nA \"Policy\" defines the rule to alert. Policy can be simply a filter expression or a complex window based aggregation rules etc.\n\n\
 n\n\nAlerts\n\n\n\n\nAn \"Alert\" is an real-time event detected with certain alert policy or correlation logic, with different severity levels like INFO/WARNING/DANGER.\n\n\n\n\nData Source\n\n\n\n\nA \"Data Source\" is a monitoring target data. Eagle supports many data sources HDFS audit logs, Hive2 query, MapReduce job etc.\n\n\n\n\nStream\n\n\n\n\nA \"Stream\" is the streaming data from a data source. Each data source has its own stream.\n\n\n\n\n\n\nQuick Start\n\n\nDeployment\n\n\nPrerequisites\n\n\nEagle requires the following dependencies:\n\n\n\n\nFor streaming platform dependencies\n\n\nStorm: 0.9.3 or later\n\n\nHadoop: 2.6.x or later\n\n\nHbase: 0.98.x or later\n\n\nKafka: 0.8.x or later\n\n\nZookeeper: 3.4.6 or later\n\n\nJava: 1.8.x\n\n\n\n\n\n\nFor metadata database dependencies (Choose one of them)\n\n\nMangoDB 3.2.2 or later\n\n\nInstallation is required\n\n\n\n\n\n\nMysql 5.1.x or later\n\n\nInstallation is required\n\n\n\n\n\n\n\n\n\n\n\n\nNotice:  \n\n\n\n\nStorm
  0.9.x does NOT support JDK8. You can replace asm-4.0.jar with asm-all-5.0.jar in the storm lib directory. \nThen restart other services(nimbus/ui/supervisor).\n\n\n\n\n\nInstallation\n\n\nBuild Eagle\n\n\n\n\n\n\nDownload the latest version of Eagle source code.\n\n\ngit clone https://github.com/apache/incubator-eagle.git\n\n\n\n\n\n\n\nBuild the source code, and a tar.gz package will be generated under eagle-server-assembly/target\n\n\nmvn clean install -DskipTests\n\n\n\n\n\n\n\nDeploy Eagle\n\n\n\n\nCopy binary package to your server machine. In the package, you should find:\n\n\nbin/\n: scripts used for start eagle server\n\n\nconf/\n: default configurations for eagle server setup.\n\n\nlib/\n : all included software packages for eagle server\n\n\n\n\n\n\nChange configurations under \nconf/\n\n\neagle.conf\n\n\nserver.yml\n\n\n\n\n\n\n\n\nRun eagle-server.sh\n\n\n./bin/eagle-server.sh start\n\n\n\n\n\n\n\nCheck eagle server\n\n\n\n\nVisit http://host:port/ in your web browser.\
 n\n\n\n\n\n\n\n\nSetup Your Monitoring Case\n\n\nPlaceholder for topic: Setup Your Monitoring Case", 
+            "title": "Getting Started"
+        }, 
+        {
+            "location": "/getting-started/#architecture", 
+            "text": "", 
+            "title": "Architecture"
+        }, 
+        {
+            "location": "/getting-started/#eagle-apps", 
+            "text": "Security  Hadoop  Operational Intelligence   For more applications, see  Applications .", 
+            "title": "Eagle Apps"
+        }, 
+        {
+            "location": "/getting-started/#eagle-interface", 
+            "text": "REST Service  Management UI  Customizable Analytics Visualization", 
+            "title": "Eagle Interface"
+        }, 
+        {
+            "location": "/getting-started/#eagle-integration", 
+            "text": "Apache Ambari  Docker  Apache Ranger  Dataguise", 
+            "title": "Eagle Integration"
+        }, 
+        {
+            "location": "/getting-started/#eagle-framework", 
+            "text": "Eagle has multiple distributed real-time frameworks for efficiently developing highly scalable monitoring applications.", 
+            "title": "Eagle Framework"
+        }, 
+        {
+            "location": "/getting-started/#alert-engine", 
+            "text": "Real-time: Apache Storm (Execution Engine) + Kafka (Message Bus)   Declarative Policy: SQL (CEP) on Streaming\n        from hadoopJmxMetricEventStream\n        [metric == \"hadoop.namenode.fsnamesystemstate.capacityused\" and value   0.9] \n        select metric, host, value, timestamp, component, site \n        insert into alertStream;    Dynamical onboarding   correlation   No downtime migration and upgrading", 
+            "title": "Alert Engine"
+        }, 
+        {
+            "location": "/getting-started/#storage-engine", 
+            "text": "Light-weight ORM Framework for HBase/RDMBS  @Table(\"HbaseTableName\")\n@ColumnFamily(\"ColumnFamily\")\n@Prefix(\"RowkeyPrefix\")\n@Service(\"UniqueEntitytServiceName\")\n@JsonIgnoreProperties(ignoreUnknown = true)\n@TimeSeries(false)\n@Indexes({\n    @Index(name=\"Index_1_alertExecutorId\", columns = { \"alertExecutorID\" }, unique = true)})\npublic class AlertDefinitionAPIEntity extends TaggedLogAPIEntity{\n@Column(\"a\")\nprivate String desc;    Full-function SQL-Like REST Query   Query=UniqueEntitytServiceName[@site=\"sandbox\"]{*}    Optimized Rowkey design for time-series data, optimized for metric/entity/log, etc. different storage types  Rowkey ::= Prefix | Partition Keys | timestamp | tagName | tagValue | \u2026    Secondary Index Support\n        @Indexes(, unique = true/false)})    Native HBase Coprocessor\n        org.apache.eagle.storage.hbase.query.coprocessor.AggregateProtocolEndPoint", 
+            "title": "Storage Engine"
+        }, 
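
To illustrate the SQL-like REST query mentioned in the Storage Engine entry, here is a hedged Java sketch that issues the example query over HTTP; the host, port and the /rest/entities path are assumptions, so adjust them to your Eagle server's actual REST endpoint:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.net.URLEncoder;

    public class EntityQuerySketch {
        public static void main(String[] args) throws Exception {
            // The SQL-like query string from the Storage Engine example above.
            String query = URLEncoder.encode("UniqueEntitytServiceName[@site=\"sandbox\"]{*}", "UTF-8");
            // NOTE: host, port and the /rest/entities path are assumptions; check your deployment.
            URL url = new URL("http://localhost:9090/rest/entities?query=" + query + "&pageSize=100");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestProperty("Accept", "application/json");
            try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);   // raw JSON response from the storage engine
                }
            } finally {
                conn.disconnect();
            }
        }
    }
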
+        {
+            "location": "/getting-started/#ui-framework", 
+            "text": "Eagle UI is consist of following parts:   Eagle Main UI  Eagle App Portal/Dashboard/Widgets  Eagle Customized Dashboard", 
+            "title": "UI Framework"
+        }, 
+        {
+            "location": "/getting-started/#application-framework", 
+            "text": "", 
+            "title": "Application Framework"
+        }, 
+        {
+            "location": "/getting-started/#application", 
+            "text": "An \"Application\" or \"App\" is composed of data integration, policies and insights for one data source.", 
+            "title": "Application"
+        }, 
+        {
+            "location": "/getting-started/#application-descriptor", 
+            "text": "An \"Application Descriptor\" is a static packaged metadata information consist of basic information like type, name, version, description, and application process, configuration, streams, docs, policies and so on.   Here is an example ApplicationDesc of  JPM_WEB_APP      {\n    type: \"JPM_WEB_APP\",\n    name: \"Job Performance Monitoring Web \",\n    version: \"0.5.0-incubating\",\n    description: null,\n    appClass: \"org.apache.eagle.app.StaticApplication\",\n    jarPath: \"/opt/eagle/0.5.0-incubating-SNAPSHOT-build-20161103T0332/eagle-0.5.0-incubating-SNAPSHOT/lib/eagle-topology-0.5.0-incubating-SNAPSHOT-hadoop-2.4.1-11-assembly.jar\",\n    viewPath: \"/apps/jpm\",\n    providerClass: \"org.apache.eagle.app.jpm.JPMWebApplicationProvider\",\n    configuration: {\n        properties: [{\n            name: \"service.host\",\n            displayName: \"Eagle Service Host\",\n            value: \"localhost\",\n            description: \"Eagle Service Host, de
 fault: localhost\",\n            required: false\n        }, {\n            name: \"service.port\",\n            displayName: \"Eagle Service Port\",\n            value: \"8080\",\n            description: \"Eagle Service Port, default: 8080\",\n            required: false\n        }]\n    },\n    streams: null,\n    docs: null,\n    executable: false,\n    dependencies: [{\n        type: \"MR_RUNNING_JOB_APP\",\n        version: \"0.5.0-incubating\",\n        required: true\n    }, {\n        type: \"MR_HISTORY_JOB_APP\",\n        version: \"0.5.0-incubating\",\n        required: true\n    }]\n    }", 
+            "title": "Application Descriptor"
+        }, 
+        {
+            "location": "/getting-started/#application-provider", 
+            "text": "Appilcation Provider is a package management and loading mechanism leveraging  Java SPI .  For example, in file  META-INF/services/org.apache.eagle.app.spi.ApplicationProvider , place the full class name of an application provider:  org.apache.eagle.app.jpm.JPMWebApplicationProvider", 
+            "title": "Application Provider"
+        }, 
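
The Application Provider entry above relies on the standard Java SPI mechanism. The sketch below shows the generic discovery step with java.util.ServiceLoader; this illustrates the JDK behavior the text refers to, not Eagle's internal loading code:

    import java.util.ServiceLoader;
    import org.apache.eagle.app.spi.ApplicationProvider;

    public class ProviderDiscoverySketch {
        public static void main(String[] args) {
            // ServiceLoader scans META-INF/services/org.apache.eagle.app.spi.ApplicationProvider
            // on the classpath and instantiates each provider class listed there.
            ServiceLoader<ApplicationProvider> providers = ServiceLoader.load(ApplicationProvider.class);
            for (ApplicationProvider provider : providers) {
                System.out.println("Found application provider: " + provider.getClass().getName());
            }
        }
    }
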
+        {
+            "location": "/getting-started/#concepts", 
+            "text": "Here are some terms we are using in Apache Eagle (incubating, called Eagle in the following), please check them for your reference. They are basic knowledge of Eagle which also will help to well understand Eagle.", 
+            "title": "Concepts"
+        }, 
+        {
+            "location": "/getting-started/#site", 
+            "text": "A site can be considered as a physical data center. Big data platform e.g. Hadoop may be deployed to multiple data centers in an enterprise.", 
+            "title": "Site"
+        }, 
+        {
+            "location": "/getting-started/#application_1", 
+            "text": "An \"Application\" or \"App\" is composed of data integration, policies and insights for one data source.", 
+            "title": "Application"
+        }, 
+        {
+            "location": "/getting-started/#policy", 
+            "text": "A \"Policy\" defines the rule to alert. Policy can be simply a filter expression or a complex window based aggregation rules etc.", 
+            "title": "Policy"
+        }, 
+        {
+            "location": "/getting-started/#alerts", 
+            "text": "An \"Alert\" is an real-time event detected with certain alert policy or correlation logic, with different severity levels like INFO/WARNING/DANGER.", 
+            "title": "Alerts"
+        }, 
+        {
+            "location": "/getting-started/#data-source", 
+            "text": "A \"Data Source\" is a monitoring target data. Eagle supports many data sources HDFS audit logs, Hive2 query, MapReduce job etc.", 
+            "title": "Data Source"
+        }, 
+        {
+            "location": "/getting-started/#stream", 
+            "text": "A \"Stream\" is the streaming data from a data source. Each data source has its own stream.", 
+            "title": "Stream"
+        }, 
+        {
+            "location": "/getting-started/#quick-start", 
+            "text": "", 
+            "title": "Quick Start"
+        }, 
+        {
+            "location": "/getting-started/#deployment", 
+            "text": "", 
+            "title": "Deployment"
+        }, 
+        {
+            "location": "/getting-started/#prerequisites", 
+            "text": "Eagle requires the following dependencies:   For streaming platform dependencies  Storm: 0.9.3 or later  Hadoop: 2.6.x or later  Hbase: 0.98.x or later  Kafka: 0.8.x or later  Zookeeper: 3.4.6 or later  Java: 1.8.x    For metadata database dependencies (Choose one of them)  MangoDB 3.2.2 or later  Installation is required    Mysql 5.1.x or later  Installation is required       Notice:     Storm 0.9.x does NOT support JDK8. You can replace asm-4.0.jar with asm-all-5.0.jar in the storm lib directory. \nThen restart other services(nimbus/ui/supervisor).", 
+            "title": "Prerequisites"
+        }, 
+        {
+            "location": "/getting-started/#installation", 
+            "text": "", 
+            "title": "Installation"
+        }, 
+        {
+            "location": "/getting-started/#build-eagle", 
+            "text": "Download the latest version of Eagle source code.  git clone https://github.com/apache/incubator-eagle.git    Build the source code, and a tar.gz package will be generated under eagle-server-assembly/target  mvn clean install -DskipTests", 
+            "title": "Build Eagle"
+        }, 
+        {
+            "location": "/getting-started/#deploy-eagle", 
+            "text": "Copy binary package to your server machine. In the package, you should find:  bin/ : scripts used for start eagle server  conf/ : default configurations for eagle server setup.  lib/  : all included software packages for eagle server    Change configurations under  conf/  eagle.conf  server.yml     Run eagle-server.sh  ./bin/eagle-server.sh start    Check eagle server   Visit http://host:port/ in your web browser.", 
+            "title": "Deploy Eagle"
+        }, 
+        {
+            "location": "/getting-started/#setup-your-monitoring-case", 
+            "text": "Placeholder for topic: Setup Your Monitoring Case", 
+            "title": "Setup Your Monitoring Case"
+        }, 
+        {
+            "location": "/using-eagle/", 
+            "text": "Manage Eagle and Services\n\n\n\n\n\n\nAfter Apache Eagle has been deployed (please reference \ndeployment\n), you can enter deployment directory and use commands below to control Apache Eagle Server.\n\n\n./bin/eagle-server.sh start|stop|status\n\n\n\n\n\n\n\nAfter starting the Eagle server, please type http://\n:\n/ to open the web ui of Eagle.\n\n\n\n\n\n\n\n\nUse Eagle Web Interface\n\n\n\n\n\n\nThis is the typical Web Interface (short for WI) after setting up your Eagle monitoring environment. WI majorly contain the right main panel and left function menu.\n\n\n\n\n\n\n\n\nHome\n\n\n\n\n\n\nThis is the aggregated UI for configured sites, and the applications. It will show those created sites created, how many application installed for each sites, and alerts generated from that cluster. You can click \u201cMore info\u201d link to view the details for particular site.\n\n\n\n\n\n\nThe \u201c\nWidgets\n\u201d section is customizable; if the application develop
 er have its application registered to Home page, you can find that in \u201c\nWidgets\n\u201d section. Please check the application developer guide about how to register applications to home widgets. It give you a shortcut to go directly to the application home.\n\n\n\n\n\n\nAlert\n\n\n\n\nIn Alert menu, you can define the policies, list the policies and check your alerts there. \n\n\n\n\nIntegration\n\n\n\n\nThe integration page provides the management functionality for Eagle. You can list the built-in applications there, create sites, and manage the applications in your site.\n\n\n\n\nSites\n\n\n\n\nIt also gives you a shortcut to particular site.\n\n\n\n\n\n\nSetup The Monitoring Application\n\n\nMonitoring Applications\n\n\n\n\n\n\nEagle has an extensible framework to dynamically add new monitoring applications in Eagle environment. It also ships some built-in big data monitoring applications.\n\n\n\n\n\n\nGo to \u201c\nIntegration\n\u201d -\n \u201c\nApplications\n\u201d, it wi
 ll list a set of available monitoring applications which you can choose to monitor your services.\n\n\n\n\n\n\n\n\nThe \u201c\nApplication\n\u201d column is the display name for an application, \u201c\nStreams\n\u201d is a logical name for the data stream from the monitored source after pre-processing, which will consumed by Alert Engine.\n\n\n\n\n\n\nAt the moment, we have the below built-in applications shipped with Apache Eagle. You can refer to the application documentation to understand how to do the configuration for each monitoring application.\n\n\n\n\n\n\n\n\nApplication\n\n\nDescription\n\n\n\n\n\n\n\n\n\n\nTopology Health Check\n\n\nThis application can be used to monitor the service healthiness for HDFS, HBase and YARN. You can get alerted once the master role or the slave role got crashed.\n\n\n\n\n\n\nHadoop JMX Metrics Monitoring\n\n\nThis application can be used to monitor the JMX metrics data from the master nodes of HDFS, HBase and YARN, e.g. NameNode, HBase Master
  and YARN Resource Manager.\n\n\n\n\n\n\nHDFS Audit Log Monitor\n\n\nThis application can be used to monitor the data operations in HDFS, to detect sensitive data access and malicious operations; to protect from data leak or data loss.\n\n\n\n\n\n\nHBase Audit Log Monitor\n\n\nSame as HDFS Audit Log Monitor, this application is used to monitor the data operations in HBase.\n\n\n\n\n\n\nMap Reduce History Job\n\n\nThis application is used to get the MapReduce history job counters from YARN history server and job running history from HDFS log directory.\n\n\n\n\n\n\nMap Reduce Running Job\n\n\nThis application is used to get the MapReduce running job counter information using YARN Rest API.\n\n\n\n\n\n\nHadoop Queue Monitor\n\n\nThis application is used to get the resource scheduling and utilization info from YARN.\n\n\n\n\n\n\nMR Metrics Aggregation\n\n\nThis application is used to aggregate the job counters and some resource utilization in a certain period of time (daily, weekly or 
 monthly).\n\n\n\n\n\n\nJob Performance Monitor Web\n\n\nThis application only contains the frontend, and depends on Map Reduce History Job and Map Reduce Running Job.\n\n\n\n\n\n\nAlert Engine\n\n\nAlert Engine is a special application and used to process the output data from other applications.\n\n\n\n\n\n\n\n\n\n\n\n\nManaging Sites\n\n\nTo enable a real monitoring use case, you have to create a site first, and install a certain application for this site, and finally start the application. We use site concept to group the running applications and avoid the application conflict.\n\n\nSites\n\n\n\n\n\n\nGo to \u201c\nIntegration\n\u201d -\n \u201c\nSites\n\u201d, there will be a table listing the managed sites.\n\n\n\n\n\n\n\n\nCreate Site\n\n\n\n\n\n\nClick \u201c\nNew Site\n\u201d on the bottom right of the Sites page. You can fill the information in site creation dialog.\n\n\n\n\n\n\n\n\nThe \u201c\nSite Id\n\u201d should not be duplicated. After the creation, you can find it in 
 sites page.\n\n\n\n\n\n\n\n\nConfiguring a Site\n\n\n\n\n\n\nBy clicking \u201c\nEdit\n\u201d button or the Site column in Sites table, you can have the Site configuration page, there you can install monitoring applications.\n\n\n\n\n\n\n\n\nInstall and Run Applications in Site\n\n\n\n\n\n\nChoose the particular application which you want to install, you probably have something to fill, e.g. the HDFS NameNode address, Zookeeper address and port. Please check each application documentation for how to configure each application. \n\n\n\n\n\n\nAfter doing the installation, you can start the application by clicking \n or stop the application by \n. You can check the \u201c\nStatus\n\u201d column about the running status. Usually, it should have \u201c\nINITIALIZED\n\u201d or \u201c\nRUNNING\n\u201d for a healthy application.\n\n\n\n\n\n\n\n\nDefine Policies\n\n\nAfter setting up the monitoring applications, you probably want to setup some alert policies against the monitored data, so yo
 u can get notified once any violation on the data. Eagle has a centralized place for policy definition.\n\n\nPolicies\n\n\n\n\n\n\nGo to \u201c\nAlert\n\u201d -\n \u201c\nPolicies\n\u201d, you can check the policies defined and take control on whether to enable the policy:\n\n\n\n\n\n\n\n\nYou can apply the below actions for a certain policy:\n\n\n\n\n\n\n: enable a policy\n\n\n\n\n\n\n: disable a policy\n\n\n\n\n\n\n: edit a policy\n\n\n\n\n\n\n: purge a policy\n\n\n\n\n\n\n\n\n\n\nDefine or Edit Policies\n\n\n\n\n\n\nIf you want to create a new policy, click \u201c\nAlert\n\u201d -\n \u201c\nDefine Policy\n\u201d, or you can enter into the policy definition page by editing an existing policy. After that, you can go to the policy list to enable the policy dynamically.\n\n\n\n\n\n\n\n\nSource Stream\n\n\n\n\nThe source stream gives user a full view about what data stream is available for application defined for particular site, as well as the data structures in each data stream. Dat
 a stream name is suffixed by the site name.\n\n\n\n\nPolicy Name\n\n\n\n\nThe policy name should be globally unique.\n\n\n\n\nPublish Alerts\n\n\n\n\n\n\nIn this section, you can define the alert publishment method by clicking the \u201c\n+Add Publisher\n\u201d.\n\n\n\n\n\n\n\n\nYou can choose the publishment method from an existing policy or by creating new publisher. \n\n\n\n\n\n\nThere are four built-in publisher types:\n\n\n\n\n\n\nEmailPublisher\n: org.apache.eagle.alert.engine.publisher.impl.AlertEmailPublisher\n\n\n\n\n\n\nKafkaPublisher\n: org.apache.eagle.alert.engine.publisher.impl.AlertKafkaPublisher\n\n\n\n\n\n\nSlackPublisher\n: org.apache.eagle.alert.engine.publisher.impl.AlertSlackPublisher\n\n\n\n\n\n\nEagleStoragePlugin\n: org.apache.eagle.alert.engine.publisher.impl.AlertEagleStoragePlugin\n\n\n\n\n\n\n\n\n\n\nPolicy Syntax\n\n\n\n\n\n\nCurrently, we support SiddhiQL(please view Siddhi Query Language Specification \nhere\n)\n\n\n\n\n\n\nIn order to explain how stre
 am data is processed, let us take policy below as an example:\n\n\nfrom map_reduce_failed_job_stream[site==\"sandbox\" and currentState==\"FAILED\"]\nselect * group by jobId insert into map_reduce_failed_job_stream_out\n\n\n\n\n\n\n\nThis policy contains below parts:\n\n\n\n\n\n\nSource\n: from map_reduce_failed_job_stream\n\n\n\n\n\n\nFilter\n: [site==\"sandbox\" and currentState==\"FAILED\"]\n\n\n\n\n\n\nProjection\n: select *\n\n\n\n\n\n\nGroupBy\n: group by jobId\n\n\n\n\n\n\nDestination\n: insert into map_reduce_failed_job_stream_out\n\n\n\n\n\n\n\n\n\n\nSource Streams(schema) are defined by applications, and applications will write stream data to data sink(currently, we support kafka as data sink).\n\n\nstreams\n\n    \nstream\n\n        \nstreamId\nmap_reduce_failed_job_stream\n/streamId\n\n        \ndescription\nMap Reduce Failed Job Stream\n/description\n\n        \nvalidate\ntrue\n/validate\n\n        \ncolumns\n\n            \ncolumn\n\n                \nname\nsite\n/name
 \n\n                \ntype\nstring\n/type\n\n            \n/column\n\n            \u2026...\n            \ncolumn\n\n                \nname\njobId\n/name\n\n                \ntype\nstring\n/type\n\n            \ncolumn\n\n                \nname\ncurrentState\n/name\n\n                \ntype\nstring\n/type\n\n            \n/column\n\n        \n/columns\n\n    \n/stream\n\n\n/streams\n\n\n\n\n\n\n\n\nAfter policy is defined, Alert engine will create siddhi execution runtime for the policy(also load stream data schema from metadata store). Since siddhi execution runtime knows the stream data schema, then it will process stream data and do the calculation.\n\n\n\n\n\n\n\n\nMonitoring Dashboard\n\n\n\n\n\n\nAfter setting the sites and applications, you can find the site item from the home page or \u201cSites\u201d menu.\n\n\n\n\n\n\nHere is a site home example. After entering the site home, the left menu will be replaced by application dashboard links only related to that site, so you ca
 n switch between the application dashboard quickly. In the right panel, it contains the application icons installed in this site, but depends on if the application has its dashboard defined. You can click the application icon or the application links to go to the application dashboard home. Please check the application documentation about how to use the application monitoring dashboard.\n\n\n\n\n\n\n\n\n\n\nCheck The Alerts\n\n\n\n\n\n\nEagle has all the alerts generated by all the applications stored in its database, so you can check your application alerts from Eagle WI. \n\n\n\n\n\n\nGo to \u201c\nAlert\n\u201d -\n \u201c\nAlerts\n\u201d, you can find the alerts table.\n\n\n\n\n\n\n\n\nAlso you can check more detailed information by clicking \u201c\nDetail\n\u201d link for each alert item.\n\n\n\n\n\n\n\n\n\n\nHow to stream audit log into Kafka\n\n\nLogstash\n\n\nThe sample configuration is tested with logstash-2.3.4. Logstash is required to be installed on the namenode host.\n\n
 \n\n\n\n\nStep 1\n: Create a Kafka topic as the streaming input.\n\n\nHere is an sample Kafka command to create topic 'sandbox_hdfs_audit_log'\n\n\ncd \nkafka-home\n\nbin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic sandbox_hdfs_audit_log\n\n\n\n\n\n\n\nStep 2\n: Create a Logstash configuration file under ${LOGSTASH_HOME}/conf. Here is a sample.\n\n\ninput {\n      file {\n          type =\n \"hdp-nn-audit\"\n          path =\n \"/tmp/test/hdfs-audit.log\"\n          start_position =\n end\n          sincedb_path =\n \"/dev/null\"\n       }\n  }\n output {\n      if [type] == \"hdp-nn-audit\" {\n          kafka {\n            codec =\n plain {\n                format =\n \"%{message}\"\n            }\n            bootstrap_servers =\n \"host:9092\"\n            topic_id =\n \"hdfs_audit_log\"\n            acks =\n \"0\"\n            timeout_ms =\n 10000\n\n            send_buffer_bytes =\n 102400\n            client_id =\n \"hdp-n
 n-audit\"\n\n            workers =\n 10\n            compression_type =\n \"gzip\"\n         }\n          # stdout { codec =\n rubydebug }\n  }\n}\n\n\n\n\n\n\n\nStep 4\n: Start Logstash\n\n\nbin/logstash -f conf/sample.conf\n\n\n\n\n\n\n\nStep 5\n: Check whether logs are flowing into the kafka topic specified by \ntopic_id\n\n\n\n\n\n\nFilebeat\n\n\nThe sample filebeat.yml is tested with filebeat-5.0.0-beta1-linux-x86_64. The throughput can be up to 20K messages per second. Filebeat is required to be installed on the namenode host.\n\n\n    filebeat.publish_async: false\n    filebeat.spool_size: 8192\n    filebeat.idle_timeout: 5s\n    max_procs: 1\n    queue_size: 1000\n\n    filebeat.prospectors:\n    - input_type: log\n      paths:\n         - /tmp/test/hdfs-audit.log\n      #tail_files: true\n      harvester_buffer_size: 8192\n\n    output.kafka:\n      enabled: true\n      hosts: [\"host:9092\"]\n      topic: \"phx_hdfs_audit_log\"\n      client_id: \"client-host\"\n      work
 er: 10\n      max_retries: 3\n      bulk_max_size: 8192\n      channel_buffer_size: 512\n      timeout: 10\n      broker_timeout: 3s\n      keep_alive: 0\n      compression: none\n      max_message_bytes: 1000000\n      required_acks: 0\n      flush_interval: 1\n\n    logging.metrics.period: 10s\n\n    processors:\n      - include_fields:\n         fields: [\"message\", \"beat.hostname\"]\n\n\n\nLog4j Kafka Appender\n\n\nThis sample configuration is tested in HDP sandbox. \nRestarting namenode is required\n after updating the log4j configuration. \n\n\n\n\n\n\nStep 1\n: Create a Kafka topic. Here is an example Kafka command for creating topic \"sandbox_hdfs_audit_log\"\n\n\ncd \nkafka-home\n\nbin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic sandbox_hdfs_audit_log\n\n\n\n\n\n\n\nStep 2\n: Configure $HADOOP_CONF_DIR/log4j.properties, and add a log4j appender \"KAFKA_HDFS_AUDIT\" to hdfs audit logging\n\n\nlog4j.appender.KAFKA_HDFS_A
 UDIT=org.apache.eagle.log4j.kafka.KafkaLog4jAppender\nlog4j.appender.KAFKA_HDFS_AUDIT.Topic=sandbox_hdfs_audit_log\nlog4j.appender.KAFKA_HDFS_AUDIT.BrokerList=sandbox.hortonworks.com:6667\nlog4j.appender.KAFKA_HDFS_AUDIT.KeyClass=org.apache.eagle.log4j.kafka.hadoop.AuditLogKeyer\nlog4j.appender.KAFKA_HDFS_AUDIT.Layout=org.apache.log4j.PatternLayout\nlog4j.appender.KAFKA_HDFS_AUDIT.Layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n\nlog4j.appender.KAFKA_HDFS_AUDIT.ProducerType=async\n#log4j.appender.KAFKA_HDFS_AUDIT.BatchSize=1\n#log4j.appender.KAFKA_HDFS_AUDIT.QueueSize=1\n\n\n\n\n\n\n\nStep 3\n: Edit $HADOOP_CONF_DIR/hadoop-env.sh, and add the reference to KAFKA_HDFS_AUDIT to HADOOP_NAMENODE_OPTS.\n\n\n-Dhdfs.audit.logger=INFO,DRFAAUDIT,KAFKA_HDFS_AUDIT\n\n\n\n\n\n\n\nStep 4\n: Edit $HADOOP_CONF_DIR/hadoop-env.sh, and append the following command to it.\n\n\nexport HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:/path/to/eagle/lib/log4jkafka/lib/*\n\n\n\n\n\n\n\nStep 5\n: save the changes an
 d restart the namenode.\n\n\n\n\n\n\nStep 6\n: Check whether logs are flowing into Topic sandbox_hdfs_audit_log\n\n\n$ /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic sandbox_hdfs_audit_log", 
+            "title": "Using Eagle"
+        }, 
+        {
+            "location": "/using-eagle/#manage-eagle-and-services", 
+            "text": "After Apache Eagle has been deployed (please reference  deployment ), you can enter deployment directory and use commands below to control Apache Eagle Server.  ./bin/eagle-server.sh start|stop|status    After starting the Eagle server, please type http:// : / to open the web ui of Eagle.", 
+            "title": "Manage Eagle and Services"
+        }, 
+        {
+            "location": "/using-eagle/#use-eagle-web-interface", 
+            "text": "This is the typical Web Interface (short for WI) after setting up your Eagle monitoring environment. WI majorly contain the right main panel and left function menu.", 
+            "title": "Use Eagle Web Interface"
+        }, 
+        {
+            "location": "/using-eagle/#home", 
+            "text": "This is the aggregated UI for configured sites, and the applications. It will show those created sites created, how many application installed for each sites, and alerts generated from that cluster. You can click \u201cMore info\u201d link to view the details for particular site.    The \u201c Widgets \u201d section is customizable; if the application developer have its application registered to Home page, you can find that in \u201c Widgets \u201d section. Please check the application developer guide about how to register applications to home widgets. It give you a shortcut to go directly to the application home.", 
+            "title": "Home"
+        }, 
+        {
+            "location": "/using-eagle/#alert", 
+            "text": "In Alert menu, you can define the policies, list the policies and check your alerts there.", 
+            "title": "Alert"
+        }, 
+        {
+            "location": "/using-eagle/#integration", 
+            "text": "The integration page provides the management functionality for Eagle. You can list the built-in applications there, create sites, and manage the applications in your site.", 
+            "title": "Integration"
+        }, 
+        {
+            "location": "/using-eagle/#sites", 
+            "text": "It also gives you a shortcut to particular site.", 
+            "title": "Sites"
+        }, 
+        {
+            "location": "/using-eagle/#setup-the-monitoring-application", 
+            "text": "", 
+            "title": "Setup The Monitoring Application"
+        }, 
+        {
+            "location": "/using-eagle/#monitoring-applications", 
+            "text": "Eagle has an extensible framework to dynamically add new monitoring applications in Eagle environment. It also ships some built-in big data monitoring applications.    Go to \u201c Integration \u201d -  \u201c Applications \u201d, it will list a set of available monitoring applications which you can choose to monitor your services.     The \u201c Application \u201d column is the display name for an application, \u201c Streams \u201d is a logical name for the data stream from the monitored source after pre-processing, which will consumed by Alert Engine.    At the moment, we have the below built-in applications shipped with Apache Eagle. You can refer to the application documentation to understand how to do the configuration for each monitoring application.     Application  Description      Topology Health Check  This application can be used to monitor the service healthiness for HDFS, HBase and YARN. You can get alerted once the master role or the slave role got
  crashed.    Hadoop JMX Metrics Monitoring  This application can be used to monitor the JMX metrics data from the master nodes of HDFS, HBase and YARN, e.g. NameNode, HBase Master and YARN Resource Manager.    HDFS Audit Log Monitor  This application can be used to monitor the data operations in HDFS, to detect sensitive data access and malicious operations; to protect from data leak or data loss.    HBase Audit Log Monitor  Same as HDFS Audit Log Monitor, this application is used to monitor the data operations in HBase.    Map Reduce History Job  This application is used to get the MapReduce history job counters from YARN history server and job running history from HDFS log directory.    Map Reduce Running Job  This application is used to get the MapReduce running job counter information using YARN Rest API.    Hadoop Queue Monitor  This application is used to get the resource scheduling and utilization info from YARN.    MR Metrics Aggregation  This application is used to aggregat
 e the job counters and some resource utilization in a certain period of time (daily, weekly or monthly).    Job Performance Monitor Web  This application only contains the frontend, and depends on Map Reduce History Job and Map Reduce Running Job.    Alert Engine  Alert Engine is a special application and used to process the output data from other applications.", 
+            "title": "Monitoring Applications"
+        }, 
+        {
+            "location": "/using-eagle/#managing-sites", 
+            "text": "To enable a real monitoring use case, you have to create a site first, and install a certain application for this site, and finally start the application. We use site concept to group the running applications and avoid the application conflict.", 
+            "title": "Managing Sites"
+        }, 
+        {
+            "location": "/using-eagle/#sites_1", 
+            "text": "Go to \u201c Integration \u201d -  \u201c Sites \u201d, there will be a table listing the managed sites.", 
+            "title": "Sites"
+        }, 
+        {
+            "location": "/using-eagle/#create-site", 
+            "text": "Click \u201c New Site \u201d on the bottom right of the Sites page. You can fill the information in site creation dialog.     The \u201c Site Id \u201d should not be duplicated. After the creation, you can find it in sites page.", 
+            "title": "Create Site"
+        }, 
+        {
+            "location": "/using-eagle/#configuring-a-site", 
+            "text": "By clicking \u201c Edit \u201d button or the Site column in Sites table, you can have the Site configuration page, there you can install monitoring applications.", 
+            "title": "Configuring a Site"
+        }, 
+        {
+            "location": "/using-eagle/#install-and-run-applications-in-site", 
+            "text": "Choose the particular application which you want to install, you probably have something to fill, e.g. the HDFS NameNode address, Zookeeper address and port. Please check each application documentation for how to configure each application.     After doing the installation, you can start the application by clicking   or stop the application by  . You can check the \u201c Status \u201d column about the running status. Usually, it should have \u201c INITIALIZED \u201d or \u201c RUNNING \u201d for a healthy application.", 
+            "title": "Install and Run Applications in Site"
+        }, 
+        {
+            "location": "/using-eagle/#define-policies", 
+            "text": "After setting up the monitoring applications, you probably want to setup some alert policies against the monitored data, so you can get notified once any violation on the data. Eagle has a centralized place for policy definition.", 
+            "title": "Define Policies"
+        }, 
+        {
+            "location": "/using-eagle/#policies", 
+            "text": "Go to \u201c Alert \u201d -  \u201c Policies \u201d, you can check the policies defined and take control on whether to enable the policy:     You can apply the below actions for a certain policy:    : enable a policy    : disable a policy    : edit a policy    : purge a policy", 
+            "title": "Policies"
+        }, 
+        {
+            "location": "/using-eagle/#define-or-edit-policies", 
+            "text": "If you want to create a new policy, click \u201c Alert \u201d -  \u201c Define Policy \u201d, or you can enter into the policy definition page by editing an existing policy. After that, you can go to the policy list to enable the policy dynamically.", 
+            "title": "Define or Edit Policies"
+        }, 
+        {
+            "location": "/using-eagle/#source-stream", 
+            "text": "The source stream gives user a full view about what data stream is available for application defined for particular site, as well as the data structures in each data stream. Data stream name is suffixed by the site name.", 
+            "title": "Source Stream"
+        }, 
+        {
+            "location": "/using-eagle/#policy-name", 
+            "text": "The policy name should be globally unique.", 
+            "title": "Policy Name"
+        }, 
+        {
+            "location": "/using-eagle/#publish-alerts", 
+            "text": "In this section, you can define the alert publishment method by clicking the \u201c +Add Publisher \u201d.     You can choose the publishment method from an existing policy or by creating new publisher.     There are four built-in publisher types:    EmailPublisher : org.apache.eagle.alert.engine.publisher.impl.AlertEmailPublisher    KafkaPublisher : org.apache.eagle.alert.engine.publisher.impl.AlertKafkaPublisher    SlackPublisher : org.apache.eagle.alert.engine.publisher.impl.AlertSlackPublisher    EagleStoragePlugin : org.apache.eagle.alert.engine.publisher.impl.AlertEagleStoragePlugin", 
+            "title": "Publish Alerts"
+        }, 
+        {
+            "location": "/using-eagle/#policy-syntax", 
+            "text": "Currently, we support SiddhiQL(please view Siddhi Query Language Specification  here )    In order to explain how stream data is processed, let us take policy below as an example:  from map_reduce_failed_job_stream[site==\"sandbox\" and currentState==\"FAILED\"]\nselect * group by jobId insert into map_reduce_failed_job_stream_out    This policy contains below parts:    Source : from map_reduce_failed_job_stream    Filter : [site==\"sandbox\" and currentState==\"FAILED\"]    Projection : select *    GroupBy : group by jobId    Destination : insert into map_reduce_failed_job_stream_out      Source Streams(schema) are defined by applications, and applications will write stream data to data sink(currently, we support kafka as data sink).  streams \n     stream \n         streamId map_reduce_failed_job_stream /streamId \n         description Map Reduce Failed Job Stream /description \n         validate true /validate \n         columns \n             column \n      
            name site /name \n                 type string /type \n             /column \n            \u2026...\n             column \n                 name jobId /name \n                 type string /type \n             column \n                 name currentState /name \n                 type string /type \n             /column \n         /columns \n     /stream  /streams     After policy is defined, Alert engine will create siddhi execution runtime for the policy(also load stream data schema from metadata store). Since siddhi execution runtime knows the stream data schema, then it will process stream data and do the calculation.", 
+            "title": "Policy Syntax"
+        }, 
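+            For reference, a windowed aggregation follows the same SiddhiQL pattern. The sketch below is illustrative only (the 10 min window and the threshold of 5 failed jobs are arbitrary values, not part of any shipped policy); it would emit one output event when a single user accumulates more than 5 failed jobs within 10 minutes:
+                from map_reduce_failed_job_stream[site=="sandbox" and currentState=="FAILED"]#window.time(10 min)
+                select user, count() as failedCount
+                group by user
+                having failedCount > 5
+                insert into map_reduce_failed_job_stream_out;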
+        {
+            "location": "/using-eagle/#monitoring-dashboard", 
+            "text": "After setting the sites and applications, you can find the site item from the home page or \u201cSites\u201d menu.    Here is a site home example. After entering the site home, the left menu will be replaced by application dashboard links only related to that site, so you can switch between the application dashboard quickly. In the right panel, it contains the application icons installed in this site, but depends on if the application has its dashboard defined. You can click the application icon or the application links to go to the application dashboard home. Please check the application documentation about how to use the application monitoring dashboard.", 
+            "title": "Monitoring Dashboard"
+        }, 
+        {
+            "location": "/using-eagle/#check-the-alerts", 
+            "text": "Eagle has all the alerts generated by all the applications stored in its database, so you can check your application alerts from Eagle WI.     Go to \u201c Alert \u201d -  \u201c Alerts \u201d, you can find the alerts table.     Also you can check more detailed information by clicking \u201c Detail \u201d link for each alert item.", 
+            "title": "Check The Alerts"
+        }, 
+        {
+            "location": "/using-eagle/#how-to-stream-audit-log-into-kafka", 
+            "text": "", 
+            "title": "How to stream audit log into Kafka"
+        }, 
+        {
+            "location": "/using-eagle/#logstash", 
+            "text": "The sample configuration is tested with logstash-2.3.4. Logstash is required to be installed on the namenode host.    Step 1 : Create a Kafka topic as the streaming input.  Here is an sample Kafka command to create topic 'sandbox_hdfs_audit_log'  cd  kafka-home \nbin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic sandbox_hdfs_audit_log    Step 2 : Create a Logstash configuration file under ${LOGSTASH_HOME}/conf. Here is a sample.  input {\n      file {\n          type =  \"hdp-nn-audit\"\n          path =  \"/tmp/test/hdfs-audit.log\"\n          start_position =  end\n          sincedb_path =  \"/dev/null\"\n       }\n  }\n output {\n      if [type] == \"hdp-nn-audit\" {\n          kafka {\n            codec =  plain {\n                format =  \"%{message}\"\n            }\n            bootstrap_servers =  \"host:9092\"\n            topic_id =  \"hdfs_audit_log\"\n            acks =  \"0\"\n            timeout
 _ms =  10000\n\n            send_buffer_bytes =  102400\n            client_id =  \"hdp-nn-audit\"\n\n            workers =  10\n            compression_type =  \"gzip\"\n         }\n          # stdout { codec =  rubydebug }\n  }\n}    Step 4 : Start Logstash  bin/logstash -f conf/sample.conf    Step 5 : Check whether logs are flowing into the kafka topic specified by  topic_id", 
+            "title": "Logstash"
+        }, 
+        {
+            "location": "/using-eagle/#filebeat", 
+            "text": "The sample filebeat.yml is tested with filebeat-5.0.0-beta1-linux-x86_64. The throughput can be up to 20K messages per second. Filebeat is required to be installed on the namenode host.      filebeat.publish_async: false\n    filebeat.spool_size: 8192\n    filebeat.idle_timeout: 5s\n    max_procs: 1\n    queue_size: 1000\n\n    filebeat.prospectors:\n    - input_type: log\n      paths:\n         - /tmp/test/hdfs-audit.log\n      #tail_files: true\n      harvester_buffer_size: 8192\n\n    output.kafka:\n      enabled: true\n      hosts: [\"host:9092\"]\n      topic: \"phx_hdfs_audit_log\"\n      client_id: \"client-host\"\n      worker: 10\n      max_retries: 3\n      bulk_max_size: 8192\n      channel_buffer_size: 512\n      timeout: 10\n      broker_timeout: 3s\n      keep_alive: 0\n      compression: none\n      max_message_bytes: 1000000\n      required_acks: 0\n      flush_interval: 1\n\n    logging.metrics.period: 10s\n\n    processors:\n      - include_fie
 lds:\n         fields: [\"message\", \"beat.hostname\"]", 
+            "title": "Filebeat"
+        }, 
+        {
+            "location": "/using-eagle/#log4j-kafka-appender", 
+            "text": "This sample configuration is tested in HDP sandbox.  Restarting namenode is required  after updating the log4j configuration.     Step 1 : Create a Kafka topic. Here is an example Kafka command for creating topic \"sandbox_hdfs_audit_log\"  cd  kafka-home \nbin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic sandbox_hdfs_audit_log    Step 2 : Configure $HADOOP_CONF_DIR/log4j.properties, and add a log4j appender \"KAFKA_HDFS_AUDIT\" to hdfs audit logging  log4j.appender.KAFKA_HDFS_AUDIT=org.apache.eagle.log4j.kafka.KafkaLog4jAppender\nlog4j.appender.KAFKA_HDFS_AUDIT.Topic=sandbox_hdfs_audit_log\nlog4j.appender.KAFKA_HDFS_AUDIT.BrokerList=sandbox.hortonworks.com:6667\nlog4j.appender.KAFKA_HDFS_AUDIT.KeyClass=org.apache.eagle.log4j.kafka.hadoop.AuditLogKeyer\nlog4j.appender.KAFKA_HDFS_AUDIT.Layout=org.apache.log4j.PatternLayout\nlog4j.appender.KAFKA_HDFS_AUDIT.Layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n\nlog
 4j.appender.KAFKA_HDFS_AUDIT.ProducerType=async\n#log4j.appender.KAFKA_HDFS_AUDIT.BatchSize=1\n#log4j.appender.KAFKA_HDFS_AUDIT.QueueSize=1    Step 3 : Edit $HADOOP_CONF_DIR/hadoop-env.sh, and add the reference to KAFKA_HDFS_AUDIT to HADOOP_NAMENODE_OPTS.  -Dhdfs.audit.logger=INFO,DRFAAUDIT,KAFKA_HDFS_AUDIT    Step 4 : Edit $HADOOP_CONF_DIR/hadoop-env.sh, and append the following command to it.  export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:/path/to/eagle/lib/log4jkafka/lib/*    Step 5 : save the changes and restart the namenode.    Step 6 : Check whether logs are flowing into Topic sandbox_hdfs_audit_log  $ /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic sandbox_hdfs_audit_log", 
+            "title": "Log4j Kafka Appender"
+        }, 
+        {
+            "location": "/applications/", 
+            "text": "HDFS Data Activity Monitoring\n\n\nMonitor Requirements\n\n\nThis application aims to monitor user activities on HDFS via the hdfs audit log. Once any abnormal user activity is detected, an alert is sent in several seconds. The whole pipeline of this application is\n\n\n\n\n\n\nKafka ingest: this application consumes data from Kafka. In other words, users have to stream the log into Kafka first. \n\n\n\n\n\n\nData re-procesing, which includes raw log parser, ip zone joiner, sensitivity information joiner. \n\n\n\n\n\n\nKafka sink: parsed data will flows into Kafka again, which will be consumed by the alert engine. \n\n\n\n\n\n\nPolicy evaluation: the alert engine (hosted in Alert Engine app) evaluates each data event to check if the data violate the user defined policy. An alert is generated if the data matches the policy.\n\n\n\n\n\n\n\n\nSetup \n Installation\n\n\n\n\n\n\nChoose a site to install this application. For example 'sandbox'\n\n\n\n\n\n\nInstall \"H
 dfs Audit Log Monitor\" app step by step\n\n\n\n\n\n\n\n\n\n\n\n\nHow to collect the log\n\n\nTo collect the raw audit log on namenode servers, a log collector is needed. Users can choose any tools they like. There are some common solutions available: \nlogstash\n, \nfilebeat\n, log4j appender, etcs. \n\n\nFor detailed instruction, refer to: \nHow to stream audit log into Kafka\n\n\nSample policies\n\n\n1. monitor file/folder operations\n\n\nDelete a file/folder on HDFS. \n\n\nfrom HDFS_AUDIT_LOG_ENRICHED_STREAM_SANDBOX[str:contains(src,'/tmp/test/subtest') and ((cmd=='rename' and str:contains(dst, '.Trash')) or cmd=='delete')] select * group by user insert into hdfs_audit_log_enriched_stream_out\n\n\n\n\nHDFS_AUDIT_LOG_ENRICHED_STREAM_SANDBOX is the input stream name, and hdfs_audit_log_enriched_stream_out is the output stream name, the content between [] is the monitoring conditions. \ncmd\n, \nsrc\n and \ndst\n is the fields of hdfs audit logs.\n\n\n\n\n2. classify the file/folde
 r on HDFS\n\n\nUsers may want to mark some folders/files on HDFS as sensitive content. For example, by marking '/sys/soj' as \"SOJ\", users can monitor any operations they care about on 'sys/soj' and its subfolders/files.\n\n\nfrom HDFS_AUDIT_LOG_ENRICHED_STREAM_SANDBOX[sensitivityType=='SOJ' and cmd=='delete')] select * group by user insert into hdfs_audit_log_enriched_stream_out\n\n\n\n\nThe example policy monitors the 'delete' operation on files/subfolders under /sys/soj. \n\n\n3. Classify the IP Zone\n\n\nIn some cases, the ips are classified into different zones. For some zone, it may have higher secrecy. Eagle providers ways to monitor user activities on IP level. \n\n\nfrom HDFS_AUDIT_LOG_ENRICHED_STREAM_SANDBOX[securityZone=='SECURITY' and cmd=='delete')] select * group by user insert into hdfs_audit_log_enriched_stream_out\n\n\n\n\nThe example policy monitors the 'delete' operation on hosts in 'SECURITY' zone. \n\n\nQuestions on this application\n\n\n\n\nJMX Monitoring\n\n\
 n\n\n\n\nApplication \"\nHADOOP_JMX_METRIC_MONITOR\n\" provide embedded collector script to ingest hadoop/hbase jmx metric as eagle stream and provide ability to define alert policy and detect anomaly in real-time from metric.\n\n\n\n\n\n\n\n\nFields\n\n\n\n\n\n\n\n\n\n\n\n\nType\n\n\nHADOOP_JMX_METRIC_MONITOR\n\n\n\n\n\n\nVersion\n\n\n0.5.0-version\n\n\n\n\n\n\nDescription\n\n\nCollect JMX Metric and monitor in real-time\n\n\n\n\n\n\nStreams\n\n\nHADOOP_JMX_METRIC_STREAM\n\n\n\n\n\n\nConfiguration\n\n\nJMX Metric Kafka Topic (default: hadoop_jmx_metric_{SITE_ID})\nKafka Broker List (default: localhost:6667)\n\n\n\n\n\n\n\n\n\n\n\n\nSetup \n Installation\n\n\n\n\n\n\nMake sure already setup a site (here use a demo site named \"sandbox\").\n\n\n\n\n\n\nInstall \"Hadoop JMX Monitor\" app in eagle server.\n\n\n\n\n\n\n\n\nConfigure Application settings.\n\n\n\n\n\n\n\n\nEnsure a kafka topic named hadoop_jmx_metric_{SITE_ID} (In current guide, it should be hadoop_jmx_metric_sandbox)\n\n
 \n\n\n\n\nSetup metric collector for monitored Hadoop/HBase using hadoop_jmx_collector and modify the configuration.\n\n\n\n\n\n\nCollector scripts: \nhadoop_jmx_collector\n\n\n\n\n\n\nRename config-sample.json to config.json: \nconfig-sample.json\n\n\n{\n    env: {\n        site: \"sandbox\",\n        name_node: {\n            hosts: [\n                \"sandbox.hortonworks.com\"\n            ],\n            port: 50070,\n            https: false\n        },\n        resource_manager: {\n            hosts: [\n                \"sandbox.hortonworks.com\"\n            ],\n            port: 50030,\n            https: false\n        }\n    },\n    inputs: [{\n        component: \"namenode\",\n        host: \"server.eagle.apache.org\",\n        port: \"50070\",\n        https: false,\n        kafka_topic: \"nn_jmx_metric_sandbox\"\n    }, {\n        component: \"resourcemanager\",\n        host: \"server.eagle.apache.org\",\n        port: \"8088\",\n        https: false,\n        kafka_t
 opic: \"rm_jmx_metric_sandbox\"\n    }, {\n        component: \"datanode\",\n        host: \"server.eagle.apache.org\",\n        port: \"50075\",\n        https: false,\n        kafka_topic: \"dn_jmx_metric_sandbox\"\n    }],\n    filter: {\n        monitoring.group.selected: [\n            \"hadoop\",\n            \"java.lang\"\n        ]\n    },\n    output: {\n        kafka: {\n            brokerList: [\n                \"localhost:9092\"\n            ]\n        }\n    }\n}\n\n\n\n\n\n\n\n\n\n\n\nClick \"Install\" button then you will see the HADOOP_JMX_METRIC_STREAM_{SITE_ID} in Streams.\n\n\n\n\n\n\n\n\nDefine JMX Alert Policy\n\n\n\n\n\n\nGo to \"Define Policy\".\n\n\n\n\n\n\nSelect HADOOP_JMX_METRIC_MONITOR related streams.\n\n\n\n\n\n\nDefine SQL-Like policy, for example\n\n\nfrom HADOOP_JMX_METRIC_STREAM_SANDBOX[metric==\"cpu.usage\" and value \n 0.9]\nselect site,host,component,value\ninsert into HADOOP_CPU_USAGE_GT_90_ALERT;\n\n\n\nAs seen in below screenshot:\n\n\n\n\n\n
 \n\n\nStream Schema\n\n\n\n\n\n\nSchema\n\n\n\n\n\n\n\n\nStream Name\n\n\nStream Schema\n\n\nTime Series\n\n\n\n\n\n\n\n\n\n\nHADOOP_JMX_METRIC_MONITOR\n\n\nhost\n: STRING\ntimestamp\n: LONG\nmetric\n: STRING\ncomponent\n: STRING\nsite\n: STRING\nvalue\n: DOUBLE\n\n\nTrue\n\n\n\n\n\n\n\n\n\n\n\n\nMetrics List\n\n\n\n\nPlease refer to the \nHadoop JMX Metrics List\n and see which metrics you're interested in.\n\n\n\n\n\n\nJob Performance Monitoring\n\n\nMonitor Requirements\n\n\n\n\nFinished/Running Job Details\n\n\nJob Metrics(Job Counter/Statistics) Aggregation\n\n\nAlerts(Job failure/Job slow)\n\n\n\n\nApplications\n\n\n\n\n\n\nApplication Table\n\n\n\n\n\n\n\n\napplication\n\n\nresponsibility\n\n\n\n\n\n\n\n\n\n\nMap Reduce History Job Monitoring\n\n\nparse mr history job logs from hdfs\n\n\n\n\n\n\nMap Reduce Running Job Monitoring\n\n\nget mr running job details from resource manager\n\n\n\n\n\n\nMap Reduce Metrics Aggregation\n\n\naggregate metrics generated by applications ab
 ove\n\n\n\n\n\n\n\n\n\n\n\n\nData Ingestion And Process\n\n\n\n\n\n\nWe build storm topology to fulfill requirements for each application.\n\n\n\n\n\n\n\n\nMap Reduce History Job Monitoring (Figure 1)\n\n\n\n\nRead Spout\n\n\nread/parse history job logs from HDFS and flush to eagle service(storage is Hbase)\n\n\n\n\n\n\nSink Bolt\n\n\nconvert parsed jobs to streams and write to data sink\n\n\n\n\n\n\n\n\n\n\nMap Reduce Running Job Monitoring (Figure 2)\n\n\nRead Spout\n\n\nfetch running job list from resource manager and emit to Parse Bolt\n\n\n\n\n\n\nParse Bolt\n\n\nfor each running job, fetch job detail/job counter/job configure/tasks from resource manager\n\n\n\n\n\n\n\n\n\n\nMap Reduce Metrics Aggregation (Figure 3)\n\n\nDivide Spout\n\n\ndivide time period(need to be aggregated) to small pieces and emit to Aggregate Bolt\n\n\n\n\n\n\nAggregate Bolt\n\n\naggregate metrics for given time period received from Divide Spout\n\n\n\n\n\n\n\n\n\n\n\n\nSetup \n Installation\n\n\n\n\n\n
 \nMake sure already setup a site (here use a demo site named \"sandbox\").\n\n\n\n\n\n\nInstall \"Map Reduce History Job\" app in eagle server(Take this application as an example).\n\n\n\n\n\n\nConfigure Application settings\n\n\n\n\n\n\n\n\nEnsure a kafka topic named {SITE_ID}_map_reduce_failed_job (In current guide, it should be sandbox_map_reduce_failed_job) will be created.\n\n\n\n\n\n\nClick \"Install\" button then you will see the MAP_REDUCE_FAILED_JOB_STREAM_{SITE_ID} in Alert-\nStreams.\n    \n\n  This application will write stream data to kafka topic(created by last step)\n\n\n\n\n\n\nIntegration With Alert Engine\n\n\nIn order to integrate applications with alert engine and send alerts, follow below steps(Take Map Reduce History Job application as an example):\n\n\n\n\n\n\ndefine stream and configure data sink\n\n\n\n\ndefine stream in resource/META-INF/providers/xxxProviders.xml\nFor example, MAP_REDUCE_FAILED_JOB_STREAM_{SITE_ID}\n\n\nconfigure data sink\nFor example, cr
 eate kafka topic {SITE_ID}_map_reduce_failed_job\n\n\n\n\n\n\ndefine policy\n\n\n\n\n\n\nFor example, if you want to receive map reduce job failure alerts, you can define policies (SiddhiQL) as the following:\n\n\nfrom map_reduce_failed_job_stream[site==\"sandbox\" and currentState==\"FAILED\"]\nselect site, queue, user, jobType, jobId, submissionTime, trackingUrl, startTime, endTime\ngroup by jobId insert into map_reduce_failed_job_stream_out\n\n\n\n\n\n\n\nview alerts\n\n\n\n\nYou can view alerts in the Alert -> Alerts page.\n\n\nStream Schema\n\n\nAll columns above are predefined in stream map_reduce_failed_job_stream defined in\n\n\neagle-jpm/eagle-jpm-mr-history/src/main/resources/META-INF/providers/org.apache.eagle.jpm.mr.history.MRHistoryJobApplicationProvider.xml\n\n\n\nThen, enable the policy in the web UI after it's created. Eagle will schedule it automatically.\n\n\n\n\nTopology Health Check\n\n\n\n\n\n\nApplication \"TOPOLOGY HEALTH CHECK\" aims to monitor services with a master-slave structured topology and provides metrics at the host level.\n\n\n\n\n\n\n\nFields\n\n\n\n\n\n\n\n\n\n\n\n\nType\n\n\nTOPOLOGY_HEALTH_CHECK\n\n\n\n\n\n\nVersion\n\n\n0.5.0-version\n\n\n\n\n\n\nDescription\n\n\nCollect MR,HBASE,HDFS node status and cluster ratio\n\n\n\n\n\n\nStreams\n\n\nTOPOLOGY_HEALTH_CHECK_STREAM\n\n\n\n\n\n\nConfiguration\n\n\nTopology Health Check Topic (default: topology_health_check)\nKafka Broker List (default: sandbox.hortonworks.com:6667)\n\n\n\n\n\n\n\n\n\n\n\n\nSetup \n Installation\n\n\n\n\n\n\nMake sure a site has already been set up (here we use a demo site named \"sandbox\").\n\n\n\n\n\n\nInstall \"Topology Health Check\" app in eagle server.\n\n\n\n\n\n\n\n\nConfigure Application settings.\n\n\n\n\n\n\n\n\nEnsure the existence of a kafka topic named topology_health_check (In the current guide, it should be topology_health_check).\n\n\n\n\n\n\nClick the \"Install\" button, then you will see TOPOLOGY_HEALTH_CHECK_STREAM_{SITE_ID} on the \"Streams\" page (Streams can be navigated from the left nav).\n\n\n\n\n\n\n\n\nDefine Health Check Alert Policy\n\n\n\n\n\n\nGo to \"Define Policy\".\n\n\n\n\n\n\nSelect TOPOLOGY_HEALTH_CHECK related streams.\n\n\n\n\n\n\nDefine a SQL-like policy, for example\n\n\nfrom TOPOLOGY_HEALTH_CHECK_STREAM_SANDBOX[status=='dead'] select * insert into topology_health_check_stream_out;\n\n\n\n\n\n\n\n\n\nHadoop Queue Monitoring\n\n\n\n\n\n\nThis application collects metrics from the Resource Manager in the following aspects:\n\n\n\n\n\n\nScheduler Info of the cluster: http://{RM_HTTP_ADDRESS}:{PORT}/ws/v1/cluster/scheduler\n\n\n\n\n\n\nApplications of the cluster: http://{RM_HTTP_ADDRESS}:{PORT}/ws/v1/cluster/apps\n\n\n\n\n\n\nOverall metrics of the cluster: http://{RM_HTTP_ADDRESS}:{PORT}/ws/v1/cluster/metrics\n\n\nby version 0.5-incubating, mainly focusing on the metrics\n - `appsPending`\n - `allocatedMB`\n - `totalMB`\n - `availableMB`\n - `reservedMB`\n - `allocatedVirtualCores`.\n\n\n\n\n\n\n\n\n\n\n\nSetup \n Installation\n\n\n\n\n\n\nMake sure a site has already been set up (here we use a demo site named \"sandbox\").\n\n\n\n\n\n\nFrom the left-nav list, navigate to the application management page via \"\nIntegration\n\" -> \"\nSites\n\", and hit the link \"\nsandbox\n\" on the right.\n\n\n\n\n\n\n\n\nInstall \"Hadoop Queue Monitor\" by clicking the \"install\" button of the application.\n\n\n\n\n\n\n\n\nIn the pop-up layout, select the running mode as \nLocal\n or \nCluster\n.\n\n\n\n\n\n\n\n\nSet the target jar of eagle's topology assembly that exists on the eagle server, indicating the absolute path to it. As in the following screenshot:\n\n\n\n\n\n\n\n\nSet the Resource Manager endpoint urls field; separate values with commas if there is more than one url (e.g. a secondary node for HA).\n\n\n\n\n\n\n\n\nSet the fields \"\nStorm Worker Number\n\", \"\nParallel Tasks Per Bolt\n\", and \"\nFetching Metric Interval in Seconds\n\", or leave them as default if they fit your needs.\n\n\n\n\n\n\n\n\nFinally, hit the \"\nInstall\n\" button to complete it.\n\n\n\n\n\n\nUse of the application\n\n\n\n\n\n\nThere is no need to define policies for this application to work; it can be integrated with the \"\nJob Performance Monitoring Web\n\" application and consequently seen on the cluster dashboard, as long as the latter application is installed too. See an example in the following screenshot:", 
+            "title": "Applications"
+        }, 
+        {
+            "location": "/applications/#hdfs-data-activity-monitoring", 
+            "text": "", 
+            "title": "HDFS Data Activity Monitoring"
+        }, 
+        {
+            "location": "/applications/#monitor-requirements", 
+            "text": "This application aims to monitor user activities on HDFS via the hdfs audit log. Once any abnormal user activity is detected, an alert is sent in several seconds. The whole pipeline of this application is    Kafka ingest: this application consumes data from Kafka. In other words, users have to stream the log into Kafka first.     Data re-procesing, which includes raw log parser, ip zone joiner, sensitivity information joiner.     Kafka sink: parsed data will flows into Kafka again, which will be consumed by the alert engine.     Policy evaluation: the alert engine (hosted in Alert Engine app) evaluates each data event to check if the data violate the user defined policy. An alert is generated if the data matches the policy.", 
+            "title": "Monitor Requirements"
+        }, 
+        {
+            "location": "/applications/#setup-installation", 
+            "text": "Choose a site to install this application. For example 'sandbox'    Install \"Hdfs Audit Log Monitor\" app step by step", 
+            "title": "Setup & Installation"
+        }, 
+        {
+            "location": "/applications/#how-to-collect-the-log", 
+            "text": "To collect the raw audit log on namenode servers, a log collector is needed. Users can choose any tools they like. There are some common solutions available:  logstash ,  filebeat , log4j appender, etcs.   For detailed instruction, refer to:  How to stream audit log into Kafka", 
+            "title": "How to collect the log"
+        }, 
+        {
+            "location": "/applications/#sample-policies", 
+            "text": "", 
+            "title": "Sample policies"
+        }, 
+        {
+            "location": "/applications/#1-monitor-filefolder-operations", 
+            "text": "Delete a file/folder on HDFS.   from HDFS_AUDIT_LOG_ENRICHED_STREAM_SANDBOX[str:contains(src,'/tmp/test/subtest') and ((cmd=='rename' and str:contains(dst, '.Trash')) or cmd=='delete')] select * group by user insert into hdfs_audit_log_enriched_stream_out  HDFS_AUDIT_LOG_ENRICHED_STREAM_SANDBOX is the input stream name, and hdfs_audit_log_enriched_stream_out is the output stream name, the content between [] is the monitoring conditions.  cmd ,  src  and  dst  is the fields of hdfs audit logs.", 
+            "title": "1. monitor file/folder operations"
+        }, 
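+            As a further sketch (the path '/tmp/test/private' and the user 'hdfs' are hypothetical values, following the same stream and SiddhiQL conventions as above), read access to a guarded directory by anyone other than an expected service user could be watched like this:
+                from HDFS_AUDIT_LOG_ENRICHED_STREAM_SANDBOX[str:contains(src,'/tmp/test/private') and cmd=='open' and user!='hdfs']
+                select * group by user insert into hdfs_audit_log_enriched_stream_out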
+        {
+            "location": "/applications/#2-classify-the-filefolder-on-hdfs", 
+            "text": "Users may want to mark some folders/files on HDFS as sensitive content. For example, by marking '/sys/soj' as \"SOJ\", users can monitor any operations they care about on 'sys/soj' and its subfolders/files.  from HDFS_AUDIT_LOG_ENRICHED_STREAM_SANDBOX[sensitivityType=='SOJ' and cmd=='delete')] select * group by user insert into hdfs_audit_log_enriched_stream_out  The example policy monitors the 'delete' operation on files/subfolders under /sys/soj.", 
+            "title": "2. classify the file/folder on HDFS"
+        }, 
+        {
+            "location": "/applications/#3-classify-the-ip-zone", 
+            "text": "In some cases, the ips are classified into different zones. For some zone, it may have higher secrecy. Eagle providers ways to monitor user activities on IP level.   from HDFS_AUDIT_LOG_ENRICHED_STREAM_SANDBOX[securityZone=='SECURITY' and cmd=='delete')] select * group by user insert into hdfs_audit_log_enriched_stream_out  The example policy monitors the 'delete' operation on hosts in 'SECURITY' zone.", 
+            "title": "3. Classify the IP Zone"
+        }, 
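+            The sensitivity and zone filters can also be combined in a single condition. A sketch only (the field values are illustrative, not from a shipped policy), watching delete/rename operations on SOJ-tagged data issued from hosts in the SECURITY zone:
+                from HDFS_AUDIT_LOG_ENRICHED_STREAM_SANDBOX[sensitivityType=='SOJ' and securityZone=='SECURITY' and (cmd=='delete' or cmd=='rename')]
+                select * group by user insert into hdfs_audit_log_enriched_stream_out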
+        {
+            "location": "/applications/#questions-on-this-application", 
+            "text": "", 
+            "title": "Questions on this application"
+        }, 
+        {
+            "location": "/applications/#jmx-monitoring", 
+            "text": "Application \" HADOOP_JMX_METRIC_MONITOR \" provide embedded collector script to ingest hadoop/hbase jmx metric as eagle stream and provide ability to define alert policy and detect anomaly in real-time from metric.     Fields       Type  HADOOP_JMX_METRIC_MONITOR    Version  0.5.0-version    Description  Collect JMX Metric and monitor in real-time    Streams  HADOOP_JMX_METRIC_STREAM    Configuration  JMX Metric Kafka Topic (default: hadoop_jmx_metric_{SITE_ID}) Kafka Broker List (default: localhost:6667)", 
+            "title": "JMX Monitoring"
+        }, 
+        {
+            "location": "/applications/#setup-installation_1", 
+            "text": "Make sure already setup a site (here use a demo site named \"sandbox\").    Install \"Hadoop JMX Monitor\" app in eagle server.     Configure Application settings.     Ensure a kafka topic named hadoop_jmx_metric_{SITE_ID} (In current guide, it should be hadoop_jmx_metric_sandbox)    Setup metric collector for monitored Hadoop/HBase using hadoop_jmx_collector and modify the configuration.    Collector scripts:  hadoop_jmx_collector    Rename config-sample.json to config.json:  config-sample.json  {\n    env: {\n        site: \"sandbox\",\n        name_node: {\n            hosts: [\n                \"sandbox.hortonworks.com\"\n            ],\n            port: 50070,\n            https: false\n        },\n        resource_manager: {\n            hosts: [\n                \"sandbox.hortonworks.com\"\n            ],\n            port: 50030,\n            https: false\n        }\n    },\n    inputs: [{\n        component: \"namenode\",\n        host: \"server.eagle.
 apache.org\",\n        port: \"50070\",\n        https: false,\n        kafka_topic: \"nn_jmx_metric_sandbox\"\n    }, {\n        component: \"resourcemanager\",\n        host: \"server.eagle.apache.org\",\n        port: \"8088\",\n        https: false,\n        kafka_topic: \"rm_jmx_metric_sandbox\"\n    }, {\n        component: \"datanode\",\n        host: \"server.eagle.apache.org\",\n        port: \"50075\",\n        https: false,\n        kafka_topic: \"dn_jmx_metric_sandbox\"\n    }],\n    filter: {\n        monitoring.group.selected: [\n            \"hadoop\",\n            \"java.lang\"\n        ]\n    },\n    output: {\n        kafka: {\n            brokerList: [\n                \"localhost:9092\"\n            ]\n        }\n    }\n}      Click \"Install\" button then you will see the HADOOP_JMX_METRIC_STREAM_{SITE_ID} in Streams.", 
+            "title": "Setup & Installation"
+        }, 
+        {
+            "location": "/applications/#define-jmx-alert-policy", 
+            "text": "Go to \"Define Policy\".    Select HADOOP_JMX_METRIC_MONITOR related streams.    Define SQL-Like policy, for example  from HADOOP_JMX_METRIC_STREAM_SANDBOX[metric==\"cpu.usage\" and value   0.9]\nselect site,host,component,value\ninsert into HADOOP_CPU_USAGE_GT_90_ALERT;  As seen in below screenshot:", 
+            "title": "Define JMX Alert Policy"
+        }, 
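+            A windowed variant of the same idea, as a sketch only (the 5 min window, the avg() aggregation and the 0.9 threshold are illustrative values, not part of the shipped application): it alerts when the average cpu.usage of a host stays above 0.9 over five minutes rather than on a single sample:
+                from HADOOP_JMX_METRIC_STREAM_SANDBOX[metric=="cpu.usage"]#window.time(5 min)
+                select site, host, component, avg(value) as avgValue
+                group by host
+                having avgValue > 0.9
+                insert into HADOOP_CPU_USAGE_GT_90_ALERT;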
+        {
+            "location": "/applications/#stream-schema", 
+            "text": "Schema     Stream Name  Stream Schema  Time Series      HADOOP_JMX_METRIC_MONITOR  host : STRING timestamp : LONG metric : STRING component : STRING site : STRING value : DOUBLE  True", 
+            "title": "Stream Schema"
+        }, 
+        {
+            "location": "/applications/#metrics-list", 
+            "text": "Please refer to the  Hadoop JMX Metrics List  and see which metrics you're interested in.", 
+            "title": "Metrics List"
+        }, 
+        {
+            "location": "/applications/#job-performance-monitoring", 
+            "text": "", 
+            "title": "Job Performance Monitoring"
+        }, 
+        {
+            "location": "/applications/#monitor-requirements_1", 
+            "text": "Finished/Running Job Details  Job Metrics(Job Counter/Statistics) Aggregation  Alerts(Job failure/Job slow)", 
+            "title": "Monitor Requirements"
+        }, 
+        {
+            "location": "/applications/#applications", 
+            "text": "Application Table     application  responsibility      Map Reduce History Job Monitoring  parse mr history job logs from hdfs    Map Reduce Running Job Monitoring  get mr running job details from resource manager    Map Reduce Metrics Aggregation  aggregate metrics generated by applications above", 
+            "title": "Applications"
+        }, 
+        {
+            "location": "/applications/#data-ingestion-and-process", 
+            "text": "We build storm topology to fulfill requirements for each application.     Map Reduce History Job Monitoring (Figure 1)   Read Spout  read/parse history job logs from HDFS and flush to eagle service(storage is Hbase)    Sink Bolt  convert parsed jobs to streams and write to data sink      Map Reduce Running Job Monitoring (Figure 2)  Read Spout  fetch running job list from resource manager and emit to Parse Bolt    Parse Bolt  for each running job, fetch job detail/job counter/job configure/tasks from resource manager      Map Reduce Metrics Aggregation (Figure 3)  Divide Spout  divide time period(need to be aggregated) to small pieces and emit to Aggregate Bolt    Aggregate Bolt  aggregate metrics for given time period received from Divide Spout", 
+            "title": "Data Ingestion And Process"
+        }, 
+        {
+            "location": "/applications/#setup-installation_2", 
+            "text": "Make sure already setup a site (here use a demo site named \"sandbox\").    Install \"Map Reduce History Job\" app in eagle server(Take this application as an example).    Configure Application settings     Ensure a kafka topic named {SITE_ID}_map_reduce_failed_job (In current guide, it should be sandbox_map_reduce_failed_job) will be created.    Click \"Install\" button then you will see the MAP_REDUCE_FAILED_JOB_STREAM_{SITE_ID} in Alert- Streams.\n     \n  This application will write stream data to kafka topic(created by last step)", 
+            "title": "Setup & Installation"
+        }, 
+        {
+            "location": "/applications/#integration-with-alert-engine", 
+            "text": "In order to integrate applications with alert engine and send alerts, follow below steps(Take Map Reduce History Job application as an example):    define stream and configure data sink   define stream in resource/META-INF/providers/xxxProviders.xml\nFor example, MAP_REDUCE_FAILED_JOB_STREAM_{SITE_ID}  configure data sink\nFor example, create kafka topic {SITE_ID}_map_reduce_failed_job     define policy    For example, if you want to receive map reduce job failure alerts, you can define policies (SiddhiQL) as the following:  from map_reduce_failed_job_stream[site== sandbox  and currentState== FAILED ]\nselect site, queue, user, jobType, jobId, submissionTime, trackingUrl, startTime, endTime\ngroup by jobId insert into map_reduce_failed_job_stream_out    view alerts   You can view alerts in Alert- alerts page.", 
+            "title": "Integration With Alert Engine"
+        }, 
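+            A narrowed-down variant of the policy above, sketched under the assumption that the queue field carries the YARN queue name (the \"default\" value is illustrative); it limits the failure alerts to a single queue:
+                from map_reduce_failed_job_stream[site=="sandbox" and currentState=="FAILED" and queue=="default"]
+                select site, queue, user, jobId, trackingUrl
+                group by jobId insert into map_reduce_failed_job_stream_out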
+        {
+            "location": "/applications/#stream-schema_1", 

[... 247 lines stripped ...]
Propchange: eagle/site/docs/latest/mkdocs/search_index.json
------------------------------------------------------------------------------
    svn:eol-style = native