Posted to commits@metron.apache.org by ni...@apache.org on 2018/08/27 21:35:24 UTC

metron git commit: METRON-1714 Create RPM Packaging for the Batch Profiler (nickwallen) closes apache/metron#1163

Repository: metron
Updated Branches:
  refs/heads/feature/METRON-1699-create-batch-profiler c6d0721b8 -> c7a3dc230


METRON-1714 Create RPM Packaging for the Batch Profiler (nickwallen) closes apache/metron#1163


Project: http://git-wip-us.apache.org/repos/asf/metron/repo
Commit: http://git-wip-us.apache.org/repos/asf/metron/commit/c7a3dc23
Tree: http://git-wip-us.apache.org/repos/asf/metron/tree/c7a3dc23
Diff: http://git-wip-us.apache.org/repos/asf/metron/diff/c7a3dc23

Branch: refs/heads/feature/METRON-1699-create-batch-profiler
Commit: c7a3dc230a8fdfbbefcbe3a04c9c5cc05bc74853
Parents: c6d0721
Author: nickwallen <ni...@nickallen.org>
Authored: Mon Aug 27 17:35:00 2018 -0400
Committer: nickallen <ni...@apache.org>
Committed: Mon Aug 27 17:35:00 2018 -0400

----------------------------------------------------------------------
 .../metron-profiler-spark/README.md             | 219 +++++++++++++++++++
 .../common-services/METRON/CURRENT/metainfo.xml |   5 +-
 .../docker/rpm-docker/SPECS/metron.spec         |  37 +++-
 .../packaging/docker/rpm-docker/pom.xml         |   6 +
 4 files changed, 260 insertions(+), 7 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/metron/blob/c7a3dc23/metron-analytics/metron-profiler-spark/README.md
----------------------------------------------------------------------
diff --git a/metron-analytics/metron-profiler-spark/README.md b/metron-analytics/metron-profiler-spark/README.md
new file mode 100644
index 0000000..0a31263
--- /dev/null
+++ b/metron-analytics/metron-profiler-spark/README.md
@@ -0,0 +1,219 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Metron Profiler for Spark
+
+This project allows profiles to be executed using [Apache Spark](https://spark.apache.org). This is a port of the Profiler to Spark.
+
+* [Introduction](#introduction)
+* [Getting Started](#getting-started)
+* [Installation](#installation)
+* [Configuring the Profiler](#configuring-the-profiler)
+* [Running the Profiler](#running-the-profiler)
+
+## Introduction
+
+Using the [Streaming Profiler](../metron-profiler/README.md) in [Apache Storm](http://storm.apache.org) allows you to create profiles based on the stream of telemetry being captured, enriched, triaged, and indexed by Metron. However, it does not allow you to create a profile based on telemetry that was captured in the past.
+
+There are many cases where you might want to produce a profile from past telemetry.  This is referred to as "profile seeding".
+
+* As a Security Data Scientist, I want to understand the historical behaviors and trends of a profile so that I can determine if the profile has predictive value for model building.
+
+* As a Security Platform Engineer, I want to generate a profile using archived telemetry when I deploy a new model to production so that models depending on that profile can function on day 1.
+
+The Batch Profiler running in [Apache Spark](https://spark.apache.org) allows you to seed a profile using archived telemetry.
+
+The portion of a profile produced by the Batch Profiler should be indistinguishable from the portion created by the Streaming Profiler.  Consumers of the profile should not care how the profile was generated.  Using the Streaming Profiler together with the Batch Profiler allows you to create a complete profile over a wide range of time.
+
+For an introduction to the Profiler and Profiler concepts, see the [Profiler README](../metron-profiler/README.md).
+
+## Getting Started
+
+
+
+1. Create a profile definition by editing `$METRON_HOME/config/zookeeper/profiler.json` as follows.  
+
+    ```
+    cat $METRON_HOME/config/zookeeper/profiler.json
+    {
+      "profiles": [
+        {
+          "profile": "hello-world",
+          "foreach": "'global'",
+          "init":    { "count": "0" },
+          "update":  { "count": "count + 1" },
+          "result":  "count"
+        }
+      ],
+      "timestampField": "timestamp"
+    }
+    ```
+
+1. Ensure that you have archived telemetry available for the Batch Profiler to consume.  By default, Metron will store this in HDFS at `/apps/metron/indexing/indexed/*/*`.
+
+    ```
+    hdfs dfs -cat /apps/metron/indexing/indexed/*/* | wc -l
+    ```
+
+1. Review the Batch Profiler's properties located at `$METRON_HOME/config/batch-profiler.properties`.  See [Configuring the Profiler](#configuring-the-profiler) for more information on these properties.
+
+1. You may want to edit the `log4j.properties` file in the Spark configuration directory under `${SPARK_HOME}`, or create one if it does not exist.  It may be helpful to turn on `DEBUG` logging for the Profiler by adding the following line.
+
+    ```
+    log4j.logger.org.apache.metron.profiler.spark=DEBUG
+    ```
+
+1. Run the Batch Profiler.
+
+    ```
+    source /etc/default/metron
+    cd $METRON_HOME
+    $METRON_HOME/bin/start_batch_profiler.sh
+    ```
+
+1. Query for the profile data using the [Profiler Client](../metron-profiler-client/README.md).
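+
+The `hello-world` profile above counts the messages that fall into each profile period. Because `foreach` is the constant `'global'`, every message contributes to a single count per period. The following is a minimal, hypothetical Java sketch of its `init`, `update`, and `result` expressions, not the Batch Profiler's implementation. It assumes the default 15-minute period and, for illustration only, that a message's period is identified by dividing its timestamp (in epoch milliseconds) by the period length.
+
+```java
+import java.util.Map;
+import java.util.TreeMap;
+import java.util.concurrent.TimeUnit;
+
+public class HelloWorldProfileSketch {
+
+    // Assumption: the default period of 15 MINUTES.
+    static final long PERIOD_MILLIS = TimeUnit.MINUTES.toMillis(15);
+
+    /** Returns the profile's result (a message count) for each hypothetical period id. */
+    static Map<Long, Long> profile(long[] timestamps) {
+        Map<Long, Long> counts = new TreeMap<>();
+        for (long ts : timestamps) {
+            long period = ts / PERIOD_MILLIS;     // hypothetical period id
+            counts.putIfAbsent(period, 0L);       // init:   { "count": "0" }
+            counts.merge(period, 1L, Long::sum);  // update: { "count": "count + 1" }
+        }
+        return counts;                            // result: "count" for each period
+    }
+
+    public static void main(String[] args) {
+        // Three messages in the first 15-minute period and one in the second.
+        long[] timestamps = {0L, 60_000L, 899_999L, 900_000L};
+        System.out.println(profile(timestamps)); // prints {0=3, 1=1}
+    }
+}
+```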
+
+## Installation
+
+The Batch Profiler package is installed automatically when installing Metron using the Ambari MPack.  See the following notes when installing the Batch Profiler without the Ambari MPack.
+
+### Prerequisites
+
+The Batch Profiler requires Spark version 2.3.0+.
+
+### Packages
+
+#### Build the RPM
+
+1. Build Metron.
+    ```
+    mvn clean package -DskipTests -T2C
+    ```
+
+1. Build the RPMs.
+    ```
+    cd metron-deployment/
+    mvn clean package -Pbuild-rpms
+    ```
+
+1. Retrieve the package.
+    ```
+    find ./ -name "metron-profiler-spark*.rpm"
+    ```
+
+#### Build the DEB
+
+1. Build Metron.
+    ```
+    mvn clean package -DskipTests -T2C
+    ```
+
+1. Build the DEBs.
+    ```
+    cd metron-deployment/
+    mvn clean package -Pbuild-debs
+    ```
+
+1. Retrieve the package.
+    ```
+    find ./ -name "metron-profiler-spark*.deb"
+    ```
+
+## Configuring the Profiler
+
+By default, the configuration for the Batch Profiler is stored in the local filesystem at `$METRON_HOME/config/batch-profiler.properties`.
+
+You can store settings for both the Profiler and Spark in this same file.  Spark will only read settings that start with `spark.`.
+
+| Setting                                                                       | Description
+|---                                                                            |---
+| [`profiler.batch.input.path`](#profilerbatchinputpath)                        | The path to the input data read by the Batch Profiler.
+| [`profiler.batch.input.format`](#profilerbatchinputformat)                    | The format of the input data read by the Batch Profiler.
+| [`profiler.period.duration`](#profilerperiodduration)                         | The duration of each profile period.  
+| [`profiler.period.duration.units`](#profilerperioddurationunits)              | The units used to specify the [`profiler.period.duration`](#profilerperiodduration).
+| [`profiler.hbase.salt.divisor`](#profilerhbasesaltdivisor)                    | The divisor used to generate the salt that is prepended to the row key to help prevent hot-spotting.
+| [`profiler.hbase.table`](#profilerhbasetable)                                 | The name of the HBase table that profiles are written to.
+| [`profiler.hbase.column.family`](#profilerhbasecolumnfamily)                  | The column family used to store profiles.
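+
+Because Profiler and Spark settings share one file, they can be told apart by prefix, mirroring the rule that Spark only reads `spark.` settings. The hypothetical Java sketch below uses `java.util.Properties` to separate `spark.`-prefixed settings from everything else. It is not the Batch Profiler's own loading code, and the file contents are invented for illustration: the `profiler.` values are the defaults documented below, and `spark.master` is a standard Spark setting.
+
+```java
+import java.io.IOException;
+import java.io.StringReader;
+import java.util.Properties;
+
+public class SplitPropertiesSketch {
+
+    /** Copies keys that do (spark == true) or do not (spark == false) start with "spark.". */
+    static Properties select(Properties all, boolean spark) {
+        Properties out = new Properties();
+        for (String key : all.stringPropertyNames()) {
+            if (key.startsWith("spark.") == spark) {
+                out.setProperty(key, all.getProperty(key));
+            }
+        }
+        return out;
+    }
+
+    public static void main(String[] args) throws IOException {
+        // Hypothetical batch-profiler.properties contents.
+        String file = String.join("\n",
+            "spark.master=local",
+            "profiler.period.duration=15",
+            "profiler.period.duration.units=MINUTES",
+            "profiler.hbase.table=profiler",
+            "profiler.hbase.column.family=P");
+        Properties all = new Properties();
+        all.load(new StringReader(file));
+
+        // Key order within each set is unspecified.
+        System.out.println("Spark settings:    " + select(all, true).stringPropertyNames());
+        System.out.println("Profiler settings: " + select(all, false).stringPropertyNames());
+    }
+}
+```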
+
+### `profiler.batch.input.path`
+
+*Default*: `hdfs://localhost:9000/apps/metron/indexing/indexed/*/*`
+
+The path to the input data read by the Batch Profiler.
+
+### `profiler.batch.input.format`
+
+*Default*: text
+
+The format of the input data read by the Batch Profiler.
+
+### `profiler.period.duration`
+
+*Default*: 15
+
+The duration of each profile period.  This value should be defined along with [`profiler.period.duration.units`](#profilerperioddurationunits).
+
+*Important*: To read a profile using the [Profiler Client](../metron-profiler-client/README.md), the Profiler Client's `profiler.client.period.duration` property must match this value.  Otherwise, the Profiler Client will be unable to read the profile data.
+
+### `profiler.period.duration.units`
+
+*Default*: MINUTES
+
+The units used to specify the `profiler.period.duration`.  This value should be defined along with [`profiler.period.duration`](#profilerperiodduration).
+
+*Important*: To read a profile using the [Profiler Client](../metron-profiler-client/README.md), the Profiler Client's `profiler.client.period.duration.units` property must match this value.  Otherwise, the Profiler Client will be unable to read the profile data.
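+
+A minimal Java sketch can show why the Profiler Client must use the same period settings. It assumes, hypothetically, that a period identifier is the number of whole periods elapsed since the epoch. Under that assumption, the same timestamp maps to different identifiers when the duration or units differ, so a misconfigured client would look up periods that were never written.
+
+```java
+import java.util.concurrent.TimeUnit;
+
+public class PeriodSketch {
+
+    /** Hypothetical period identifier: whole periods elapsed since the epoch. */
+    static long periodId(long epochMillis, long duration, TimeUnit units) {
+        return epochMillis / units.toMillis(duration);
+    }
+
+    public static void main(String[] args) {
+        long ts = 1_535_400_000_000L; // an arbitrary timestamp in epoch milliseconds
+
+        // The writer uses the defaults: 15 MINUTES.
+        long written = periodId(ts, 15, TimeUnit.MINUTES);
+        // A client misconfigured with 1 HOURS computes a different identifier.
+        long queried = periodId(ts, 1, TimeUnit.HOURS);
+
+        System.out.println(written); // prints 1706000
+        System.out.println(queried); // prints 426500
+    }
+}
+```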
+
+### `profiler.hbase.salt.divisor`
+
+*Default*: 1000
+
+A salt is prepended to the row key to help prevent hot-spotting.  This constant is used to generate the salt.  It should be roughly equal to the number of nodes in the HBase cluster to ensure an even distribution of data.
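+
+The following is a minimal, hypothetical Java sketch of salting. It assumes the salt is the first four bytes of an MD5 hash of the period identifier, reduced modulo the divisor and prepended to the rest of the row key. The Profiler's actual row-key layout may differ. The sketch shows that consecutive periods receive unrelated salts, so their writes are spread across up to `divisor` key prefixes instead of all landing next to one another.
+
+```java
+import java.nio.ByteBuffer;
+import java.nio.charset.StandardCharsets;
+import java.security.MessageDigest;
+import java.security.NoSuchAlgorithmException;
+
+public class SaltSketch {
+
+    /** Returns an MD5 digest; every Java platform is required to support MD5. */
+    static MessageDigest md5() {
+        try {
+            return MessageDigest.getInstance("MD5");
+        } catch (NoSuchAlgorithmException e) {
+            throw new IllegalStateException(e);
+        }
+    }
+
+    /** Hypothetical salt: first 4 bytes of MD5(periodId), reduced into [0, divisor). */
+    static int salt(long periodId, int divisor) {
+        byte[] input = ByteBuffer.allocate(Long.BYTES).putLong(periodId).array();
+        byte[] hash = md5().digest(input);
+        int prefix = ByteBuffer.wrap(hash).getInt();  // reads the first 4 of 16 bytes
+        return Math.floorMod(prefix, divisor);        // unlike %, never negative
+    }
+
+    /** Hypothetical row key: the 4-byte salt followed by the rest of the key. */
+    static byte[] rowKey(long periodId, int divisor, String rest) {
+        byte[] tail = rest.getBytes(StandardCharsets.UTF_8);
+        return ByteBuffer.allocate(Integer.BYTES + tail.length)
+            .putInt(salt(periodId, divisor))
+            .put(tail)
+            .array();
+    }
+
+    public static void main(String[] args) {
+        int divisor = 1000;  // the documented default
+        for (long period = 1_706_000L; period < 1_706_004L; period++) {
+            System.out.println("period " + period + " -> salt " + salt(period, divisor));
+        }
+    }
+}
+```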
+
+### `profiler.hbase.table`
+
+*Default*: profiler
+
+The name of the HBase table that profile data is written to.  The Profiler expects that the table exists and is writable.  It will not create the table.
+
+### `profiler.hbase.column.family`
+
+*Default*: P
+
+The column family used to store profile data in HBase.
+
+## Running the Profiler
+
+A script located at `$METRON_HOME/bin/start_batch_profiler.sh` has been provided to simplify running the Batch Profiler.  Alternatively, the Batch Profiler can be started directly with the `spark-submit` script as follows.
+
+```
+${SPARK_HOME}/bin/spark-submit \
+    --class org.apache.metron.profiler.spark.cli.BatchProfilerCLI \
+    --properties-file ${SPARK_PROPS_FILE} \
+    ${PROFILER_JAR} \
+    --config ${PROFILER_PROPS_FILE} \
+    --profiles ${PROFILES_FILE}
+```
+
+The Batch Profiler accepts the following command-line arguments.
+
+| Argument         | Description
+|---               |---
+| -p, --profiles   | The path to a file containing the profile definitions.
+| -c, --config     | The path to the profiler properties file.
+| -g, --globals    | The path to a properties file containing global properties.
+| -h, --help       | Print the help text.
+

http://git-wip-us.apache.org/repos/asf/metron/blob/c7a3dc23/metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/metainfo.xml
----------------------------------------------------------------------
diff --git a/metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/metainfo.xml b/metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/metainfo.xml
index f83d93b..eae756a 100644
--- a/metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/metainfo.xml
+++ b/metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/metainfo.xml
@@ -404,7 +404,10 @@
               <name>metron-enrichment</name>
             </package>
             <package>
-              <name>metron-profiler</name>
+              <name>metron-profiler-storm</name>
+            </package>
+            <package>
+              <name>metron-profiler-spark</name>
             </package>
             <package>
               <name>metron-indexing</name>

http://git-wip-us.apache.org/repos/asf/metron/blob/c7a3dc23/metron-deployment/packaging/docker/rpm-docker/SPECS/metron.spec
----------------------------------------------------------------------
diff --git a/metron-deployment/packaging/docker/rpm-docker/SPECS/metron.spec b/metron-deployment/packaging/docker/rpm-docker/SPECS/metron.spec
index b308908..94dc951 100644
--- a/metron-deployment/packaging/docker/rpm-docker/SPECS/metron.spec
+++ b/metron-deployment/packaging/docker/rpm-docker/SPECS/metron.spec
@@ -58,6 +58,7 @@ Source11:       metron-management-%{full_version}-archive.tar.gz
 Source12:       metron-maas-service-%{full_version}-archive.tar.gz
 Source13:       metron-alerts-%{full_version}-archive.tar.gz
 Source14:       metron-performance-%{full_version}-archive.tar.gz
+Source15:       metron-profiler-spark-%{full_version}-archive.tar.gz
 
 %description
 Apache Metron provides a scalable advanced security analytics framework
@@ -95,6 +96,7 @@ tar -xzf %{SOURCE11} -C %{buildroot}%{metron_home}
 tar -xzf %{SOURCE12} -C %{buildroot}%{metron_home}
 tar -xzf %{SOURCE13} -C %{buildroot}%{metron_home}
 tar -xzf %{SOURCE14} -C %{buildroot}%{metron_home}
+tar -xzf %{SOURCE15} -C %{buildroot}%{metron_home}
 
 install %{buildroot}%{metron_home}/bin/metron-management-ui %{buildroot}/etc/init.d/
 install %{buildroot}%{metron_home}/bin/metron-alerts-ui %{buildroot}/etc/init.d/
@@ -379,15 +381,15 @@ This package installs the Metron PCAP files %{metron_home}
 
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-%package        profiler
-Summary:        Metron Profiler
+%package        profiler-storm
+Summary:        Metron Profiler for Storm
 Group:          Applications/Internet
-Provides:       profiler = %{version}
+Provides:       profiler-storm = %{version}
 
-%description    profiler
-This package installs the Metron Profiler %{metron_home}
+%description    profiler-storm
+This package installs the Metron Profiler for Storm %{metron_home}
 
-%files          profiler
+%files          profiler-storm
 %defattr(-,root,root,755)
 %dir %{metron_root}
 %dir %{metron_home}
@@ -536,6 +538,27 @@ This package installs the Metron Alerts UI %{metron_home}
 
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
+%package        profiler-spark
+Summary:        Metron Profiler for Spark
+Group:          Applications/Internet
+Provides:       profiler-spark = %{version}
+
+%description    profiler-spark
+This package installs the Metron Profiler for Spark %{metron_home}
+
+%files          profiler-spark
+%defattr(-,root,root,755)
+%dir %{metron_root}
+%dir %{metron_home}
+%dir %{metron_home}/config
+%{metron_home}/config/batch-profiler.properties
+%dir %{metron_home}/bin
+%{metron_home}/bin/start_batch_profiler.sh
+%dir %{metron_home}/lib
+%attr(0644,root,root) %{metron_home}/lib/metron-profiler-spark-%{full_version}.jar
+
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 %post config
 chkconfig --add metron-management-ui
 chkconfig --add metron-alerts-ui
@@ -545,6 +568,8 @@ chkconfig --del metron-management-ui
 chkconfig --del metron-alerts-ui
 
 %changelog
+* Tue Aug 14 2018 Apache Metron <de...@metron.apache.org> - 0.5.1
+- Add Profiler for Spark
 * Thu Feb 1 2018 Apache Metron <de...@metron.apache.org> - 0.4.3
 - Add Solr install script to Solr RPM
 * Tue Sep 25 2017 Apache Metron <de...@metron.apache.org> - 0.4.2

http://git-wip-us.apache.org/repos/asf/metron/blob/c7a3dc23/metron-deployment/packaging/docker/rpm-docker/pom.xml
----------------------------------------------------------------------
diff --git a/metron-deployment/packaging/docker/rpm-docker/pom.xml b/metron-deployment/packaging/docker/rpm-docker/pom.xml
index ba57079..1ea8d46 100644
--- a/metron-deployment/packaging/docker/rpm-docker/pom.xml
+++ b/metron-deployment/packaging/docker/rpm-docker/pom.xml
@@ -168,6 +168,12 @@
                                     </includes>
                                 </resource>
                                 <resource>
+                                    <directory>${metron_dir}/metron-analytics/metron-profiler-spark/target/</directory>
+                                    <includes>
+                                        <include>*.tar.gz</include>
+                                    </includes>
+                                </resource>
+                                <resource>
                                     <directory>${metron_dir}/metron-interface/metron-rest/target/</directory>
                                     <includes>
                                         <include>*.tar.gz</include>