Posted to commits@spot.apache.org by ra...@apache.org on 2017/06/09 21:53:05 UTC

[2/3] incubator-spot git commit: doc_restructuring

doc_restructuring

added documentation on the parameters that ml_ops.sh grabs from /etc/spot.conf


Project: http://git-wip-us.apache.org/repos/asf/incubator-spot/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-spot/commit/bf3283ba
Tree: http://git-wip-us.apache.org/repos/asf/incubator-spot/tree/bf3283ba
Diff: http://git-wip-us.apache.org/repos/asf/incubator-spot/diff/bf3283ba

Branch: refs/heads/master
Commit: bf3283ba237c6451c8d2e56ca5482d16e20847ed
Parents: 7653877
Author: nlsegerl <na...@intel.com>
Authored: Fri Jun 9 09:48:58 2017 -0700
Committer: nlsegerl <na...@intel.com>
Committed: Fri Jun 9 09:48:58 2017 -0700

----------------------------------------------------------------------
 spot-ml/ML_OPS.md | 13 +++++++++++--
 spot-ml/README.md |  5 +++++
 2 files changed, 16 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-spot/blob/bf3283ba/spot-ml/ML_OPS.md
----------------------------------------------------------------------
diff --git a/spot-ml/ML_OPS.md b/spot-ml/ML_OPS.md
index df974e0..585a9c1 100644
--- a/spot-ml/ML_OPS.md
+++ b/spot-ml/ML_OPS.md
@@ -45,8 +45,7 @@ As the maximum probability of an event is 1, a threshold of 1 can be used to sel
 ```
 ## ml_ops.sh output
 
-Final results are stored in the following file on HDFS.
-
+Final results are stored in the following file on HDFS
 Depending on which data source is analyzed, 
 spot-ml output will be found under the ``HPATH`` at one of
 
@@ -57,3 +56,13 @@ spot-ml output will be found under the ``HPATH`` at one of
 
 It is a CSV file in which network events are annotated with estimated probabilities and sorted in ascending order.
 
+## Parameters taken from the /etc/spot.conf file
+
+The ml_ops.sh script takes its values for the following parameters from the /etc/spot.conf file:
+
+* **All Spark settings** Among them driver memory, number of executors, spark.driver.maxResultSize, etc.
+* **Paths to storage locations for Spot ingested data**
+* **USER_DOMAIN** The domain name for the network being analyzed. Used to denote "internal" URLs during proxy and dns analyses.
+* **TOPIC_COUNT** Number of topics used for the topic modelling at the heart of the Suspicious Connects anomaly detection. Roughly, the analysis attempts to generate TOPIC_COUNT many profiles of common traffic in the cluster.
+* **DUPFACTOR** Used to downgrade the threat level of records similar to those marked as non-threatening by the feedback function of the Spot UI. DUPFACTOR inflates the frequency of such records to make them appear less anomalous. A DUPFACTOR of 1 has no effect, and a DUPFACTOR of 1000 increases the frequency of the connection's pattern by a factor of 1000, increasing its estimated probability accordingly.
+
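
The DUPFACTOR description above can be made concrete with a small arithmetic sketch. This is not the Spot implementation: Suspicious Connects uses a topic model, while the code below substitutes a plain empirical-frequency estimate to show how multiplying a pattern's count changes its estimated probability. The function name `estimated_probabilities`, the pattern labels, and the counts are all hypothetical.

```python
from collections import Counter


def estimated_probabilities(patterns, benign, dupfactor):
    """Toy stand-in for the real model: probability = relative frequency.

    Counts of patterns marked benign by feedback are multiplied by
    `dupfactor`, so a value of 1 leaves the data unchanged.
    """
    counts = Counter(patterns)
    for pattern in benign:
        if pattern in counts:  # ignore feedback for patterns not in the data
            counts[pattern] *= dupfactor
    total = sum(counts.values())
    return {pattern: count / total for pattern, count in counts.items()}


# Hypothetical traffic: 98 common records and 2 rare ones that an analyst
# has marked as non-threatening.
patterns = ["web"] * 98 + ["backup"] * 2
benign = {"backup"}

unchanged = estimated_probabilities(patterns, benign, dupfactor=1)
inflated = estimated_probabilities(patterns, benign, dupfactor=1000)

print(unchanged["backup"])  # 2 of 100 records
print(inflated["backup"])   # 2000 of 2098 weighted records
```

With a factor of 1 the rare pattern keeps its low frequency and would still rank as anomalous; with a factor of 1000 it dominates the weighted counts, so under this toy estimate it no longer looks unusual.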

http://git-wip-us.apache.org/repos/asf/incubator-spot/blob/bf3283ba/spot-ml/README.md
----------------------------------------------------------------------
diff --git a/spot-ml/README.md b/spot-ml/README.md
index 4e48837..876ee77 100644
--- a/spot-ml/README.md
+++ b/spot-ml/README.md
@@ -13,6 +13,11 @@ These routines are contained in a jar file   and there is a shell script ml_ops.
 * [jar documentation here](SPOT-ML-JAR.md)
 * [ml_ops.sh documentation here](ML_OPS.md) 
 
+## Configure the /etc/spot.conf file
+
+If using spot-ml as part of the integrated spot solution (or if you simply wish to use the ml_ops.sh script to invoke the suspicious connects analysis), 
+the /etc/spot.conf file must be correctly configured.
+
 ## Prepare data for input 
 
 Whether suspicious connects is called by ml_ops.sh or through the ml-ops jar, data must be in the [schema used by the suspicious connects analyses](SUSPICIOUS_CONNECTS_SCHEMA.md).  Ingesting data via the Spot ingest tools will store data in an appropriate schema.
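
For the /etc/spot.conf requirement added to the README above, a quick sanity check before running ml_ops.sh might look like the following sketch. It assumes the file consists of shell-style `KEY=value` assignments, which the commit does not state explicitly, and it only checks the three named parameters documented in ML_OPS.md. The helper names and the sample values are hypothetical.

```python
import re

# Parameters named in the ML_OPS.md change; Spark and path settings are
# omitted because their variable names are not given in the commit.
REQUIRED = ("USER_DOMAIN", "TOPIC_COUNT", "DUPFACTOR")

# Assumption: one shell-style assignment per line, optionally exported.
ASSIGNMENT = re.compile(r"^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def parse_conf(text):
    """Collect KEY=value pairs, skipping blank lines and # comments.

    No shell expansion is performed and trailing inline comments are not
    stripped; surrounding quotes are removed from values.
    """
    settings = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        match = ASSIGNMENT.match(stripped)
        if match:
            key, value = match.groups()
            settings[key] = value.strip().strip("'\"")
    return settings


def missing_or_empty(settings, required=REQUIRED):
    """Return the required keys that are absent or set to an empty value."""
    return [key for key in required if not settings.get(key)]


# Hypothetical file contents for illustration only.
sample = """
# example values, not defaults
USER_DOMAIN='example.com'
TOPIC_COUNT=20
DUPFACTOR=
"""

settings = parse_conf(sample)
problems = missing_or_empty(settings)
print(problems)
```

To check a real installation, the same functions could be applied to the contents of /etc/spot.conf read from disk.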