Posted to commits@spot.apache.org by ev...@apache.org on 2017/03/29 16:52:10 UTC

[46/50] [abbrv] incubator-spot git commit: CSV Removal documentation update for proxy notebooks

CSV Removal documentation update for proxy notebooks


Project: http://git-wip-us.apache.org/repos/asf/incubator-spot/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-spot/commit/dbb5174d
Tree: http://git-wip-us.apache.org/repos/asf/incubator-spot/tree/dbb5174d
Diff: http://git-wip-us.apache.org/repos/asf/incubator-spot/diff/dbb5174d

Branch: refs/heads/SPOT-35_graphql_api
Commit: dbb5174dfefd97236781e214d539443821821ad0
Parents: 363c02d
Author: LedaLima <le...@apache.org>
Authored: Mon Mar 13 12:19:18 2017 -0600
Committer: Diego Ortiz Huerta <di...@intel.com>
Committed: Wed Mar 15 11:51:23 2017 -0700

----------------------------------------------------------------------
 .../oa/proxy/ipynb_templates/EdgeNotebook.md    | 53 +++++--------
 .../ipynb_templates/ThreatInvestigation.md      | 78 ++++----------------
 2 files changed, 33 insertions(+), 98 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-spot/blob/dbb5174d/spot-oa/oa/proxy/ipynb_templates/EdgeNotebook.md
----------------------------------------------------------------------
diff --git a/spot-oa/oa/proxy/ipynb_templates/EdgeNotebook.md b/spot-oa/oa/proxy/ipynb_templates/EdgeNotebook.md
index 2f9472c..d75ec3f 100644
--- a/spot-oa/oa/proxy/ipynb_templates/EdgeNotebook.md
+++ b/spot-oa/oa/proxy/ipynb_templates/EdgeNotebook.md
@@ -19,56 +19,37 @@ The following python modules will be imported for the notebook to work correctly
 
 
 ###Pre-requisites
-- Execution of the spot-oa process for Proxy
+- Execute the hdfs_setup.sh script to create the OA tables and set up permissions
- Correct setup of the spot.conf file [Read more](/wiki/Edit%20Solution%20Configuration)
-- Have a public key created between the current UI node and the ML node. [Read more](/wiki/Configure%20User%20Accounts#configure-user-accounts)
-
-
-###Data
-The whole process in this notebook depends entirely on the existence of `proxy_scores.tsv` file, which is generated at the OA process.  
-The data is directly manipulated on the .tsv files, so a `proxy_scores_bu.tsv` is created as a backup to allow the user to restore the original data at any point, 
-and this can be performed executing the last cell on the notebook with the following command.
-
-        !cp $sconnectbu $sconnect
-
-
-**Input files**
-All these paths should be relative to the main OA path.    
-Schema for these files can be found [here](/spot-oa/oa/proxy)
+- Execution of the spot-oa process for Proxy
+- Correct installation of the UI [Read more](/ui/INSTALL.md)
 
-        data/proxy/<date>/proxy_scores.tsv  
-        data/proxy/<date>/proxy_scores_bu.tsv
 
-**Temporary Files**
+###Data source
+The whole process in this notebook depends entirely on the existence of the `proxy_scores` table in the database, which is generated by the OA process.  
+The data is manipulated through the GraphQL API that is also included in the repository.
 
-        data/proxy/<date>/proxy_scores_tmp.tsv
+**Input**  
+The data to be processed should be stored in the following tables:
 
-**Output files**
+        proxy_scores
+        proxy
 
-        data/proxy/<date>/proxy_scores.tsv (Updated with severity values)
-        data/proxy/<date>/proxy_scores_fb.csv (File with scored connections that will be used for ML feedback)
+**Output**
+The following tables will be populated after the scoring process:
+
+        proxy_threat_investigation
 
 
 ###Functions
 **Widget configuration**
 This is not a function, but more like global code to set up styles and widgets to format the output of the notebook. 
 
-`data_loader():` - This function loads the source file into a csv dictionary reader to create a list with all disctinct full_uri values. 
+`data_loader():` - This function calls the GraphQL *suspicious* query to list all suspicious, unscored connections.
   
 `fill_list(list_control,source):` - This function loads the given dictionary into a listbox and appends an empty item at the top with the value '--Select--' (Just for design sake)
    
 ` assign_score(b):` - This event is executed when the user clicks the 'Score' button. 
-If the 'Quick scoring' textbox is not empty, the notebook will read that value and ignore any selection made in the listbox, otherwise the sought value will be obtained from the listbox.
-A linear search will be performed in the `proxy_scores.tsv` file to find all `full_uri` values matching the sought .
-In every matching row found, the `uri_sev` value will be updated according to the 'Rating' value selected in the radio button list. 
-All of the rows will then be appended to the `proxy_scores_tmp.tsv` file. At the end of this process, this file will replace the original `proxy_scores.tsv`.  
-
-Only the scored rows will also be appended to the `proxy_scores_fb.csv` file, which will later be used for the ML feedback.
-
-`save(b):` -This event is triggered by the 'Save' button, first it will remove the widget area and call the `load_data()` function to start the loading process again, this will 
-refresh the listbox removing all scored URIs.
-A javascript function is also executed to refresh the other panels in the suspicious connects page removing the need of a manual refresh.
-Afterwards the `ml_feedback()` function will be invoqued. 
+If the 'Quick scoring' textbox is not empty, the notebook will read that value and ignore any selection made in the listbox; otherwise the sought values will be obtained from the listbox selection. Each value is appended to a temporary list.
 
-`ml_feedback():` - A shell script is executed, transferring thru secure copy the _proxy_scores_fb.csv_ file into ML Master node, where the destination path is defined at the spot.conf file.
-   
\ No newline at end of file
+`save(b):` - This event is triggered by the 'Save' button. First it removes the widget area and calls the `load_data()` function to start the loading process again, which refreshes the listbox and removes all scored URIs.
+This function then calls the *score* mutation, which updates the score for the selected values in the database.
\ No newline at end of file
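For orientation, the flow the updated EdgeNotebook.md describes (data_loader() pulling unscored connections through the *suspicious* query, save() persisting ratings through the *score* mutation) could look roughly like the sketch below. This is a minimal illustration only: the endpoint URL, argument names, and field selections are assumptions, not taken verbatim from the spot-oa GraphQL schema.

        # Hedged sketch: endpoint, argument names and selected fields are assumed,
        # not copied from the actual spot-oa GraphQL schema.
        import requests

        GRAPHQL_ENDPOINT = "http://localhost:8889/graphql"  # assumed UI/API location
        DATE = "2017-03-13"                                 # example date partition

        def run_graphql(document):
            """POST a GraphQL document and return its 'data' payload."""
            response = requests.post(GRAPHQL_ENDPOINT, json={"query": document})
            response.raise_for_status()
            return response.json()["data"]

        # Roughly what data_loader() does: list suspicious, still unscored connections.
        suspicious = run_graphql(
            '{ proxy { suspicious(date:"%s") { fulluri } } }' % DATE
        )["proxy"]["suspicious"]

        # Roughly what save() does: push the analyst's rating via the score mutation.
        if suspicious:
            scored_uri = suspicious[0]["fulluri"]
            run_graphql(
                'mutation { proxy { score(input:[{date:"%s", uri:"%s", score:1}]) '
                '{ success } } }' % (DATE, scored_uri)
            )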

http://git-wip-us.apache.org/repos/asf/incubator-spot/blob/dbb5174d/spot-oa/oa/proxy/ipynb_templates/ThreatInvestigation.md
----------------------------------------------------------------------
diff --git a/spot-oa/oa/proxy/ipynb_templates/ThreatInvestigation.md b/spot-oa/oa/proxy/ipynb_templates/ThreatInvestigation.md
index 0681461..cb17619 100644
--- a/spot-oa/oa/proxy/ipynb_templates/ThreatInvestigation.md
+++ b/spot-oa/oa/proxy/ipynb_templates/ThreatInvestigation.md
@@ -22,7 +22,8 @@ The following python modules will have to be imported for the notebook to work c
 
 ##Pre-requisites  
 - Execution of the spot-oa process for Proxy
-- Score a set connections at the Edge Investigation Notebook
+- Correct installation of the UI [Read more](/ui/INSTALL.md)
+- Score a set of connections in the Edge Investigation Notebook
 - Correct setup of the spot.conf file. [Read more](/wiki/Edit%20Solution%20Configuration) 
 
 
@@ -30,62 +31,20 @@ The following python modules will have to be imported for the notebook to work c
 `top_results` - This value defines the number of rows that will be displayed onscreen after the expanded search. 
 This also affects the number of IPs that will appear in the Timeline chart.
 
-##Data
-The whole process in this notebook depends entirely on the existence of the scored _proxy_scores.tsv_ file, which is generated at the OA process, and scored at the Edge Investigation Notebook.
-
-**Input files**
-Schema for these files can be found [here](/spot-oa/oa/proxy)
-
-        ~/spot-oa/data/proxy/<date>/proxy_scores.tsv  
-
-**Output files**  
-- threats.csv : Pipe separated file containing the comments saved by the user. This file is updated every time the user adds comments for a new threat. 
-        
-        Schema with zero-indexed columns:
-        
-        0.hash: string
-        1.title: string
-        2.description: string
-
-- incident-progression-\<anchor hash>.json : Json file generated in base of the results from the expanded 
-search. This file includes a list of all requests performed to and from the URI under analysis, as well as the request methods used and the response content type. 
-These results are limited to the day under analysis. 
-this file will serve as datasource for the Incident Progression chart at the storyboard.
-        
-        Schema with zero-indexed columns:
-
-        {
-            'fulluri':<URI under investigation>, 
-            'requests': [{
-                'clientip':<client IP>,
-                'referer':<referer for the URI under analysis>,
-                'reqmethod':<method used to connect to the URI>,
-                'resconttype':<content type of the response>
-                }, ...
-                ],
-            'referer_for':[
-                         <List of unique URIs refered by the URI under investigation> 
-            ]
-        }
-
-- timeline-\<anchor hash>.tsv : Tab separated file, this file lists all the client IP's that connected to the URI under investigation, including: 
-the duration of the connection, response code and exact date and time of the connections.
-
-        Schema with zero-indexed columns:
-        
-        0.tstart: string
-        1.tend: string
-        2.duration: string
-        3.clientip: string
-        4.respcode: string
- 
-- es-\<anchor hash>.tsv : (Expanded Search). Tab separated file, this is formed with the results from the Expanded Search query. Includes all connections where the investigated URI matches the `referer` or the `full_uri` columns.  
+##Data source
+Data should exist in the following tables:
+
+        proxy
+        proxy_threat_investigation
 
 
-**HDFS tables consumed**
+**Output**  
+The following tables will be populated after the threat investigation process:
+
+        proxy_storyboard
+        proxy_timeline
 
-        proxy
+The following files will be created and stored in HDFS:
 
+        incident-progression-\<anchor hash>.json
 
 ##Functions  
 **Widget configuration**
@@ -94,9 +53,8 @@ This is not a function, but more like global code to set up styles and widgets t
 
 `start_investigation():` - This function cleans the notebook from previous executions, then calls the data_loader() function to obtain the data and afterwards displays the corresponding widgets
 
-`data_loader():` - This function loads the source _proxy_scores.tsv_ file into a csv dictionary reader to create a list with all disctinct `full_uri` values 
-where `uri_sev` = 1. This function will also read through the _threats.tsv_ file to discard all URIs that have already been investigated. 
-  
+`data_loader():` - This function calls the *threats* query to get the connections previously scored as high risk, creating a list with all distinct `full_uri` values.
+
 `fill_list(list_control,source):` - This function populates a listbox widget with the given data list and appends an empty item at the top with the value '--Select--' (Just for visualization  sake)
 
 `display_controls():` - This function will only display the main widget box, containing:
@@ -106,9 +64,7 @@ where `uri_sev` = 1. This function will also read through the _threats.tsv_ file
 - Container for the "Top N results" HTML table
 
 `search_ip(b):` - This function is triggered by the _onclick_ event of the "Search" button.
-This will get the selected value from the listbox and perform a query to the _proxy_ table to retrieve all comunication involving the selected URI.
-Using MD5 algorythm, the URI will be hashed and use it in the name of the output files (anchor hash)
-The output of the query will automatically fill the es-/<anchor hash>.tsv file. 
+This calls the GraphQL *threat / details* query to find additional connections involving the selected full URI.
Afterwards it will read through the output file to display the HTML table; this will be limited to the value set in the _top_results_ variable. At the same time, four dictionaries will be filled:
 - clientips
 - reqmethods * 
@@ -119,8 +75,6 @@ Afterwards it will read through the output file to display the HTML table, this
 This function will also display the 'Threat summary' and 'title' textboxes, along with the 'Save' button.
 
 `save_threat_summary(b):` - This function is triggered by the _onclick_ event on the 'Save' button.
- This will take the contents of the form and create/update the _threats.csv_ file.
- 
-`file_is_empty(path):` - Performs a validation to check the file size to determine if it is empty.
+It removes the widgets and cleans the notebook from previous executions, removes the selected value from the listbox widget, and executes the *createStoryboard* mutation to save the data for the storyboard.
  
 `removeWidget(index):` - Javascript function that removes a specific widget from the notebook. 
\ No newline at end of file
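
Similarly, the ThreatInvestigation.md changes describe three GraphQL operations: the *threats* listing used by data_loader(), the *threat / details* expansion used by search_ip(), and the *createStoryboard* mutation used by save_threat_summary(). A minimal sketch of that flow follows; the endpoint URL, argument names, and field selections are illustrative assumptions rather than the exact schema.

        # Hedged sketch: operation arguments and selected fields are assumed,
        # not copied from the actual spot-oa GraphQL schema.
        import requests

        GRAPHQL_ENDPOINT = "http://localhost:8889/graphql"  # assumed UI/API location
        DATE = "2017-03-13"                                 # example date partition

        def run_graphql(document):
            """POST a GraphQL document and return its 'data' payload."""
            response = requests.post(GRAPHQL_ENDPOINT, json={"query": document})
            response.raise_for_status()
            return response.json()["data"]

        # Roughly what data_loader() does: list URIs already scored as high risk.
        threats = run_graphql(
            '{ proxy { threats { list(date:"%s") { fulluri } } } }' % DATE
        )["proxy"]["threats"]["list"]

        if threats:
            uri = threats[0]["fulluri"]

            # Roughly what search_ip() does: expand one threat into related connections.
            details = run_graphql(
                '{ proxy { threat { details(date:"%s", uri:"%s") '
                '{ clientip referer reqmethod resconttype } } } }' % (DATE, uri)
            )

            # Roughly what save_threat_summary() does: persist the storyboard entry.
            run_graphql(
                'mutation { proxy { createStoryboard(input:{date:"%s", uri:"%s", '
                'title:"Example threat title", text:"Analyst comments"}) '
                '{ success } } }' % (DATE, uri)
            )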