Posted to commits@spot.apache.org by ev...@apache.org on 2017/03/29 16:51:55 UTC

[31/50] [abbrv] incubator-spot git commit: Updating setup documentation

Updating setup documentation


Project: http://git-wip-us.apache.org/repos/asf/incubator-spot/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-spot/commit/03e6319f
Tree: http://git-wip-us.apache.org/repos/asf/incubator-spot/tree/03e6319f
Diff: http://git-wip-us.apache.org/repos/asf/incubator-spot/diff/03e6319f

Branch: refs/heads/SPOT-35_graphql_api
Commit: 03e6319f0d8fe109c27b51d106838d23930c8d36
Parents: 85431c6
Author: Moises Valdovinos <mv...@mvaldovi-mac01.amr.corp.intel.com>
Authored: Thu Mar 9 01:47:20 2017 -0600
Committer: Diego Ortiz Huerta <di...@intel.com>
Committed: Wed Mar 15 11:49:48 2017 -0700

----------------------------------------------------------------------
 spot-setup/README.md | 43 ++++++++++++++++++++++++++++++++-----------
 1 file changed, 32 insertions(+), 11 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-spot/blob/03e6319f/spot-setup/README.md
----------------------------------------------------------------------
diff --git a/spot-setup/README.md b/spot-setup/README.md
index 1ac02f2..ad72eb2 100644
--- a/spot-setup/README.md
+++ b/spot-setup/README.md
@@ -8,7 +8,7 @@ This document is intended for any developer or sysadmin in learning the technica
 
 This information will help you to get started on contributing to the Apache Spot Setup repository. For information about installing and running Apache Spot go to our [Installation Guide](http://spot.apache.org/doc/).
 
-Spot-setup contains the scripts to setup HDFS for Apache Spot solution. It will create the folder and database structure needed to run Apache Spot on HDFS and HIVE respectively. Spot-setup is a component of Apache Spot and is executed in the initial configuration after Linux user creation and before Ingest installation.
+Spot-setup contains the scripts to set up HDFS for the Apache Spot solution. It creates the folder and database structures needed to run Apache Spot on HDFS and in Impala, respectively. Spot-setup is a component of Apache Spot and is executed during initial configuration, after Linux user creation and before Ingest installation.
 
 ## Prerequisites
 
@@ -26,7 +26,7 @@ The main script in the repository is **hdfs_setup.sh** which is responsible of l
 
 This file also defines the sources to be installed as part of Apache Spot, general paths for HDFS folders, Kerberos information, and local paths in the Linux filesystem for the user as well as for the machine learning, ipython, lda, and ingest processes.
 
-To read more about these variables, please review the [wiki] (https://github.com/Open-Network-Insight/open-network-insight/wiki/Edit%20Solution%20Configuration).
+To read more about these variables, please review the [documentation](http://spot.incubator.apache.org/doc/#configuration).
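
As a rough sketch of the kind of steps **hdfs_setup.sh** performs (the HDFS paths, Impala daemon host, and database name below are illustrative assumptions, not values taken from the script or its configuration file):

```shell
# Illustrative sketch only: paths, host, and database name are assumptions,
# not taken from hdfs_setup.sh or spot.conf.

# Create the HDFS folder structure for each data source.
hdfs dfs -mkdir -p /user/spot/flow /user/spot/dns /user/spot/proxy

# Create the database that will hold the Spot tables.
impala-shell -i impala-daemon-host -q "CREATE DATABASE IF NOT EXISTS spot;"
```

These commands require a running Hadoop/Impala cluster, so they are shown only to convey the shape of the setup, not as a drop-in script.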
 
 ## Database Query Scripts
 
@@ -34,22 +34,42 @@ spot-setup contains a script per use case, as of today, there is a table creatio
 
 These HQL scripts are intended to be executed as Hive statements and must comply with HQL standards.
 
-We want to create tables in Avro/Parquet format to get a faster query performance. This format is an industry standard and you can find more information about it on:
-- Avro is a data serialization system - https://avro.apache.org/
+We create tables in Parquet format for faster query performance. Parquet is an industry standard, and you can find more information about it at:
 - Parquet is a columnar storage format - https://parquet.apache.org/
 
-To get to Avro/parquet format we need a staging table to store CSV data temporarily for Flow and DNS. Then, run a Hive query statement to insert these text-formatted records into the Avro/parquet table. Hive will manage to convert the text data into the desired format. The staging table must be cleaned after loading data to Avro/parquet table for the next batch cycle. For Flow and DNS, a set of a staging (CSV) and a final (Avro/parquet) tables are needed for each data entity. For Proxy, only the Avro/parquet table is needed.
+To load data into Parquet format, we need a staging table that stores CSV data temporarily for Flow and DNS. Then, an Impala query statement inserts these text-formatted records into the Parquet table; Impala converts the text data into the desired format. The staging table must be cleaned after loading data into the Parquet table so it is empty for the next batch cycle. For Flow and DNS, a set of one staging (CSV) table and one final (Parquet) table is needed per data entity. For Proxy, only the Parquet table is needed.
 
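
The staging-to-final flow described above can be sketched in Impala SQL as follows; the column set here is a simplified assumption for illustration, not the actual Spot schema:

```sql
-- Hypothetical, simplified schema for illustration only;
-- the real Spot tables contain many more columns.
CREATE TABLE IF NOT EXISTS flow_tmp (
  treceived STRING, sip STRING, dip STRING, ibyt BIGINT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

CREATE TABLE IF NOT EXISTS flow (
  treceived STRING, sip STRING, dip STRING, ibyt BIGINT
)
STORED AS PARQUET;

-- Impala converts the CSV text rows to Parquet during the insert.
INSERT INTO flow SELECT * FROM flow_tmp;

-- Clean the staging table so it is empty for the next batch cycle
-- (TRUNCATE TABLE is available in Impala 2.3 and later).
TRUNCATE TABLE flow_tmp;
```

The same staging/final pattern applies to the DNS tables; Proxy skips the staging step and writes directly to its Parquet table.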
 #### Flow Tables
-- flow - Avro/parquet final table to store flow records
-- flow_tmp - Text table to store temporarily flow records in CSV format
+- flow
+- flow_tmp
+- flow_chords
+- flow_edge
+- flow_ingest_summary
+- flow_scores
+- flow_storyboard
+- flow_threat_investigation
+- flow_timeline
 
 #### DNS Tables
-- dns - Avro/parquet final table to store DNS records
-- dns_tmp - Text table to store temporarily DNS records in CSV format
+- dns
+- dns_tmp
+- dns_dendro
+- dns_edge
+- dns_ingest_summary
+- dns_scores
+- dns_storyboard
+- dns_threat_dendro
+- dns_threat_investigation
 
 #### Proxy Tables
-- proxy - Avro/parquet final table to store Proxy records
+- proxy
+- proxy_edge
+- proxy_ingest_summary
+- proxy_scores
+- proxy_storyboard
+- proxy_threat_investigation
+- proxy_timeline
+
 
 ## Licensing
 
@@ -61,7 +81,8 @@ Create a pull request and contact the maintainers.
 
 ## Issues
 
-Report issues at the Apache Spot [issues] (https://github.com/Open-Network-Insight/open-network-insight/issues) page.
+- Create an [issue](https://issues.apache.org/jira/browse/SPOT-20?jql=project%20%3D%20SPOT).
+- Go to our Slack [channel](https://apachespot.slack.com/messages/general).
 
 ## Maintainers