Posted to commits@spot.apache.org by na...@apache.org on 2017/09/26 22:41:26 UTC

[23/50] [abbrv] incubator-spot git commit: Edits

Edits


Project: http://git-wip-us.apache.org/repos/asf/incubator-spot/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-spot/commit/fafae280
Tree: http://git-wip-us.apache.org/repos/asf/incubator-spot/tree/fafae280
Diff: http://git-wip-us.apache.org/repos/asf/incubator-spot/diff/fafae280

Branch: refs/heads/SPOT-181_ODM
Commit: fafae280a7c3c1f23fe8f294ed55b8683aeb2da3
Parents: 2d2744d
Author: Brandon Edwards <br...@intel.com>
Authored: Wed Sep 6 13:02:06 2017 -0700
Committer: Brandon Edwards <br...@intel.com>
Committed: Wed Sep 6 13:02:06 2017 -0700

----------------------------------------------------------------------
 spot-ml/DATA_SAMPLE.md | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-spot/blob/fafae280/spot-ml/DATA_SAMPLE.md
----------------------------------------------------------------------
diff --git a/spot-ml/DATA_SAMPLE.md b/spot-ml/DATA_SAMPLE.md
index 979da8f..5b7d412 100644
--- a/spot-ml/DATA_SAMPLE.md
+++ b/spot-ml/DATA_SAMPLE.md
@@ -1,13 +1,13 @@
 
 # DNS Labeled Data Set
 
-An IXIA BreakingPoint box was used to simulate both normal and attack (DNS tunnelling) DNS traffic. The resulting pcaps were obtained and fields relevant to Apache Spot (incubating) were injested. The attacks can be differentiated from the normal activity due to codes that were inserted into the Transaction ID field (upon ingestion: ‘dns_id’) which identifies either the fact that the traffic was normal or identifies the specific DNS tunneling activity being used. We provide the schema for the injested pcap data as well as the location and specifications of both the raw pcaps and ingested data within Amazon-S3. Information is also provided for how to interpret the Transaction ID codes.
+An IXIA BreakingPoint box was used to simulate both normal and attack (DNS tunnelling) DNS traffic. The resulting pcaps were obtained and fields relevant to Apache Spot (incubating) were ingested. The attacks can be differentiated from the normal activity by codes that were inserted into the Transaction ID field (upon ingestion: ‘dns_id’), which identify either that the traffic was normal or the specific DNS tunnelling activity being used. We provide the schema for the ingested pcap data as well as the location and specifications of the ingested data within Amazon S3. Information is also provided on how to interpret the dns_id field.
 
 
 
-## Schema For Ingested Data
+## Data Schema
 
-The schema for the ingested DNS data includes one field ('dns_id') in addition to what is usually used for DNS data in Apache Spot (incubating). The schema is as follows:
+The schema for this data includes one field ('dns_id') in addition to what is usually used for DNS data in Apache Spot (incubating). The schema is as follows:
 
 
 | Name         | Type      |
@@ -24,8 +24,8 @@ The schema for the ingested DNS data includes one field ('dns_id') in addition t
 | dns_a        | string    |
 | dns_id       | string    |
 
-## Transaction ID Interpretations
-Each value of the transaction ID ('dns_id' in the ingested data) indicates that either the data row was taken from a packet capture of simulated normal DNS traffic, or from a packet capture of a particular type of simulated DNS tunnelling.
+## Interpreting dns_id
+The value of dns_id indicates whether the data row was taken from a packet capture of simulated normal DNS traffic or from a packet capture of a particular type of simulated DNS tunnelling.
 
 Within BreakingPoint, Transaction IDs are represented as decimal numbers. However, tshark dissects the Transaction ID in its hexadecimal representation (the format shown in parentheses in the table below).
 
@@ -41,18 +41,17 @@ Within Apache Spot (incubating), only responses from DNS servers are ingested si
 | TCP_DNS_Tunnel_BE_2       | 1005 (0x000003ed)  | [Attack] This Super Flow simulates tunneling random noise using TCP over DNS. The payload is generated by a Markov Dictionary and encoded in the DNS requests (responses) by using Base16Alpha encoding. |
 | TCP_DNS_Tunnel_BE_3       | 1007 (0x000003ef)  | [Attack] This Super Flow simulates tunneling random noise using TCP over DNS. The payload is generated by a Markov Dictionary and encoded in the DNS requests (responses) by using Base63 encoding. |
 
-## Data Sets
+## Data Location
 
 | Simulation Date   | Type  | Location  | Size  | Additional Comments   |
 |-------------------|:------|:---------:|:-----:|:---------------------:|
-| May 9, 2017       | Tarball of pcap files | | | |
 | May 9, 2017       | Tarball of ingested data (parquet format) | | | |
 
 
-## Number of Rows associated to each Transaction ID
+## Number of Rows Associated with Each Value of dns_id
 
 | Simulation Date  | Total Records  | dns_id=1008 | dns_id=1002 | dns_id=1003 | dns_id=1001 | dns_id=1005 | dns_id=1007 |
 |:-----------------|:--------------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|
-| 5/9/2017         | 391,364,387    | 391,314,477 | 16,317      | 21,666      | 4,156       | 2,743       | 5, 028      |
+| 5/9/2017         | 391,364,387    | 391,314,477 | 16,317      | 21,666      | 4,156       | 2,743       | 5,028       |
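
The decimal-to-hexadecimal relationship for dns_id and the per-value row counts above can be checked with a few lines of Python. This is only a sketch: the function and constant names are hypothetical, the values are copied from the tables in this diff, and the 8-digit zero-padded hex form is assumed from the parenthesized entries in the table rather than from tshark's documentation.

```python
# Hypothetical helpers; values are copied from the DATA_SAMPLE.md tables.

def dns_id_to_hex(dns_id: int) -> str:
    """Format a decimal Transaction ID as 0x-prefixed, 8-digit lowercase hex,
    matching the parenthesized form in the table (e.g. 1005 -> 0x000003ed)."""
    return f"0x{dns_id:08x}"


def hex_to_dns_id(hex_id: str) -> int:
    """Convert a 0x-prefixed hexadecimal Transaction ID back to decimal."""
    return int(hex_id, 16)


# Row counts per dns_id for the 5/9/2017 simulation, as listed in the table.
ROW_COUNTS = {
    1008: 391_314_477,
    1002: 16_317,
    1003: 21_666,
    1001: 4_156,
    1005: 2_743,
    1007: 5_028,
}
TOTAL_RECORDS = 391_364_387


def counts_match_total(counts: dict, total: int) -> bool:
    """Return True when the per-dns_id counts add up to the stated total."""
    return sum(counts.values()) == total


if __name__ == "__main__":
    for dns_id in (1005, 1007):
        print(dns_id, dns_id_to_hex(dns_id))
    print("counts sum to total:", counts_match_total(ROW_COUNTS, TOTAL_RECORDS))
```

Whether the ingested dns_id column stores the decimal or the hexadecimal form is not stated here, so a conversion like this may or may not be needed when filtering rows.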