Posted to commits@spot.apache.org by na...@apache.org on 2017/09/26 22:41:22 UTC

[19/50] [abbrv] incubator-spot git commit: More edits.

More edits.


Project: http://git-wip-us.apache.org/repos/asf/incubator-spot/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-spot/commit/94a39a24
Tree: http://git-wip-us.apache.org/repos/asf/incubator-spot/tree/94a39a24
Diff: http://git-wip-us.apache.org/repos/asf/incubator-spot/diff/94a39a24

Branch: refs/heads/SPOT-181_ODM
Commit: 94a39a245c8b8324301d3e5fc35a510c73686e6a
Parents: b26873a
Author: Brandon Edwards <br...@intel.com>
Authored: Wed Sep 6 10:03:24 2017 -0700
Committer: Brandon Edwards <br...@intel.com>
Committed: Wed Sep 6 10:03:24 2017 -0700

----------------------------------------------------------------------
 spot-ml/DATA_SAMPLE.md  | 68 ++++++++++++++++++++++++++++++++++++++++++++
 spot-ml/DATA_SAMPLES.md | 68 --------------------------------------------
 2 files changed, 68 insertions(+), 68 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-spot/blob/94a39a24/spot-ml/DATA_SAMPLE.md
----------------------------------------------------------------------
diff --git a/spot-ml/DATA_SAMPLE.md b/spot-ml/DATA_SAMPLE.md
new file mode 100644
index 0000000..0698bd1
--- /dev/null
+++ b/spot-ml/DATA_SAMPLE.md
@@ -0,0 +1,68 @@
+
+# DNS Labeled Data Sets
+
+An IXIA BreakingPoint box was used to simulate both normal and attack (DNS tunneling) DNS traffic. The resulting pcaps were captured and the fields relevant to Spot were ingested (both the original pcaps and the ingested parquet files are available in Amazon-S3). Attack and normal activity can be told apart by codes inserted into the Transaction ID field (after ingestion, the field is ‘dns_id’), which mark the traffic either as normal or as a specific DNS tunneling activity.  We provide the schema for the ingested pcap data, the location and specifications of the data within Amazon-S3, and how to interpret the ‘dns_id’ codes.
+
+Spot (using version #####fill in here###.) was run on these datasets with ten repetitions each.  We provide the Area Under the Curve (AUC) value related to how well the attacks were detected. We also provide the rank distributions for the various attacks within the dataset, with a rank of 1 meaning the entry was found to be the most suspicious entry out of all other entries.
+
+
+## Schema For Ingested Data (same for both data sets)
+
+The schema for this DNS data has one additional field, ‘dns_id’, over what is usually used for DNS data in Spot. The schema is as follows:
+
+
+| Name         | Type      |
+|--------------|:---------:|
+| frame_time   | string    |
+| unix_tstamp  | bigint    |
+| frame_len    | int       |
+| ip_dst       | string    |
+| ip_src       | string    |
+| dns_qry_name | string    |
+| dns_qry_class| string    |
+| dns_qry_type | int       |
+| dns_qry_rcode| int       |
+| dns_a        | string    |
+| dns_id       | string    |
+
+## Transaction ID Interpretations (same for both data sets)
+The following provides interpretations for the values of the transaction ID field, ‘dns_id’. Each value indicates either that the data row was taken from a packet capture of simulated normal DNS traffic, or that it was taken from a packet capture of a particular type of simulated attack.
+
+Within BreakingPoint, Transaction IDs are represented as decimal numbers. However, tshark dissects the transaction ID into its hexadecimal representation, shown in parentheses below.
+
+Within Apache Spot, only responses from DNS servers are ingested (since the response packet contains both the query made by the client and the response from the server).
+
+
+| Super Flow Name           | Transaction IDs    | Description |
+|---------------------------|:------------------:|-------------|
+| Brandon_DNS_domain_Test   | 1008 (0x000003f0)  | [Normal] This super flow simulates normal DNS queries distributed over time and IP address within the network.|
+| DNS_Tunnel_BE_1           | 1002 (0x000003ea)  | [Attack] This super flow simulates a message being tunneled over DNS via the query name field (the URLs are random strings), with an IP address response (drawn from a file of randomly generated IPs) being sent via the DNS answer field. |
+| DNS_Tunnel_BE_2           | 1003 (0x000003eb)  | [Attack] This super flow simulates a message being tunneled over DNS via the query name field (the URLs are random strings), with the response reporting that no such URL was found. |
+| TCP_DNS_Tunnel_BE_1       | 1001 (0x000003e9)  | [Attack] This super flow emulates tunneling random noise using TCP over DNS. The payload is generated by a Markov Dictionary and encoded in the DNS requests (responses) using hex0x20Hack encoding. |
+| TCP_DNS_Tunnel_BE_2       | 1005 (0x000003ed)  | [Attack] This super flow emulates tunneling random noise using TCP over DNS. The payload is generated by a Markov Dictionary and encoded in the DNS requests (responses) using Base16Alpha encoding. |
+| TCP_DNS_Tunnel_BE_3       | 1007 (0x000003ef)  | [Attack] This super flow emulates tunneling random noise using TCP over DNS. The payload is generated by a Markov Dictionary and encoded in the DNS requests (responses) using Base63 encoding. |
+
+## Data Sets
+
+| Simulation Date   | Type  | Location  | Size  | Additional Comments   |
+|-------------------|:------|:---------:|:-----:|:---------------------:|
+| May 9, 2017       | Tarball of pcap files | | | |
+| May 9, 2017       | Tarball of ingested data (parquet format) | | | |
+| July 20, 2017     | Tarball of pcap files | | | |
+| July 20, 2017     | Tarball of ingested data (parquet format) | | | |
+
+
+
+| Simulation Date  | Total Records  | dns_id=1008 | dns_id=1002 | dns_id=1003 | dns_id=1001 | dns_id=1005 | dns_id=1007 |
+|:-----------------|:--------------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|
+| 5/9/2017         | 391,364,387    | 391,314,477 | 16,317      | 21,666      | 4,156       | 2,743       | 5,028       |
+| 7/20/2017        | 406,050,508    | 406,043,921 | 856         | 1,269       | 1,167       | 1,694       | 1,601       |
+
+
+More to do here?
+
+
+
+
+
+
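
Note on the Transaction ID table in DATA_SAMPLE.md above: the decimal-to-hexadecimal correspondence can be checked with a short Python sketch. This is not part of the commit or of Spot. The names `SUPER_FLOW_BY_ID`, `tshark_hex`, and `classify` are hypothetical, and the sketch assumes the ‘dns_id’ string holds either the decimal number or the `0x`-prefixed hexadecimal form; the stored format should be confirmed against the ingested parquet files.

```python
from typing import Optional, Tuple

# Decimal Transaction IDs and super flow names, copied from the table in
# DATA_SAMPLE.md. The dictionary itself is an illustrative helper.
SUPER_FLOW_BY_ID = {
    1008: ("Brandon_DNS_domain_Test", "Normal"),
    1002: ("DNS_Tunnel_BE_1", "Attack"),
    1003: ("DNS_Tunnel_BE_2", "Attack"),
    1001: ("TCP_DNS_Tunnel_BE_1", "Attack"),
    1005: ("TCP_DNS_Tunnel_BE_2", "Attack"),
    1007: ("TCP_DNS_Tunnel_BE_3", "Attack"),
}


def tshark_hex(transaction_id: int) -> str:
    """Format a decimal ID as a 0x-prefixed, 8-digit lowercase hex string."""
    return f"0x{transaction_id:08x}"


def classify(dns_id: str) -> Optional[Tuple[str, str]]:
    """Return (super flow name, label) for a dns_id string, or None if unknown.

    Assumption: dns_id is either plain decimal ("1008") or 0x-prefixed
    hexadecimal ("0x000003f0"). int(..., 0) accepts both forms.
    """
    return SUPER_FLOW_BY_ID.get(int(dns_id, 0))


if __name__ == "__main__":
    for decimal_id, (name, label) in SUPER_FLOW_BY_ID.items():
        print(decimal_id, tshark_hex(decimal_id), label, name)
```

For example, `classify("0x000003ea")` and `classify("1002")` both resolve to the `DNS_Tunnel_BE_1` attack flow under these assumptions.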

http://git-wip-us.apache.org/repos/asf/incubator-spot/blob/94a39a24/spot-ml/DATA_SAMPLES.md
----------------------------------------------------------------------
diff --git a/spot-ml/DATA_SAMPLES.md b/spot-ml/DATA_SAMPLES.md
deleted file mode 100644
index 0698bd1..0000000
--- a/spot-ml/DATA_SAMPLES.md
+++ /dev/null
@@ -1,68 +0,0 @@
-
-# DNS Labeled Data Sets
-
-An IXIA BreakingPoint box to simulate both normal and attack (DNS tunnelling) DNS traffic. The resulting pcaps were obtained and fields relevant to Spot injested (both original pcaps and injested parquet files are available in Amazon-S3). The attacks and the normal activity can be differentiated due to codes that were inserted into the Transaction ID field(upon ingestion the field is: ‘dns_id’) which identifies either the fact that the traffic was normal or identifies the specific dns tunneling activity being used.  We provide the schema for the injested pcap data, location and specifications of the data within Amazon-S3, and how to interpret the ‘dns_id’ codes.
-
-Spot (using version #####fill in here###.) was run on these datasets with ten repetitions each.  We provide the Area Under the Curve (AUC) value related to how well the attacks were detected. We also provide the rank distributions for the various attacks within the dataset, with a rank of 1 meaning the entry was found to be the most suspicious entry out of all other entries.
-
-
-## Schema For Ingested Data (same for both data sets)
-
-The schema for this DNS data has one additional field, ‘dns_id’, over what is usually used for DNS data in Spot. The schema is as follows:
-
-
-| Name         | Type      |
-|--------------|:---------:|
-| frame_time   | string    |
-| unix_tstamp  | bigint    |
-| frame_len    | int       |
-| ip_dst       | string    |
-| ip_src       | string    |
-| dns_qry_name | string    |
-| dns_qry_class| string    |
-| dns_qry_type | int       |
-| dns_qry_rcode| int       |
-| dns_a        | string    |
-| dns_id       | string    |
-
-## Transaction ID Interpretations (same for both data sets)
-The following provides interpretations for the values of the transaction ID field, ‘dns_id’. Each value indicates that either the data row was taken from a packet capture of simulated normal DNS traffic, or from a packet capture of a particular type of simulated attack.
-
-Within BreakingPoint, Transaction IDs are represented as a decimal number. However, tshark dissect the transaction id in its hexadecimal representation in the format contained within parenthesis below.
-
-Within Apache Spot only responses from DNS servers are ingested (since the response packet contains the query made by the client and the response from the server in the same packet)
-
-
-| Super Flow Name           | Transaction IDs    | Description |
-|---------------------------|:------------------:|-------------|
-| Brandon_DNS_domain_Test   | 1008 (0x000003f0)  | [Normal] This super flow simulates normal DNS queries distributed over time and IP address within the network.|
-| DNS_Tunnel_BE_1           | 1002 (0x000003ea)  | [Attack] This super flow simulates a message being tunneled over DNS via the query name field (url's are random strings), with a ip address response (drawn from a file of randomly generated IPs) being sent via the DNS answer field. |
-| DNS_Tunnel_BE_2           | 1003 (0x000003eb})  | [Attack] This super flow simulates a message being tunneled over DNS via the query name field (url's random strings), with a response being given as no such url found. |
-| TCP_DNS_Tunnel_BE_1       | 1001 (0x000003e9)  | [Attack] This super Flow emulates tunneling random noise using TCP over DNS. The payload is generated by a Markov Dictionary and encoded in the DNS requests (responses) by using hex0x20Hack encoding. |
-| TCP_DNS_Tunnel_BE_2       | 1005 (0x000003ed)  | [Attack] This super Flow emulates tunneling random noise using TCP over DNS. The payload is generated by a Markov Dictionary and encoded in the DNS requests (responses) by using Base16Alpha encoding. |
-| TCP_DNS_Tunnel_BE_3       | 1007 (0x000003ef)  | [Attack] This super Flow emulates tunneling random noise using TCP over DNS. The payload is generated by a Markov Dictionary and encoded in the DNS requests (responses) by using Base63 encoding. |
-
-## Data Sets
-
-| Simulation Date   | Type  | Location  | Size  | Additional Comments   |
-|-------------------|:------|:---------:|:-----:|:---------------------:|
-| May 9, 2017       | Tarball of pcap files | | | |
-| May 9, 2017       | Tarball of ingested data (parquet format) | | | |
-| July 20, 2017     | Tarball of pcap files | | | | 
-| July 20, 2017     | Tarball of ingested data (parquet format) | | | |)
-
-
-
-| Simulation Date  | Total Records  | dns_id=1008 | dns_id=1002 | dns_id=1003 | dns_id=1001 | dns_id=1005 | dns_id=1007 |
-|:-----------------|:--------------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|
-| 5/9/2017         | 391,364,387    | 391,314,477 | 16,317      | 21,666      | 4,156       | 2,743       | 5, 028      |
-| 7/20/2017        | 406,050,508    | 406,043,921 | 856         | 1,269       | 1,167       | 1,694       | 1,601       |
-
-
-More to do here?
-
-
-
-
-
-