You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spot.apache.org by solrac901 <gi...@git.apache.org> on 2017/01/10 00:43:25 UTC

[GitHub] incubator-spot pull request #7: Open Data Model

GitHub user solrac901 opened a pull request:

    https://github.com/apache/incubator-spot/pull/7

    Open Data Model

    Version 1.0 Open Data Model documentation
    Adding all the documentation of the ODM

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/solrac901/incubator-spot master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-spot/pull/7.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #7
    
----
commit 89d0c29021f545618c2648f3caeb962a08eb4115
Author: solrac901 <ca...@intel.com>
Date:   2017-01-10T00:35:23Z

    Open Data Model
    
    Version 1.0 Open Data Model documentation

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-spot pull request #7: Open Data Model

Posted by markgrover <gi...@git.apache.org>.
Github user markgrover commented on a diff in the pull request:

    https://github.com/apache/incubator-spot/pull/7#discussion_r95462618
  
    --- Diff: docs/Open Data Model/Open Data Model.md ---
    @@ -0,0 +1,755 @@
    +Overview............................................................................ 2
    +
    +Apache Spot Open Data Model Strategy.................................................................................... 2
    +
    +Apache Spot Enabled Use Cases................................................................................................... 3
    +
    +Data Model..................................................................................................................................... 4
    +
    +Naming Convention....................................................................................................................... 5
    +
    +Prefixes......................................................................................................................................................... 5
    +
    +Security Event Log/Alert Data Model.......................................................................................... 6
    +
    +Common.......................................................................................................................................................... 7
    +
    +Network........................................................................................................................................................... 9
    +
    +File................................................................................................................................................................ 10
    +
    +Endpoint........................................................................................................................................................ 11
    +
    +User............................................................................................................................................................... 11
    +
    +DNS............................................................................................................................................................. 11
    +
    +Proxy......................................................................................................................................................... 12
    +
    +HTTP.............................................................................................................................................................. 13
    +
    +SMTP............................................................................................................................................................ 14
    +
    +FTP............................................................................................................................................................. 15
    +
    +SNMP.................................................................................................................................................... 16
    +
    +TLS........................................................................................................................................................... 16
    +
    +SSH............................................................................................................................................................... 17
    +
    +DHCP............................................................................................................................................................. 17
    +
    +IRC................................................................................................................................................................ 17
    +
    +Flow............................................................................................................................................................ 17
    +
    +Context Models............................................................................................................................ 18
    +
    +User Context Model....................................................................................................................... 18
    +
    +Endpoint Context Model.......................................................................................................................... 20
    +
    +Network Context Model............................................................................................................................22
    +
    +Extensibility of Data Model......................................................................................................... 23
    +
    +Model Relationships.................................................................................................................... 24
    +
    +Data Ingestion Framework.......................................................................................................... 24
    +
    +Data Formats................................................................................................................................ 25
    +
    +Avro............................................................................................................................................................... 25
    +
    +JSON...................................................................................................................................................... 27
    +
    +Parquet................................................................................................................................................... 27
    +
    +ODM Resultant Capability - A Singular View............................................................................ 28
    +
    +**Example - Advanced Threat Modeling**................................................................................................... 28
    +
    +**Example - Singular Data View for Complete Context**................................................................. 29
    +
    +
    +
    +**Overview**
    +----
    +
    +This document describes a strategy for creating an open data model (ODM) for Apache Spot (incubating) (formerly known as \u201cOpen Network Insight (ONI)\u201d) in support of cyber security analytic use cases. It also describes the use cases for which Apache Spot (incubating) running on the Cloudera platform is uniquely capable of addressing along with the data model.
    +
    +
    +
    +**Apache Spot (incubating) Open Data Model Strategy**
    +------------------------------------
    +
    +The Apache Spot (incubating) Open Data Model (ODM) strategy aims to extend Apache Spot (incubating) capabilities to support a broader set of cyber security use cases than initially supported. The primary use case initially supported by Apache Spot (incubating) includes Network Traffic Analysis for network flows (Netflow, sflow, etc.), DNS and Proxy; primarily the identification of threats through anomalous event detection using both supervised and unsupervised machine learning.
    +
    +In order to support a broader set of use cases, Spot must be extended to collect and analyze other common
    +\u201cevent-oriented\u201d data sources analyzed for cyber threats, including but not limited to the following log types:
    +
    +> \u25cfProxy
    +> 
    +> \u25cfWeb server
    +> 
    +> \u25cfOperating system
    +> 
    +> \u25cfFirewall
    +> 
    +> \u25cfIntrusion Prevention/Detection (IDS/ IPS)
    +> 
    +> \u25cfData Loss Prevention
    +> 
    +> \u25cfActive Directory / Identity Management
    +> 
    +> \u25cfUser/Entity Behavior Analysis
    +> 
    +> \u25cfEndpoint Protection/Asset Management
    +> 
    +> \u25cfNetwork Metadata/Session and PCAP files
    +> 
    +> \u25cfNetwork Access Control
    +> 
    +> \u25cfMail
    +> 
    +> \u25cfVPN
    +> 
    +> \u25cf etc..
    +
    +One of the biggest challenges organizations face today in combating cyber threats is collecting and normalizing data from the myriad of security event data sources (hundreds) in order to build the needed analytics. This often results in the analytics being dependent upon the specific technologies used by an organization to detect threats and prevents the needed flexibility and agility to keep up with these ever-increasing (and complex) threats.&nbsp; Technology lock-in is sometimes a byproduct of today\u2019s status quo, as it\u2019s extremely costly to add new technologies (or replace existing ones) because of the downstream analytic dependencies.
    +
    +To achieve the goal of extending Apache Spot (incubating) to support additional use cases, it is necessary to create an open data model for the most relevant security event and contextual data sources; Security event logs or alerts, Network context, User details and information that comes from the endpoints or any other console that are being use to manage the security / administration of our endpoints. The presence of an open data model, which can be applied \u201con-read\u201d or \u201con-write\u201d, in batch or stream, will allow for the separation of security analytics from the specific data sources on which they are built. This \u201cseparation of duties\u201d will enable organizations to build analytics that are not dependent upon specific technologies and provide the flexibility to change underlying data sources and also provide segmentation of this information, without impacting the analytics. This will also afford security vendors the opportunity to build additional products on top of t
 he Open Data Model to drive new revenue streams and also to design new ways to detect threats and APT.
    +
    +
    +**Apache Spot (incubating) Enabled**
    +----
    +
    +**Use Cases**
    +-------------
    +
    +Spot on the Cloudera platform is uniquely positioned to help address the following cyber security use cases,
    +which are not effectively addressed by legacy technologies:
    +
    + 
    +
    + **- Detection of known & unknown threats leveraging machine learning and advanced analytic modeling**
    +
    +Current technologies are limited in the analytics they can apply to detect threats. These limitations stem from the inability to collect all the data sources needed to effectively identify threats (structured, unstructured, etc.) and inability to process the massive volumes of data needed to do so (billions of events per day). Legacy technologies are typically focus and limited to rules-based and signature detection. They are somewhat \u201ceffective\u201d at detecting known threats but struggle with new threats.
    +
    +Spot addresses these gaps through its ability to collect any data type of any volume. Coupled with the various analytic frameworks that are provided (including machine learning), Spot enables a whole new class of analytics that can scale to today\u2019s demands. The topic model used by Spot to detect anomalous network traffic is one example of where the Spot platform excels.
    +
    + **- Reduction of mean time to incident detection & resolution (MTTR)**
    +
    +One of the challenges organizations face today is detecting threats early enough to minimize adverse impacts. This stems from the limitations previously discussed with regards to limited analytics. It can also be attributed to the fact that most of the investigative queries often take hours or days to return results. Legacy technologies can\u2019t offer or have a central data store for facilitating such investigations due to their inability to store and serve the massive amounts of data involved. This cripples incident investigations and results in MTTRs of many weeks or months, meanwhile the adverse impacts of the breach are magnified, thus making the threat harder to eradicate.
    +
    +Apache Spot (incubating) addresses these gaps by providing the capability for a central data store that houses ALL the data needed to facilitate an investigation, returning investigative query results in seconds and minutes (vs. hours and days). Spot can effectively reduce incident MTTR and reduce adverse impacts of a breach.
    +
    + **- Threat Hunting**
    +
    +It\u2019s become necessary for organizations to \u201chunt\u201d for active threats as traditional passive threat detection approaches are not sufficient. \u201cHunting\u201d involves performing ad-hoc searches and queries over vast amounts of data representing many weeks and months\u2019 worth of events, as well as applying ad-hoc / tune algorithms to detect the needle in the haystack. Traditional systems do not perform well for these types of activities as the query results sometimes take hours and days to be retrieved. These traditional systems also lack the analytic flexibility to construct the necessary algorithms and logic needed.
    +
    +Apache Spot (incubating) addresses these gaps in the same ways it addresses others; by providing a central data store with the needed analytic frameworks that scale to the needed workloads.
    +
    +**Data Model**
    +----------
    +In order to provide a framework for effectively analyzing data for cyber threats, it is necessary to collect and
    +analyze standard security event logs/alerts and contextual data regarding the entities referenced in these logs/alerts. The most common entities include network, user and endpoint, but there are others such as file.
    +
    +In the diagram below, the raw event tells us that user \u201cjsmith\u201d successfully logged in to an Oracle database from the IP address 10:1.1.3. Based on the raw event only, we don\u2019t know if this event is a legitimate threat or not. After injecting user and endpoint context, the enriched event tells us this event is a potential threat that requires further investigation.
    +
    +![Screen Shot 2016-09-22 at 1.11.28 PM.png](CybersecurityOpenDataModel0%204-3_files/image001.jpg)
    +
    +Based on the need to collect and analyze both security events, logs or alerts and contextual data, support for
    +the following types of security information are planned for inclusion in the Spot Open Data Model:
    +
    + - Security event logs/alerts
    +This data type includes event logs from common data sources used to detect threats and includes network flows, operating system logs, IPS/IDS logs, firewall logs, proxy logs, web logs, DLP logs, etc.
    +
    + - Network context data
    +This data type includes information about the network, which can be gleaned from Whois servers, asset databases and other similar data sources.
    +
    + - User context data
    +This data type includes information from user and identity management systems including Active Directory, Centrify, and other identity and access management systems.
    +
    + - Endpoint context data
    +This data includes information about endpoint systems (servers, workstations, routers, switches, etc.) and can be sourced from asset management systems, vulnerability scanners, and endpoint  management/detection/response systems such as Webroot, Tanium, Sophos, Endgame, CarbonBlack, Intel Security ePO and others.
    +
    + - File context data** (ROADMAP ITEM)**
    +This data includes contextual information about files and can be sourced from systems such as FireEye, Application Control and others.
    +
    + - Threat intelligence context data **(ROADMAP ITEM)**
    +This data includes contextual information about URLs, domains, websites, files and others.
    +
    +**Naming Convention**
    +-----------------
    +
    +A naming convention is needed for the Open Data Model to represent common attributes across vendor products and technologies. The naming convention is described below.
    +
    +**Prefixes**
    +--------
    +
    +|  Prefix | Description  |  
    +|---|---|
    +|  src | Corresponds to the \u201csource\u201d fields within a given event (i.e. source address)|  
    +|  dst | Corresponds to the \u201cdestination\u201d fields within a given event (i.e. destination address) |  
    +|  dvc | Corresponds to the \u201cdevice\u201d applicable fields within a given event (i.e. device address) and represent where the event originated  |  
    +| fwd  | Forwarded from device   |  
    +| request | Corresponds to requested values (vs. those returned, i.e. \u201crequested URI\u201d) |  
    +| response  | Corresponds to response value (vs. those requested) |  
    +| file  |  Corresponds to the \u201cfile\u201d fields within a given event (i.e. file type) |  
    +| user  | Corresponds to user attributes (i.e. name, id, etc.)  |  
    +| xlate  | Corresponds to translated values within a given event (i.e. src_xlate_ip for \u201ctranslated source ip address\u201d |  
    +| in  | Ingress|  
    +| out | Egress |  
    +| new | New value |  
    +| orig | Original value |  
    +| app | Corresponds to values associated with application events |  
    +
    +
    +**Security Event Log/Alert Data Model**
    +-----------------------------------
    +
    +The data model for security event logs/alerts is detailed in the below. The attributes are categorized as follows:
    +
    + - Common -attributes that are common across many device types
    + - Device -attributes that are applicable to the device that generated the event
    + - File -attributes that are applicable to file objects referenced in the event
    + - Endpoint -attributes that are applicable to the endpoints referenced in the event
    + - User- attributes that are applicable to the user referenced in the event
    + - Proxy - attributes that are applicable to proxy events
    + - Protocol
    +
    +> DNS - attributes that are specific to DNS events
    +> HTTP - attributes that are specific to HTTP events
    +> SMTP, SSH, TLS, DHCP, IRC, SNMP and FTP
    +
    +Note: The model will evolve to include reserved attributes for additional device types that are not currently represented. The model can currently be extended to support ANY attribute for ANY device type by following the guidance outlined in the section titled **\u201cExtensibility of Data Model\u201d.**
    +
    +Note: Attributes denoted in BLUE represent those that are listed in the model multiple times for the purpose of
    +demonstrating attribute coverage for a particular entity (endpoint, user, network, etc.) or log type (Proxy, DNS, etc.).
    +
    +|**Category**|**Attribute**|**Data Type**|**Description**|**Sample Values**|
    +|---|---|---|---|---|
    +|***Common***|eventtime|long|timestamp of event (UTC)|1472653952|
    +||duration|int|Time duration (milliseconds)|2345|
    +||eventid|string|Unique identifier for event|x:2388|
    +||org|string|Organization|\u201cHR\u201d or \u201cFinance\u201d or \u201cCustomerA\u201d
    +||type|string|Type information |\u201cInformational\u201d, \u201cimage/gif\u201d
    +||nproto|string|Network protocol of event |TCP, UDP, ICMP
    +||aproto|string|Application protocol of event |HTTP, NFS, FTP
    +||msg|string|Message (details of action taken on object)|Some long string
    +||mac|string|MAC address|94:94:26:3:86:16
    +||severity|string|Severity of event|High, 10, 1
    +||raw|string|Raw text message of entire event|Complete copy of log entry
    +||risk|Floating point|Risk score|95.67
    +||code|string|Response or error code|404
    +||category|string|Event category|/Application/Start
    +||qry|string|Query (DNS query, URI query,  SQL query, etc.)|Select * from "table"
    +||service|string|(i.e. service name, type of service)|sshd
    +||state|string|State of object|Running, Paused, stopped
    +||in_bytes|int|Bytes in|1025
    +||out_bytes|int|Bytes out|9344
    +||additional_attrs|String (JSON Map)|Custom event attributes|"building":"729","cube":"401"|
    +||dvc_time|long|UTC timestamp from device where event/alert originates or is received|1472653952|
    +||dvc_ip4/dvc_ip6|long|IP address of device|Integer representaion of 10.1.1.1|
    +||dvc_host|string|Hostname of device|Integer representaion of 10.1.1.1|
    +||dvc_type|string|Device type that generated the log|Unix, Windows, Sonicwall|
    +||dvc_vendor|string|Vendor|Microsoft, Fireeye, Intel Security|
    +||dvc_version|string|Version |5.4|
    +||fwd_ip4/fwd_ip6|long|Forwarded from device|Integer representation of 10.1.1.1|
    +||version|string|Version|\u201c3.2.2\u201d|
    +
    +
    +|**Category**|**Attribute**|**Data Type**|**Description**|**Sample Values**|
    +|---|---|---|---|---|
    +|**Network**|src_ip4/src_ip6|bigint|Source ip address of event|Integer representation of 10.1.1.1
    +||src_host|string|Source FQDN of event|test.companyA.com
    +||src_domain|string|Domain name of source address|companyA.com
    +||src_port|int|Source port of event|1025
    +||src_country_code|string|Source country code|cn
    +||src_country_name|string|Source country name|China
    +||src_region|string|Source region|string
    +||src_city|string|Source city|Shenghai
    +||src_lat|int|Source latitude|
    +||src_long|int|Source longitude|
    +||dst_ip4/dst_ip6|bigint|Destination ip address of event|Integer representaion of 10.1.1.1
    +||dst_host|string|Destination FQDN of event|test.companyA.com
    +||dst_domain|string|Domain name of destination address|companyA.com
    +||dst_port|int|Destination port of event|80
    +||dst_country_code|string|Source country code|cn
    +||dst_country_name|string|Source country name|China
    +||dst_region|string|Source region|string
    +||dst_city|string|Source city|Shenghai
    +||dst_lat|int|Source latitude|
    +||dst_long|int|Source longitude|
    +||asn|int|Autonomous system number|33
    +||in_bytes|int|Bytes in|987
    +||out_bytes|int|Bytes out|1222
    +||direction|string|Direction|In, inbound, outbound, ingress, egress
    +||flags|string|TCP flags|.AP.SF
    +
    +|**Category**|**Attribute**|**Data Type**|**Description**|**Sample Values**|
    +|---|---|---|---|---|
    +|**File**|file_name|string|Filename from event|output.csv
    +||file_path|string|File path|/root/output.csv
    +||file_atime|bigint|Timestamp (UTC) of file access|1472653952
    +||file_acls|string|File permissions|rwx-rwx-rwx
    +||file_type|string|Type of file|\u201c.doc\u201d
    +||file_size|int|Size of file in bytes|1244
    +||file_desc|string|Description of file|Project Plan for Project xyz
    +||file_hash|string|Hash of file|
    +||file_hash_type|string|Type of hash|MD5, SHA1,SHA256
    +
    +|**Category**|**Attribute**|**Data Type**|**Description**|**Sample Values**|
    +|---|---|---|---|---|
    +|**Endpoint**|object|string|File/Process/Registry|File, Registry, Process
    +||action|string|Action taken on object (open/delete/edit)|Open, Edit
    +||msg|string|Message (details of action taken on object)|Some long string
    +||app|string|Application|Microsoft Powerpoint
    +||location|string|Location|Atlanta, GA
    +||proc|string|Process|SSHD
    +
    +|**Category**|**Attribute**|**Data Type**|**Description**|**Sample Values**|
    +|---|---|---|---|---|
    +|**User**|user_name|string|username from event|mhicks
    +||email|string|Email address|test@companyA.com
    +||user_id|string|userid|234456
    +||user_loc|string|location|Herndon, VA
    +||user_desc|string|Description of user|
    +
    +|**Category**|**Attribute**|**Data Type**|**Description**|**Sample Values**|
    +|---|---|---|---|---|
    +|DNS|dns_class|string|DNS class|1
    +||dns_length|int|DNS frame length|188
    +||dns_qry|string|Requested DNS query|test.test.com
    +||dns_code|string|Response code|0x00000001
    +||dns_response_qry|string|Response to DNS Query|178.2.1.99
    +
    +|**Category**|**Attribute**|**Data Type**|**Description**|**Sample Values**|
    +|---|---|---|---|---|
    +|Proxy|category|string|Event category|SG-HTTP-SERVICE
    +||browser|string|Web browser|Internet Explorer
    +||code|string|Error or response code|404
    +||in_bytes|int|Bytes in|1025
    +||out_bytes|int|Bytes out|1288
    +||referrer|string|Referrer|www.usatoday.com
    +||request_uri|string|Requested URI|/wcm/assets/images/imagefileicon.gif
    +||filter_rule|string|Applied filter or rule|Internet, Rule 6 
    +||filter_result|string|Result of applied filter or rule|Proxied, Blocked
    +||qry|string|URI query|?func=S_senseHTML&Page=a26815a313504697a126279
    +||action|string|Action taken on object |TCP_HIT, TCP_MISS, TCP_TUNNELED
    +||method|string|HTTP method|GET, CONNECT, POST
    +||type|string|Type of request|image/gif
    +
    +|**Category**|**Attribute**|**Data Type**|**Description**|**Sample Values**|
    +|---|---|---|---|---|
    +|HTTP|request_method|string|HTTP method|GET, CONNECT, POST
    +||request_uri |string|Requested URI|/wcm/assets/images/imagefileicon.gif
    +||request_body_len|int|Length of request body|98
    +||request_user_name |string|username from event|mhicks
    +||request_password|string|Password from event|abc123
    +||request_proxied|string||
    +||request_headers|MAP|HTTP request headers|request_headers[\u2018HOST\u2019] request_headers[\u2018USER-AGENT\u2019] request_headers[\u2018ACCEPT\u2019]
    +||response_status_code|int|HTTP response status code|404
    +||response_status_msg|string|HTTP response status message|\u201cNot found\u201d
    +||response_body_len|int|Length of response body|98
    +||response_info_code |int|HTTP response info code|100
    +||response_info_msg|string|HTTP response info message|\u201cSome string\u201d
    +||response_resp_fuids|string|Response FUIDS|
    +||response_mime_types|string|Mime types|\u201ccgi,bat,exe\u201d
    +||response_headers|MAP|Response headers|response_headers[\u2018SERVER\u2019] response_headers[\u2018SET-COOKIE\u2019\u2019] response_headers[\u2018DATE\u2019]
    +
    +|**Category**|**Attribute**|**Data Type**|**Description**|**Sample Values**|
    +|---|---|---|---|---|
    +|**SMTP**|trans_depth|int|Depth of email into SMTP exchange|Coming soon
    +||headers_helo|string|Helo header|Coming soon
    +||headers_mailfrom|string|Mailfrom header|Coming soon
    +||headers_rcptto|string|Rcptto header|Coming soon
    +||headers_date|string|Header date|Coming soon
    +||headers_from|string|From header|Coming soon
    +||headers_to|string|To header|Coming soon
    +||headers_reply_to|string|Reply to header|Coming soon
    +||headers_msg_id|string|Message ID |Coming soon
    +||headers_in_reply_to|string|In reply to header|Coming soon
    +||headers_subject|string|Subject|Coming soon
    +||headers_x_originating_ip4|bigint|Originating IP address|Coming soon
    +||headers_first_received|string|First to receive message|Coming soon
    +||headers_second_received|string|Second to receive message|Coming soon
    +||last_reply|string|Last reply in message chain|Coming soon
    +||path|string|Path of message|Coming soon
    +||user_agent|string|User agent|Coming soon
    +||tls|boolean|Indication of TLS use|Coming soon
    +||is_webmail|boolean|Indication of webmail|Coming soon
    +
    +|**Category**|**Attribute**|**Data Type**|**Description**|**Sample Values**|
    +|---|---|---|---|---|
    +|**FTP**|user_name|string|Username|Coming soon
    +||password|string|Password|Coming soon
    +||command|string|FTP command|Coming soon
    +||arg|string|Argument|Coming soon
    +||mime_type|string|Mime type|Coming soon
    +||file_size|int|File size|Coming soon
    +||reply_code|int|Reply code|Coming soon
    +||reply_msg|string|Reply message|Coming soon
    +||data_channel_passive|boolean|Passive data channel?|Coming soon
    +||data_channel_rsp_p|string||Coming soon
    +||cwd|string|Current working directory|Coming soon
    +||cmdarg_ts|float||Coming soon
    +||cmdarg_cmd|string|Command|Coming soon
    +|cmdarg_arg|string|Command argument|Coming soon
    +||cmdarg_seq|int|Sequence|Coming soon
    +||pending_commands|string|Pending commands|Coming soon
    +||is_passive|boolean|Passive mode enabled|Coming soon
    +||fuid|string|Coming soon|Coming soon
    +||last_auth_requested|string|Coming soon|Coming soon
    +
    +|**Category**|**Attribute**|**Data Type**|**Description**|**Sample Values**|
    +|---|---|---|---|---|
    +|**SNMP**|version|string|Coming soon|Coming soon
    +||community|string|Coming soon|Coming soon
    +||get_requests|int|Coming soon|Coming soon
    +||get_bulk_requests|int|Coming soon|Coming soon
    +||get_responses|int|Coming soon|Coming soon
    +||set_requests|int|Coming soon|Coming soon
    +||display_string|string|Coming soon|Coming soon
    +||up_since|float|Coming soon|Coming soon
    +
    +|**Category**|**Attribute**|**Data Type**|**Description**|**Sample Values**|
    +|---|---|---|---|---|
    +|**TLS**|version|string|Coming soon|Coming soon
    +||cipher|string|Coming soon|Coming soon
    +||curve|string|Coming soon|Coming soon
    +||server_name|string|Coming soon|Coming soon
    +||resumed|boolean|Coming soon|Coming soon
    +||next_protocol|string|Coming soon|Coming soon
    +||established|boolean|Coming soon|Coming soon
    +||cert_chain_fuids|string|Coming soon|Coming soon
    +||client_cert_chain_fuids|string|Coming soon|Coming soon
    +||subject|string|Coming soon|Coming soon
    +||issuer|string|Coming soon|Coming soon
    +
    +|**Category**|**Attribute**|**Data Type**|**Description**|**Sample Values**|
    +|---|---|---|---|---|
    +|**SSH**|version|string|Coming soon|Coming soon
    +||auth_success|boolean|Coming soon|Coming soon
    +||client|string|Coming soon|Coming soon
    +||server|string|Coming soon|Coming soon
    +||cipher_algorithm|string|Coming soon|Coming soon
    +||mac_algorithm|string|Coming soon|Coming soon
    +||compression_algorithm|string|Coming soon|Coming soon
    +||key_exchange_algorithm|string|Coming soon|Coming soon
    +||host_key_algorithm|string|Coming soon|Coming soon
    +
    +|**Category**|**Attribute**|**Data Type**|**Description**|**Sample Values**|
    +|---|---|---|---|---|
    +|**DHCP**|assigned_ip4|bigint|Coming soon|Coming soon
    +||mac|string|Coming soon|Coming soon
    +||lease_time|double|Coming soon|Coming soon
    +
    +|**Category**|**Attribute**|**Data Type**|**Description**|**Sample Values**|
    +|---|---|---|---|---|
    +|**IRC**|user|string|Coming soon|Coming soon
    +||nickname|string|Coming soon|Coming soon
    +||command|string|Coming soon|Coming soon
    +||value|string|Coming soon|Coming soon
    +||additional_data|string|Coming soon|Coming soon
    +
    +|**Category**|**Attribute**|**Data Type**|**Description**|**Sample Values**|
    +|---|---|---|---|---|
    +|**Flow**|in_packets|int|Coming soon|Coming soon
    +||out_packets|int|Coming soon|Coming soon
    +||in_bytes|int|Coming soon|Coming soon
    +||out_bytes|int|Coming soon|Coming soon
    +||conn_state|string|Coming soon|Coming soon
    +||history|string|Coming soon|Coming soon
    +||duration|float|Coming soon|Coming soon
    +||src_os|string|Coming soon|Coming soon
    +||dst_os|string|Coming soon|Coming soon
    +
    +Note: It is not necessary to populate all of the attributes within the model.  For attributes not populated in a single security event log/alert, contextual data may not be available. For example, the sample event below can be enriched with contextual data about the referenced endpoints (10.1.1.1 and 192.168.10.10), but not a user, because username is not populated.
    +
    +> **date,time,source_ip,source_port,protocol,destination_ip,destination_port,bytes**
    +12/12/2015,23:14:56,10.1.1.1,1025,tcp,192.168.10.10,443,1183
    +
    +
    +**Context Models**
    +==================
    +The recommended approach for populating the context models (user, endpoint, network, etc.) involves consuming information from the systems most capable or providing the needed context.  Populating the user context model is best accomplished by leveraging user/identity management systems such as Active Directory or Centrify and populating the model with details such as the user\u2019s full name, job title, phone number, manager\u2019s name, physical address, entitlements, etc.  Similarly, an endpoint model can be populated by consuming information from endpoint/asset management systems (Tanium, Webroot, etc.), which provide information such as the services running on the system, system owner, business context, etc.
    +
    +**User Context Model**
    +------------------
    +The data model for user context information is as follows:
    +
    +|**Attribute**|**Data Type**|**Description**|**Sample Values**|
    +|---|---|---|---|
    +|dvc_time|bigint|Timestamp from when the user context information is obtained|1472653952
    +|created|bigint|Timestamp from when user was created|1472653952
    +|Changed\u2013\u2013\u2013\u2013|bigint|Timestamp from when user was updated|1472653952
    +|lastlogon|bigint|Timestamp from when user last logged on|1472653952
    +|logoncount|int|Number of times account has logged on|232
    +|lastreset|bigint|Timestamp from when user last reset passwod|1472653952
    +|expiration|bigint|Date/time when user expires|1472653952
    +|userid|string|Unique user id|1234
    +|username|string|Username in event log/alert|jsmith
    +|name_first|string|First name|John
    +|name_middle|string|Middle name|Henry
    +|name_last|string|Last name|Smith
    +|name_mgr|string|Manager\u2019s name|Ronald Reagan
    +|phone|string|Phone number|703-555-1212
    +|email|string|Email address|jsmith@company.com
    +|code|string|Job code|3455
    +|loc|string|Location|US
    +|departm|string|Department|IT
    +|dn||Distinguished name|"CN=scm-admin-mej-test2-adk,OU=app-|admins,DC=ad,DC=halxg,DC=companya,DC=com"
    +|ou|string|Organizational unit|EAST
    +|empid|string|Employee ID|12345
    +|title|string|Job Title|Director of IT
    +|groups|string (comma separated list, no spaces after comma)|Groups to which the user belongs|\u201cDomain Admins\u201d, \u201cDomain Users\u201d
    +|dvc_type|string|Device type that generated the user context data|Active Directory
    +|dvc_vendor|string|Vendor|Microsoft
    +|dvc_version|string|Version |8.1.2
    +|additional_attrs|string|Additional attributes of user|Key value pairs
    +
    +
    +**Endpoint Context Model**
    +------------------
    +The data model for endpoint context information is as follows:
    +|**Abbreviation**|**Data Type**|**Description**|**Sample Values**|
    +|---|---|---|---|
    +|dvc_time|bigint|Timestamp from when the endpoint context information is obtained|1472653952
    +|ip4|bigint|IP address of endpoint |Integer representaion of 10.1.1.1
    +|ip6|bigint|IP address of endpoint |Integer representaion of 10.1.1.1
    +|os|string|Operating system|Redhat Linux 6.5.1
    +|os_version|string|Version of OS|5.4
    +|os_sp|string|Service pack|SP 2.3.4.55
    +|tz|string|timezone|EST
    +|hotfixes|string|Applied hotfixes|993.2
    +|disks|string|Available disks|\\Device\\HarddiskVolume1, \\Device\\HarddiskVolume2
    +|removables|string|Removable media devices|USB Key
    +|nics|string|Network interfaces|fe10::28f4:1a47:658b:d6e8, fe82::28f4:1a47:658b:d6e8 
    +|drivers|string|Installed kernel drivers|ntoskrnl.exe, hal.dll
    +|users|string|Local user accounts|administrator, jsmith
    +|host|string|Hostname of endpoint|tes1.companya.com
    +|mac|string|MAC address of endpoint|fe10::28f4:1a47:658b:d6e8
    +|owner|string|Endpoint owner (name)|John Smith
    +|vulns|string (comma separated, no spaces after commas)|Vulnerability identifiers (CVE identifier)|CVE-123, CVE-456
    +|loc|string|Location|US
    +|departm|string|Department name|IT
    +|company|string|Company name|CompanyA
    +|regs|string (comma-separated)|Applicable regulations|HIPAA, SOX
    +|svcs|string (comma-separated)|Services running on system|Cisco Systems, Inc. VPN Service, Adobe LM Service
    +|procs|string|Processes|svchost.exe, sppsvc.exe
    +|criticality|string|Criticality of device|Very High
    +|apps|string (comma-separated)|Applications running on system|Microsoft Word, Chrome
    +|desc|string|Endpoint descriptor|Some string
    +|dvc_type|string|Device type that generated the log|Microsoft Windows 7
    +|dvc_vendor|string|Vendor|Endgame
    +|dvc_version|string|Version |2.1
    +|architecture|string|CPU architecture|x86
    +|uuid|string|Universally unique identifier|a59ba71e-18b0-f762-2f02-0deaf95076c6
    +|memtotal|int|Total memory (bytes)|844564433
    +|additional_attrs|string|Additional attributes|Key value pairs
    +
    +**VPN Context Model**
    +------------------
    +The data model for VPN context information is based on the VPN logs as follows:
    +
    +|**Abbreviation**|**Data Type**|**Description**|**Sample Values**|
    +|---|---|---|---|
    +|dvc_time|bigint|Timestamp from when the endpoint context information is obtained|1472653952
    +|ip4|bigint|IP address of VPN box|Integer representaion of 10.1.1.1
    +|ip6|bigint|IP address of VPN box |Integer representaion of 10.1.1.1
    +|vpn_vendor|string|Vendor VPN|Cisco
    +|vpn_version|string|Version VPN|3.0
    +|vpn_sp|string|VPN Service pack|5
    +|tz|string|VPN  timezone|EST
    +|vpn_hotfixes|string|VPN Applied hotfixes|1134
    +|vpn_nics|string|Network interfaces|fe10::28f4:1a47:658b:d6e8, fe82::28f4:1a47:658b:d6e8 
    +|vpn_host|VPN Country Code|string|MX
    +|vpn_country_name|VPN Country Name|string|Mexico
    +|vpn_ip||string|Integer representation of 10.1.1.2
    +|vpn_encrypt|VPN encryption protocol|string|IPSEC
    +|vpn_username|string|VPN user account|jsmith
    +|vpn_user_ip|string|VPN User IP address|Integer representation of 10.1.1.2
    +|vpn_user_cc|string|VPN Country Code|US
    +|vpn_user_cn|string|VPN Country Name|United States
    +|vpn_user_auth|string|VPN user authorization / role|Admin, normal user, etc
    +|vpn_account_vip|string|Criticality of the VPN account|Medium, High
    +|vpn_uuid|string|Universally unique identifier|a59ba71e-18b0-f762-2f02-0deaf95076c6
    +|uuids|string|Universally unique identifier(s) comes from thee endpoint context if match|a59ba71e-18b0-f762-2f02-0deaf95xmexzA
    +|additional_attrs|string|Additional attributes|Key value pairs
    +
    +**Network Context Model**
    +------------------
    +The data model for network context information is based on \u201cwhois\u201d information as follows:
    +
    +|**Attribute**|**Data Type**|**Description**|**Sample Values**|
    +|---|---|---|---|
    +|domain_name|string|Domain name
    +|registry_domain_id|string|Registry Domain ID
    +|registrar_whois_server|string|Registrar WHOIS Server
    +|registrar_url|string|Registrar URL
    +|update_date|bigint|UTC timestamp
    +|creation_date|bigint|Creation Date
    +|registrar_registration_expiration_date|bigint|Registrar Registration Expiration Date
    +|registrar|string|Registrar
    +|registrar_iana_id|string|Registrar IANA ID
    +|registrar_abuse_contact_email|string|Registrar Abuse Contact Email
    +|registrar_abuse_contact_phone|string|Registrar Abuse Contact Phone
    +|domain_status|string|Domain Status
    +|registry_registrant_id|string|Registry Registrant ID
    +|registrant_name|string|Registrant Name
    +|registrant_organization|string|Registrant Organization
    +|registrant_street|string|Registrant Street
    +|registrant_city|string|Registrant City
    +|registrant_state_province|string|Registrant State/Province
    +|registrant_postal_code|string|Registrant Postal Code
    +|registrant_country|string|Registrant Country
    +|registrant_phone|string|Registrant Phone
    +|registrant_email|string|Registrant Email
    +|registry_admin_id|string|Registry Admin ID
    +|name_server|string|Name Server
    +|dnssec|string|DNSSEC
    +
    +
    +**Extensibility of Data Model**
    +==================
    +
    +The aforementioned data model can be extended to accommodate custom attributes by embedding key-value pairs within the log/alert/context entries. 
    +Each model will support an additional attribute by the name of additional_attrs whose value would be a JSON string. This JSON string will contain a Map (and only a Map) of additional attributes that can\u2019t be expressed in the specified model description. Regardless of the type of these additional attributes, they will always be interpreted as String. It\u2019s up to the user, to translate them to appropriate types, if necessary, in the analytics layer. It is also the user\u2019s responsibility to populate the aforementioned attribute as a Map, by presumably parsing out these attributes from the original message.
    +For example, if a user wanted to extend the user context model to include a string attribute for \u201cDesk Location\u201d and \u201cCity\u201d, the following string would be set for additional_attrs:
    +
    +|**Attribute Key**|**Attribute Value**|
    +|---|---|
    +|additional_attrs|{"dsk_location":"B3-F2-W3", "city":"Palo Alto"}|
    +
    +
    +Something similar can be done for endpoint context model, security event log/alert model and other entities.
    +
    +	Note: This [UDF library](https://github.com/klout/brickhouse) can be used for converting to/from JSON.
    +
    +##**Model Relationships**##
    +
    +The relationships between the data model entities are illustrated below.
    +
    +Image here
    +
    +
    +##**Data Ingestion Framework**##
    +
    +One of the challenges in populating the data model is the large number of products and technologies that organizations are currently using to manage security event logs/alerts, user and endpoint information. There are literally dozens of vendors in each category that offer technologies that could be used to populate the model.  The labor required to transform the data and map the attributes to the data model is extensive when you consider how many technologies are in the mix at each organization (and across organizations). One way to address this challenge is with a Data Ingestion Framework that provides a configuration-based mechanism to perform the transformations and mappings.  A configuration-based capability will allow the ingest pipelines to become portable and reusable across the community. For example, if I create an ingest pipeline for Centrify to populate the user context model, it can be shared with other users of Centrify who can immediately realize the benefit.  Suc
 h a framework could allow the community to quickly build the necessary pipelines for the dozens (and hundreds) of technologies being used in the market.  Without a standard ingest framework, each pipeline is built independently, requiring more labor, providing no standardization and little portability.  It\u2019s also important that the data ingestion framework support the ability to both capture the \u201craw\u201d event and create a meta event that represents the normalized event and maps the attributes to the defined data model.  This will ensure both stream and batch processing use cases are supported.
    + 
    +Streamsets is an ingest framework that provides the needed functionality outlined above.  Sample Streamsets ingest pipelines for populating the ODM with common data sources will be published to the Spot Github repo.
    +
    +##**Data Formats**##
    +
    +**Avro**
    +----
    +
    +Avro is the recommended data format due to its schema representation, compatibility checks, and interoperability with Hadoop.  Avro supports a pure JSON representation for readability and ease of use but also a binary representation of the data for efficient storage.   Avro is the optimal format for streaming-based analytic use cases.
    +
    +A sample event and corresponding schema representation are detailed below.
    +
    +**Event**
    --- End diff --
    
    This may get compacted in one line so you have to put 3 backticks at top and bottom of this block. Same for Schema.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-spot issue #7: Open Data Model

Posted by markgrover <gi...@git.apache.org>.
Github user markgrover commented on the issue:

    https://github.com/apache/incubator-spot/pull/7
  
    Thanks for working on this @solrac901 I really appreciate all the time and effort you to put to convert this doc, I have left some comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-spot pull request #7: Open Data Model

Posted by markgrover <gi...@git.apache.org>.
Github user markgrover commented on a diff in the pull request:

    https://github.com/apache/incubator-spot/pull/7#discussion_r95460852
  
    --- Diff: docs/Open Data Model/Open Data Model.md ---
    @@ -0,0 +1,755 @@
    +Overview............................................................................ 2
    +
    +Apache Spot Open Data Model Strategy.................................................................................... 2
    +
    +Apache Spot Enabled Use Cases................................................................................................... 3
    +
    +Data Model..................................................................................................................................... 4
    +
    +Naming Convention....................................................................................................................... 5
    +
    +Prefixes......................................................................................................................................................... 5
    +
    +Security Event Log/Alert Data Model.......................................................................................... 6
    +
    +Common.......................................................................................................................................................... 7
    +
    +Network........................................................................................................................................................... 9
    +
    +File................................................................................................................................................................ 10
    +
    +Endpoint........................................................................................................................................................ 11
    +
    +User............................................................................................................................................................... 11
    +
    +DNS............................................................................................................................................................. 11
    +
    +Proxy......................................................................................................................................................... 12
    +
    +HTTP.............................................................................................................................................................. 13
    +
    +SMTP............................................................................................................................................................ 14
    +
    +FTP............................................................................................................................................................. 15
    +
    +SNMP.................................................................................................................................................... 16
    +
    +TLS........................................................................................................................................................... 16
    +
    +SSH............................................................................................................................................................... 17
    +
    +DHCP............................................................................................................................................................. 17
    +
    +IRC................................................................................................................................................................ 17
    +
    +Flow............................................................................................................................................................ 17
    +
    +Context Models............................................................................................................................ 18
    +
    +User Context Model....................................................................................................................... 18
    +
    +Endpoint Context Model.......................................................................................................................... 20
    +
    +Network Context Model............................................................................................................................22
    +
    +Extensibility of Data Model......................................................................................................... 23
    +
    +Model Relationships.................................................................................................................... 24
    +
    +Data Ingestion Framework.......................................................................................................... 24
    +
    +Data Formats................................................................................................................................ 25
    +
    +Avro............................................................................................................................................................... 25
    +
    +JSON...................................................................................................................................................... 27
    +
    +Parquet................................................................................................................................................... 27
    +
    +ODM Resultant Capability - A Singular View............................................................................ 28
    +
    +**Example - Advanced Threat Modeling**................................................................................................... 28
    +
    +**Example - Singular Data View for Complete Context**................................................................. 29
    +
    +
    +
    +**Overview**
    +----
    +
    +This document describes a strategy for creating an open data model (ODM) for Apache Spot (incubating) (formerly known as \u201cOpen Network Insight (ONI)\u201d) in support of cyber security analytic use cases. It also describes the use cases for which Apache Spot (incubating) running on the Cloudera platform is uniquely capable of addressing along with the data model.
    +
    +
    +
    +**Apache Spot (incubating) Open Data Model Strategy**
    +------------------------------------
    +
    +The Apache Spot (incubating) Open Data Model (ODM) strategy aims to extend Apache Spot (incubating) capabilities to support a broader set of cyber security use cases than initially supported. The primary use case initially supported by Apache Spot (incubating) includes Network Traffic Analysis for network flows (Netflow, sflow, etc.), DNS and Proxy; primarily the identification of threats through anomalous event detection using both supervised and unsupervised machine learning.
    +
    +In order to support a broader set of use cases, Spot must be extended to collect and analyze other common
    +\u201cevent-oriented\u201d data sources analyzed for cyber threats, including but not limited to the following log types:
    +
    +> \u25cfProxy
    +> 
    +> \u25cfWeb server
    +> 
    +> \u25cfOperating system
    +> 
    +> \u25cfFirewall
    +> 
    +> \u25cfIntrusion Prevention/Detection (IDS/ IPS)
    +> 
    +> \u25cfData Loss Prevention
    +> 
    +> \u25cfActive Directory / Identity Management
    +> 
    +> \u25cfUser/Entity Behavior Analysis
    +> 
    +> \u25cfEndpoint Protection/Asset Management
    +> 
    +> \u25cfNetwork Metadata/Session and PCAP files
    +> 
    +> \u25cfNetwork Access Control
    +> 
    +> \u25cfMail
    +> 
    +> \u25cfVPN
    +> 
    +> \u25cf etc..
    +
    +One of the biggest challenges organizations face today in combating cyber threats is collecting and normalizing data from the myriad of security event data sources (hundreds) in order to build the needed analytics. This often results in the analytics being dependent upon the specific technologies used by an organization to detect threats and prevents the needed flexibility and agility to keep up with these ever-increasing (and complex) threats.&nbsp; Technology lock-in is sometimes a byproduct of today\u2019s status quo, as it\u2019s extremely costly to add new technologies (or replace existing ones) because of the downstream analytic dependencies.
    +
    +To achieve the goal of extending Apache Spot (incubating) to support additional use cases, it is necessary to create an open data model for the most relevant security event and contextual data sources; Security event logs or alerts, Network context, User details and information that comes from the endpoints or any other console that are being use to manage the security / administration of our endpoints. The presence of an open data model, which can be applied \u201con-read\u201d or \u201con-write\u201d, in batch or stream, will allow for the separation of security analytics from the specific data sources on which they are built. This \u201cseparation of duties\u201d will enable organizations to build analytics that are not dependent upon specific technologies and provide the flexibility to change underlying data sources and also provide segmentation of this information, without impacting the analytics. This will also afford security vendors the opportunity to build additional products on top of t
 he Open Data Model to drive new revenue streams and also to design new ways to detect threats and APT.
    +
    +
    +**Apache Spot (incubating) Enabled**
    +----
    +
    +**Use Cases**
    +-------------
    +
    +Spot on the Cloudera platform is uniquely positioned to help address the following cyber security use cases,
    +which are not effectively addressed by legacy technologies:
    +
    + 
    +
    + **- Detection of known & unknown threats leveraging machine learning and advanced analytic modeling**
    +
    +Current technologies are limited in the analytics they can apply to detect threats. These limitations stem from the inability to collect all the data sources needed to effectively identify threats (structured, unstructured, etc.) and inability to process the massive volumes of data needed to do so (billions of events per day). Legacy technologies are typically focus and limited to rules-based and signature detection. They are somewhat \u201ceffective\u201d at detecting known threats but struggle with new threats.
    +
    +Spot addresses these gaps through its ability to collect any data type of any volume. Coupled with the various analytic frameworks that are provided (including machine learning), Spot enables a whole new class of analytics that can scale to today\u2019s demands. The topic model used by Spot to detect anomalous network traffic is one example of where the Spot platform excels.
    +
    + **- Reduction of mean time to incident detection & resolution (MTTR)**
    +
    +One of the challenges organizations face today is detecting threats early enough to minimize adverse impacts. This stems from the limitations previously discussed with regards to limited analytics. It can also be attributed to the fact that most of the investigative queries often take hours or days to return results. Legacy technologies can\u2019t offer or have a central data store for facilitating such investigations due to their inability to store and serve the massive amounts of data involved. This cripples incident investigations and results in MTTRs of many weeks or months, meanwhile the adverse impacts of the breach are magnified, thus making the threat harder to eradicate.
    +
    +Apache Spot (incubating) addresses these gaps by providing the capability for a central data store that houses ALL the data needed to facilitate an investigation, returning investigative query results in seconds and minutes (vs. hours and days). Spot can effectively reduce incident MTTR and reduce adverse impacts of a breach.
    +
    + **- Threat Hunting**
    +
    +It\u2019s become necessary for organizations to \u201chunt\u201d for active threats as traditional passive threat detection approaches are not sufficient. \u201cHunting\u201d involves performing ad-hoc searches and queries over vast amounts of data representing many weeks and months\u2019 worth of events, as well as applying ad-hoc / tune algorithms to detect the needle in the haystack. Traditional systems do not perform well for these types of activities as the query results sometimes take hours and days to be retrieved. These traditional systems also lack the analytic flexibility to construct the necessary algorithms and logic needed.
    +
    +Apache Spot (incubating) addresses these gaps in the same ways it addresses others; by providing a central data store with the needed analytic frameworks that scale to the needed workloads.
    +
    +**Data Model**
    +----------
    +In order to provide a framework for effectively analyzing data for cyber threats, it is necessary to collect and
    +analyze standard security event logs/alerts and contextual data regarding the entities referenced in these logs/alerts. The most common entities include network, user and endpoint, but there are others such as file.
    +
    +In the diagram below, the raw event tells us that user \u201cjsmith\u201d successfully logged in to an Oracle database from the IP address 10:1.1.3. Based on the raw event only, we don\u2019t know if this event is a legitimate threat or not. After injecting user and endpoint context, the enriched event tells us this event is a potential threat that requires further investigation.
    +
    +![Screen Shot 2016-09-22 at 1.11.28 PM.png](CybersecurityOpenDataModel0%204-3_files/image001.jpg)
    +
    +Based on the need to collect and analyze both security events, logs or alerts and contextual data, support for
    +the following types of security information are planned for inclusion in the Spot Open Data Model:
    +
    + - Security event logs/alerts
    +This data type includes event logs from common data sources used to detect threats and includes network flows, operating system logs, IPS/IDS logs, firewall logs, proxy logs, web logs, DLP logs, etc.
    +
    + - Network context data
    +This data type includes information about the network, which can be gleaned from Whois servers, asset databases and other similar data sources.
    +
    + - User context data
    +This data type includes information from user and identity management systems including Active Directory, Centrify, and other identity and access management systems.
    +
    + - Endpoint context data
    +This data includes information about endpoint systems (servers, workstations, routers, switches, etc.) and can be sourced from asset management systems, vulnerability scanners, and endpoint  management/detection/response systems such as Webroot, Tanium, Sophos, Endgame, CarbonBlack, Intel Security ePO and others.
    +
    + - File context data** (ROADMAP ITEM)**
    +This data includes contextual information about files and can be sourced from systems such as FireEye, Application Control and others.
    +
    + - Threat intelligence context data **(ROADMAP ITEM)**
    +This data includes contextual information about URLs, domains, websites, files and others.
    +
    +**Naming Convention**
    +-----------------
    +
    +A naming convention is needed for the Open Data Model to represent common attributes across vendor products and technologies. The naming convention is described below.
    +
    +**Prefixes**
    +--------
    +
    +|  Prefix | Description  |  
    +|---|---|
    +|  src | Corresponds to the \u201csource\u201d fields within a given event (i.e. source address)|  
    +|  dst | Corresponds to the \u201cdestination\u201d fields within a given event (i.e. destination address) |  
    +|  dvc | Corresponds to the \u201cdevice\u201d applicable fields within a given event (i.e. device address) and represent where the event originated  |  
    +| fwd  | Forwarded from device   |  
    +| request | Corresponds to requested values (vs. those returned, i.e. \u201crequested URI\u201d) |  
    +| response  | Corresponds to response value (vs. those requested) |  
    +| file  |  Corresponds to the \u201cfile\u201d fields within a given event (i.e. file type) |  
    +| user  | Corresponds to user attributes (i.e. name, id, etc.)  |  
    +| xlate  | Corresponds to translated values within a given event (i.e. src_xlate_ip for \u201ctranslated source ip address\u201d |  
    +| in  | Ingress|  
    +| out | Egress |  
    +| new | New value |  
    +| orig | Original value |  
    +| app | Corresponds to values associated with application events |  
    +
    +
    +**Security Event Log/Alert Data Model**
    +-----------------------------------
    +
    +The data model for security event logs/alerts is detailed in the below. The attributes are categorized as follows:
    +
    + - Common -attributes that are common across many device types
    + - Device -attributes that are applicable to the device that generated the event
    + - File -attributes that are applicable to file objects referenced in the event
    + - Endpoint -attributes that are applicable to the endpoints referenced in the event
    + - User- attributes that are applicable to the user referenced in the event
    + - Proxy - attributes that are applicable to proxy events
    + - Protocol
    +
    +> DNS - attributes that are specific to DNS events
    +> HTTP - attributes that are specific to HTTP events
    +> SMTP, SSH, TLS, DHCP, IRC, SNMP and FTP
    +
    +Note: The model will evolve to include reserved attributes for additional device types that are not currently represented. The model can currently be extended to support ANY attribute for ANY device type by following the guidance outlined in the section titled **\u201cExtensibility of Data Model\u201d.**
    +
    +Note: Attributes denoted in BLUE represent those that are listed in the model multiple times for the purpose of
    --- End diff --
    
    I don't know if markdown can do color. If not, we should change BLUE to bold. And, change all the blue items in the original doc to be bolded.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---