Posted to issues@metron.apache.org by GitBox <gi...@apache.org> on 2018/12/20 19:23:14 UTC

[GitHub] asfgit closed pull request #1309: METRON-1950: Site-book generation broken in master

URL: https://github.com/apache/metron/pull/1309
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/metron-platform/metron-parsing/README.md b/metron-platform/metron-parsing/README.md
index 76b6168dca..9a46532107 100644
--- a/metron-platform/metron-parsing/README.md
+++ b/metron-platform/metron-parsing/README.md
@@ -21,127 +21,129 @@ limitations under the License.
 
 Parsers are pluggable components which are used to transform raw data
 (textual or raw bytes) into JSON messages suitable for downstream
-enrichment and indexing.  
+enrichment and indexing.
 
There are two general types of parsers:
* A parser written in Java which conforms to the `MessageParser` interface.  This kind of parser is optimized for speed and performance and is built for use with higher-velocity topologies.  These parsers are not easily modifiable, and in order to make changes to them the entire topology needs to be recompiled.
 * A general purpose parser.  This type of parser is primarily designed for lower-velocity topologies or for quickly standing up a parser for a new telemetry before a permanent Java parser can be written for it.  As of the time of this writing, we have:
-  * Grok parser: `org.apache.metron.parsers.GrokParser` with possible `parserConfig` entries of 
-    * `grokPath` : The path in HDFS (or in the Jar) to the grok statement
-    * `patternLabel` : The pattern label to use from the grok statement
-    * `multiLine` : The raw data passed in should be handled as a long with multiple lines, with each line to be parsed separately. This setting's valid values are 'true' or 'false'.  The default if unset is 'false'. When set the parser will handle multiple lines with successfully processed lines emitted normally, and lines with errors sent to the error topic.
-    * `timestampField` : The field to use for timestamp
-    * `timeFields` : A list of fields to be treated as time
-    * `dateFormat` : The date format to use to parse the time fields
-    * `timezone` : The timezone to use. `UTC` is default.
-    * The Grok parser supports either 1 line to parse per incoming message, or incoming messages with multiple log lines, and will produce a json message per line
-  * CSV Parser: `org.apache.metron.parsers.csv.CSVParser` with possible `parserConfig` entries of
-    * `timestampFormat` : The date format of the timestamp to use.  If unspecified, the parser assumes the timestamp is ms since unix epoch.
-    * `columns` : A map of column names you wish to extract from the CSV to their offsets (e.g. `{ 'name' : 1, 'profession' : 3}`  would be a column map for extracting the 2nd and 4th columns from a CSV)
-    * `separator` : The column separator, `,` by default.
-  * JSON Map Parser: `org.apache.metron.parsers.json.JSONMapParser` with possible `parserConfig` entries of
-    * `mapStrategy` : A strategy to indicate how to handle multi-dimensional Maps.  This is one of
-      * `DROP` : Drop fields which contain maps
-      * `UNFOLD` : Unfold inner maps.  So `{ "foo" : { "bar" : 1} }` would turn into `{"foo.bar" : 1}`
-      * `ALLOW` : Allow multidimensional maps
-      * `ERROR` : Throw an error when a multidimensional map is encountered
-    * `jsonpQuery` : A [JSON Path](#json_path) query string. If present, the result of the JSON Path query should be a list of messages. This is useful if you have a JSON document which contains a list or array of messages embedded in it, and you do not have another means of splitting the message.
-    * `wrapInEntityArray` : `"true" or "false"`. If `jsonQuery` is present and this flag is present and set to `"true"`, the incoming message will be wrapped in a JSON  entity and array.
-       for example:
-       `{"name":"value"},{"name2","value2}` will be wrapped as `{"message" : [{"name":"value"},{"name2","value2}]}`.
-       This is using the default value for `wrapEntityName` if that property is not set.
-    * `wrapEntityName` : Sets the name to use when wrapping JSON using `wrapInEntityArray`.  The `jsonpQuery` should reference this name.
-    * A field called `timestamp` is expected to exist and, if it does not, then current time is inserted.  
-  * Regular Expressions Parser
-      * `recordTypeRegex` : A regular expression to uniquely identify a record type.
-      * `messageHeaderRegex` : A regular expression used to extract fields from a message part which is common across all the messages.
-      * `convertCamelCaseToUnderScore` : If this property is set to true, this parser will automatically convert all the camel case property names to underscore seperated. 
-          For example, following convertions will automatically happen:
-
-          ```
-          ipSrcAddr -> ip_src_addr
-          ipDstAddr -> ip_dst_addr
-          ipSrcPort -> ip_src_port
-          ```
-          Note this property may be necessary, because java does not support underscores in the named group names. So in case your property naming conventions requires underscores in property names, use this property.
-          
-      * `fields` : A json list of maps contaning a record type to regular expression mapping.
-      
-      A complete configuration example would look like:
-      
-      ```json
-      "convertCamelCaseToUnderScore": true, 
-      "recordTypeRegex": "kernel|syslog",
-      "messageHeaderRegex": "(<syslogPriority>(<=^&lt;)\\d{1,4}(?=>)).*?(<timestamp>(<=>)[A-Za-z] {3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?(<syslogHost>(<=\\s).*?(?=\\s))",
-      "fields": [
-        {
-          "recordType": "kernel",
-          "regex": ".*(<eventInfo>(<=\\]|\\w\\:).*?(?=$))"
-        },
-        {
-          "recordType": "syslog",
-          "regex": ".*(<processid>(<=PID\\s=\\s).*?(?=\\sLine)).*(<filePath>(<=64\\s)\/([A-Za-z0-9_-]+\/)+(?=\\w))        (<fileName>.*?(?=\")).*(<eventInfo>(<=\").*?(?=$))"
-        }
-      ]
-      ```
-      **Note**: messageHeaderRegex and regex (withing fields) could be specified as lists also e.g.
-      ```json
-          "messageHeaderRegex": [
+    * Grok parser: `org.apache.metron.parsers.GrokParser` with possible `parserConfig` entries of
+        * `grokPath` : The path in HDFS (or in the Jar) to the grok statement
+        * `patternLabel` : The pattern label to use from the grok statement
+        * `multiLine` : The raw data passed in should be handled as a log with multiple lines, with each line to be parsed separately. This setting's valid values are 'true' or 'false'.  The default if unset is 'false'. When set, the parser will handle multiple lines, emitting successfully processed lines normally and sending lines with errors to the error topic.
+        * `timestampField` : The field to use for timestamp
+        * `timeFields` : A list of fields to be treated as time
+        * `dateFormat` : The date format to use to parse the time fields
+        * `timezone` : The timezone to use. `UTC` is default.
+        * The Grok parser supports either one line to parse per incoming message, or incoming messages with multiple log lines, and will produce a JSON message per line (see the illustrative sensor configuration sketches after this list)
+    * CSV Parser: `org.apache.metron.parsers.csv.CSVParser` with possible `parserConfig` entries of
+        * `timestampFormat` : The date format of the timestamp to use.  If unspecified, the parser assumes the timestamp is ms since unix epoch.
+        * `columns` : A map of column names you wish to extract from the CSV to their offsets (e.g. `{ 'name' : 1, 'profession' : 3}`  would be a column map for extracting the 2nd and 4th columns from a CSV)
+        * `separator` : The column separator, `,` by default.
+    * JSON Map Parser: `org.apache.metron.parsers.json.JSONMapParser` with possible `parserConfig` entries of
+        * `mapStrategy` : A strategy to indicate how to handle multi-dimensional Maps.  This is one of
+            * `DROP` : Drop fields which contain maps
+            * `UNFOLD` : Unfold inner maps.  So `{ "foo" : { "bar" : 1} }` would turn into `{"foo.bar" : 1}`
+            * `ALLOW` : Allow multidimensional maps
+            * `ERROR` : Throw an error when a multidimensional map is encountered
+        * `jsonpQuery` : A [JSON Path](#json_path) query string. If present, the result of the JSON Path query should be a list of messages. This is useful if you have a JSON document which contains a list or array of messages embedded in it, and you do not have another means of splitting the message.
+        * `wrapInEntityArray` : `"true"` or `"false"`. If `jsonpQuery` is present and this flag is present and set to `"true"`, the incoming message will be wrapped in a JSON entity and array.
+           For example:
+           `{"name":"value"},{"name2":"value2"}` will be wrapped as `{"message" : [{"name":"value"},{"name2":"value2"}]}`.
+           This uses the default value for `wrapEntityName` if that property is not set (see the JSON Map sketch after this list).
+        * `wrapEntityName` : Sets the name to use when wrapping JSON using `wrapInEntityArray`.  The `jsonpQuery` should reference this name.
+        * A field called `timestamp` is expected to exist and, if it does not, then current time is inserted.
+    * Regular Expressions Parser
+        * `recordTypeRegex` : A regular expression to uniquely identify a record type.
+        * `messageHeaderRegex` : A regular expression used to extract fields from a message part which is common across all the messages.
+        * `convertCamelCaseToUnderScore` : If this property is set to true, this parser will automatically convert all the camel case property names to underscore separated names. For example, the following conversions will automatically happen:
+
+            ```
+            ipSrcAddr -> ip_src_addr
+            ipDstAddr -> ip_dst_addr
+            ipSrcPort -> ip_src_port
+            ```
+
+            Note that this property may be necessary because Java does not support underscores in named group names. So if your property naming conventions require underscores in property names, use this property.
+
+        * `fields` : A JSON list of maps containing a record type to regular expression mapping.
+
+        A complete configuration example would look like:
+
+        ```json
+        "convertCamelCaseToUnderScore": true,
+        "recordTypeRegex": "kernel|syslog",
+        "messageHeaderRegex": "(<syslogPriority>(<=^&lt;)\\d{1,4}(?=>)).*?(<timestamp>(<=>)[A-Za-z] {3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?(<syslogHost>(<=\\s).*?(?=\\s))",
+        "fields": [
+          {
+            "recordType": "kernel",
+            "regex": ".*(<eventInfo>(<=\\]|\\w\\:).*?(?=$))"
+          },
+          {
+            "recordType": "syslog",
+            "regex": ".*(<processid>(<=PID\\s=\\s).*?(?=\\sLine)).*(<filePath>(<=64\\s)\/([A-Za-z0-9_-]+\/)+(?=\\w))        (<fileName>.*?(?=\")).*(<eventInfo>(<=\").*?(?=$))"
+          }
+        ]
+        ```
+
+        **Note**: messageHeaderRegex and regex (within fields) can also be specified as lists, e.g.
+
+        ```json
+        "messageHeaderRegex": [
           "regular expression 1",
           "regular expression 2"
-          ]
-      ```
-      Where **regular expression 1** are valid regular expressions and may have named
-      groups, which would be extracted into fields. This list will be evaluated in order until a
-      matching regular expression is found.
-      
-      **messageHeaderRegex** is run on all the messages.
-      Yes, all the messages are expected to contain the fields which are being extracted using the **messageHeaderRegex**.
-      **messageHeaderRegex** is a sort of HCF (highest common factor) in all messages.
-      
-      **recordTypeRegex** can be a more advanced regular expression containing named goups. For example
-  
-      "recordTypeRegex": "(&lt;process&gt;(<=\\s)\\b(kernel|syslog)\\b(?=\\[|:))"
-      
-      Here all the named goups (process in above example) will be extracted as fields.
-
-      Though having named group in recordType is completely optional, still one could want extract named groups in recordType for following reasons:
-
-      1. Since **recordType** regular expression is already getting matched and we are paying the price for a regular expression match already,
-      we can extract certain fields as a by product of this match.
-      2. Most likely the **recordType** field is common across all the messages. Hence having it extracted in the recordType (or messageHeaderRegex) would
-      reduce the overall complexity of regular expressions in the regex field.
-      
-      **regex** within a field could be a list of regular expressions also. In this case all regular expressions in the list will be attempted to match until a match is found. Once a full match is found remaining regular expressions are ignored.
-  
-      ```json
-          "regex":  [ "record type specific regular expression 1",
-                      "record type specific regular expression 2"]
-
-      ```
-
-      **timesamp**
-
-      Since this parser is a general purpose parser, it will populate the timestamp field with current UTC timestamp. Actual timestamp value can be overridden later using stellar.
-      For example in case of syslog timestamps, one could use following stellar construct to override the timestamp value.
-      Let us say you parsed actual timestamp from the raw log:
-
-      <38>Jun 20 15:01:17 hostName sshd[11672]: Accepted publickey for prod from 55.55.55.55 port 66666 ssh2
-
-      syslogTimestamp="Jun 20 15:01:17"
-
-      Then something like below could be used to override the timestamp.
-
-      ```
-      "timestamp_str": "FORMAT('%s%s%s', YEAR(),' ',syslogTimestamp)",
-      "timestamp":"TO_EPOCH_TIMESTAMP(timestamp_str, 'yyyy MMM dd HH:mm:ss' )"
-      ```
-
-      OR, if you want to factor in the timezone
-
-      ```
-      "timestamp":"TO_EPOCH_TIMESTAMP(timestamp_str, timestamp_format, timezone_name )"
-      ```
+        ]
+        ```
+
+        Where **regular expression 1** and **regular expression 2** are valid regular expressions that may have named
+        groups, which would be extracted into fields. This list will be evaluated in order until a
+        matching regular expression is found.
+
+        **messageHeaderRegex** is run on all the messages;
+        all the messages are expected to contain the fields which are being extracted using the **messageHeaderRegex**.
+        **messageHeaderRegex** is a sort of HCF (highest common factor) across all messages.
+
+        **recordTypeRegex** can be a more advanced regular expression containing named groups. For example:
+
+        "recordTypeRegex": "(&lt;process&gt;(<=\\s)\\b(kernel|syslog)\\b(?=\\[|:))"
+
+        Here all the named groups (`process` in the above example) will be extracted as fields.
+
+        Though having named groups in recordType is completely optional, one could still want to extract named groups in recordType for the following reasons:
+
+        1. Since the **recordType** regular expression is already being matched and we are paying the price for that regular expression match anyway,
+        we can extract certain fields as a by-product of this match.
+        2. Most likely the **recordType** field is common across all the messages. Hence having it extracted in the recordType (or messageHeaderRegex) would
+        reduce the overall complexity of the regular expressions in the regex field.
+
+        **regex** within a field can also be a list of regular expressions. In this case each regular expression in the list will be attempted until a match is found. Once a full match is found, the remaining regular expressions are ignored.
+
+        ```json
+        "regex":  [ "record type specific regular expression 1",
+                    "record type specific regular expression 2"]
+        ```
+
+        **timestamp**
+
+        Since this parser is a general purpose parser, it will populate the timestamp field with the current UTC timestamp. The actual timestamp value can be overridden later using Stellar.
+        For example, in the case of syslog timestamps, one could use the following Stellar construct to override the timestamp value.
+        Let us say you parsed the actual timestamp from the raw log:
+
+        `<38>Jun 20 15:01:17 hostName sshd[11672]: Accepted publickey for prod from 55.55.55.55 port 66666 ssh2`
+
+        `syslogTimestamp="Jun 20 15:01:17"`
+
+        Then something like the following could be used to override the timestamp:
+
+        ```
+        "timestamp_str": "FORMAT('%s%s%s', YEAR(),' ',syslogTimestamp)",
+        "timestamp":"TO_EPOCH_TIMESTAMP(timestamp_str, 'yyyy MMM dd HH:mm:ss' )"
+        ```
+
+        Or, if you want to factor in the timezone:
+
+        ```
+        "timestamp":"TO_EPOCH_TIMESTAMP(timestamp_str, timestamp_format, timezone_name )"
+        ```
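+
+For illustration, a complete sensor configuration using the Grok parser might look like the following sketch (the topic name, HDFS path and pattern label are placeholders, not requirements):
+
+```json
+{
+  "parserClassName": "org.apache.metron.parsers.GrokParser",
+  "sensorTopic": "squid",
+  "parserConfig": {
+    "grokPath": "/patterns/squid",
+    "patternLabel": "SQUID_DELIMITED",
+    "timestampField": "timestamp"
+  }
+}
+```
+
+Similarly, a JSON Map parser sketch that splits a document containing an embedded list of messages might look like the following (the query, topic and field names are placeholders):
+
+```json
+{
+  "parserClassName": "org.apache.metron.parsers.json.JSONMapParser",
+  "sensorTopic": "jsonmap",
+  "parserConfig": {
+    "mapStrategy": "UNFOLD",
+    "jsonpQuery": "$.messages",
+    "wrapInEntityArray": "true",
+    "wrapEntityName": "messages"
+  }
+}
+```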
 
 ## Parser Error Routing
 
@@ -204,15 +206,15 @@ So putting it all together a typical Metron message with all 5-tuple fields pres
 
 ```json
 {
-"message": 
-{"ip_src_addr": xxxx, 
-"ip_dst_addr": xxxx, 
-"ip_src_port": xxxx, 
-"ip_dst_port": xxxx, 
-"protocol": xxxx, 
-"original_string": xxx,
-"additional-field 1": xxx,
-}
+  "message": {
+    "ip_src_addr": xxxx,
+    "ip_dst_addr": xxxx,
+    "ip_src_port": xxxx,
+    "ip_dst_port": xxxx,
+    "protocol": xxxx,
+    "original_string": xxx,
+    "additional-field 1": xxx
+  }
 }
 ```
 
@@ -246,16 +248,19 @@ The document is structured in the following way
 
 * `parserClassName` : The fully qualified classname for the parser to be used.
 * `filterClassName` : The filter to use.  This may be a fully qualified classname of a Class that implements the `org.apache.metron.parsers.interfaces.MessageFilter<JSONObject>` interface.  Message Filters are intended to allow the user to ignore a set of messages via custom logic.  The existing implementations are:
-  * `STELLAR` : Allows you to apply a stellar statement which returns a boolean, which will pass every message for which the statement returns `true`.  The Stellar statement that is to be applied is specified by the `filter.query` property in the `parserConfig`.
-Example Stellar Filter which includes messages which contain a the `field1` field:
-```
-   {
-    "filterClassName" : "STELLAR"
-   ,"parserConfig" : {
-    "filter.query" : "exists(field1)"
-    }
-   }
-```
+    * `STELLAR` : Allows you to apply a stellar statement which returns a boolean, which will pass every message for which the statement returns `true`.  The Stellar statement that is to be applied is specified by the `filter.query` property in the `parserConfig`.
+
+        Example Stellar filter which includes messages that contain the `field1` field:
+
+        ```
+        {
+          "filterClassName" : "STELLAR",
+          "parserConfig" : {
+            "filter.query" : "exists(field1)"
+          }
+        }
+        ```
+
 * `sensorTopic` : The kafka topic to send the parsed messages to.  If the topic is prefixed and suffixed by `/` 
 then it is assumed to be a regex and will match any topic matching the pattern (e.g. `/bro.*/` would match `bro_cust0`, `bro_cust1` and `bro_cust2`)
 * `readMetadata` : Boolean indicating whether to read metadata or not (The default is raw message strategy dependent).  See below for a discussion about metadata.
@@ -263,26 +268,27 @@ then it is assumed to be a regex and will match any topic matching the pattern (
 * `rawMessageStrategy` : The strategy to use when reading the raw data and metadata.  See below for a discussion about message reading strategies.
 * `rawMessageStrategyConfig` : The raw message strategy configuration map.  See below for a discussion about message reading strategies.
 * `parserConfig` : A JSON Map representing the parser implementation specific configuration. Also include batch sizing and timeout for writer configuration here.
-  * `batchSize` : Integer indicating number of records to batch together before sending to the writer. (default to `15`)
-  * `batchTimeout` : The timeout after which a batch will be flushed even if batchSize has not been met.  Optional.
-    If unspecified, or set to `0`, it defaults to a system-determined duration which is a fraction of the Storm
-    parameter `topology.message.timeout.secs`.  Ignored if batchSize is `1`, since this disables batching.
-  * The kafka writer can be configured within the parser config as well.  (This is all configured a priori, but this is convenient for overriding the settings).  See [here](../../metron-writer/README.md#kafka-writer)
+    * `batchSize` : Integer indicating the number of records to batch together before sending to the writer (defaults to `15`; see the batch sizing sketch after this list).
+    * `batchTimeout` : The timeout after which a batch will be flushed even if batchSize has not been met.  Optional.
+      If unspecified, or set to `0`, it defaults to a system-determined duration which is a fraction of the Storm
+      parameter `topology.message.timeout.secs`.  Ignored if batchSize is `1`, since this disables batching.
+    * The kafka writer can be configured within the parser config as well.  (This is all configured a priori, but this is convenient for overriding the settings).  See [here](../../metron-writer/README.md#kafka-writer)
 * `fieldTransformations` : An array of complex objects representing the transformations to be done on the message generated from the parser before writing out to the kafka topic.
 * `securityProtocol` : The security protocol to use for reading from kafka (this is a string).  This can be overridden on the command line and also specified in the spout config via the `security.protocol` key.  If both are specified, then they are merged and the CLI will take precedence. If multiple sensors are used, any non "PLAINTEXT" value will be used.
 * `cacheConfig` : Cache config for stellar field transformations.   This configures a least frequently used cache.  This is a map with the following keys.  If not explicitly configured (the default), then no cache will be used.
-  * `stellar.cache.maxSize` - The maximum number of elements in the cache. Default is to not use a cache.
-  * `stellar.cache.maxTimeRetain` - The maximum amount of time an element is kept in the cache (in minutes). Default is to not use a cache.
+    * `stellar.cache.maxSize` - The maximum number of elements in the cache. Default is to not use a cache.
+    * `stellar.cache.maxTimeRetain` - The maximum amount of time an element is kept in the cache (in minutes). Default is to not use a cache.
 
-  Example of a cache config to contain at max `20000` stellar expressions for at most `20` minutes.:
-```
-{
-  "cacheConfig" : {
-    "stellar.cache.maxSize" : 20000,
-    "stellar.cache.maxTimeRetain" : 20
-  }
-}
-```
+        Example of a cache config to contain at most `20000` stellar expressions for at most `20` minutes:
+
+        ```
+        {
+          "cacheConfig" : {
+            "stellar.cache.maxSize" : 20000,
+            "stellar.cache.maxTimeRetain" : 20
+          }
+        }
+        ```
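+
+As an illustrative batch sizing sketch (the values below are placeholders), `batchSize` and `batchTimeout` sit directly inside `parserConfig`:
+
+```
+"parserConfig" : {
+  "batchSize" : 200,
+  "batchTimeout" : 5
+}
+```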
 
 The `fieldTransformations` is a complex object which defines a
 transformation which can be done to a message.  This transformation can 
@@ -298,36 +304,34 @@ For platform specific configs, see the README of the appropriate project. This w
 Metadata is a useful thing to send to Metron and use during enrichment or threat intelligence.  
 Consider the following scenarios:
 * You have multiple telemetry sources of the same type that you want to 
-  * ensure downstream analysts can differentiate
-  * ensure profiles consider independently as they have different seasonality or some other fundamental characteristic
+    * ensure downstream analysts can differentiate
+    * ensure profiles consider independently as they have different seasonality or some other fundamental characteristic
 
 As such, there are two types of metadata that we seek to support in Metron:
 * Environmental metadata : Metadata about the system at large
-   * Consider the possibility that you have multiple kafka topics being processed by one parser and you want to tag the messages with the kafka topic
-   * At the moment, only the kafka topic is kept as the field name.
+    * Consider the possibility that you have multiple kafka topics being processed by one parser and you want to tag the messages with the kafka topic
+    * At the moment, only the kafka topic is kept as the field name.
 * Custom metadata: Custom metadata from an individual telemetry source that one might want to use within Metron. 
 
 Metadata is controlled by the following parser configs:
-* `rawMessageStrategy` : This is a strategy which indicates how to read
-  data and metadata.  The strategies supported are:
-  * `DEFAULT` : Data is read directly from the kafka record value and metadata, if any, is read from the kafka record key.  This strategy defaults to not reading metadata and not merging metadata.  This is the default strategy.
-  * `ENVELOPE` : Data from kafka record value is presumed to be a JSON blob. One of
-    these fields must contain the raw data to pass to the parser.  All other fields should be considered metadata.  The field containing the raw data is specified in the `rawMessageStrategyConfig`.  Data held in the kafka key as well as the non-data fields in the JSON blob passed into the kafka value are considered metadata. Note that the exception to this is that any `original_string` field is inherited from the envelope data so that the original string contains the envelope data.  If you do not prefer this behavior, remove this field from the envelope data.
+* `rawMessageStrategy` : This is a strategy which indicates how to read data and metadata.  The strategies supported are:
+    * `DEFAULT` : Data is read directly from the kafka record value and metadata, if any, is read from the kafka record key.  This strategy defaults to not reading metadata and not merging metadata.  This is the default strategy.
+    * `ENVELOPE` : Data from kafka record value is presumed to be a JSON blob. One of
+      these fields must contain the raw data to pass to the parser.  All other fields should be considered metadata.  The field containing the raw data is specified in the `rawMessageStrategyConfig`.  Data held in the kafka key as well as the non-data fields in the JSON blob passed into the kafka value are considered metadata. Note that the exception to this is that any `original_string` field is inherited from the envelope data so that the original string contains the envelope data.  If you do not want this behavior, remove this field from the envelope data (see the envelope sketch after this list).
 * `rawMessageStrategyConfig` : The configuration (a map) for the `rawMessageStrategy`.  Available configurations are strategy dependent:
-  * `DEFAULT` 
-    * `metadataPrefix` defines the key prefix for metadata (default is `metron.metadata`).
-  * `ENVELOPE` 
-    * `metadataPrefix` defines the key prefix for metadata (default is `metron.metadata`) 
-    * `messageField` defines the field from the envelope to use as the data.  All other fields are considered metadata.
+    * `DEFAULT`
+        * `metadataPrefix` defines the key prefix for metadata (default is `metron.metadata`).
+    * `ENVELOPE`
+        * `metadataPrefix` defines the key prefix for metadata (default is `metron.metadata`)
+        * `messageField` defines the field from the envelope to use as the data.  All other fields are considered metadata.
 * `readMetadata` : This is a boolean indicating whether metadata will be read and made available to Field 
 transformations (i.e. Stellar field transformations).  The default is
 dependent upon the `rawMessageStrategy`:
-  * `DEFAULT` : default to `false`.
-  * `ENVELOPE` : default to `true`.
+    * `DEFAULT` : defaults to `false`.
+    * `ENVELOPE` : defaults to `true`.
 * `mergeMetadata` : This is a boolean indicating whether metadata fields will be merged with the message automatically.  That is to say, if this property is set to `true` then every metadata field will become part of the messages and, consequently, also available for use in field transformations.  The default is dependent upon the `rawMessageStrategy`:
-  * `DEFAULT` : default to `false`.
-  * `ENVELOPE` : default to `true`.
-
+    * `DEFAULT` : defaults to `false`.
+    * `ENVELOPE` : defaults to `true`.
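+
+As an envelope sketch (the field names below are illustrative placeholders), a parser reading enveloped data might use:
+
+```
+{
+  "rawMessageStrategy" : "ENVELOPE",
+  "rawMessageStrategyConfig" : {
+    "messageField" : "data",
+    "metadataPrefix" : "metron.metadata"
+  }
+}
+```
+
+With this configuration, a kafka value such as `{"source.device" : "fw01", "data" : "<raw log line>"}` would have the contents of `data` handed to the parser, while `source.device` is treated as metadata.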
 
 #### Field Naming
 
@@ -359,119 +363,125 @@ The format of a `fieldTransformation` is as follows:
 The currently implemented fieldTransformations are:
 * `REMOVE` : This transformation removes the specified input fields.  If you want a conditional removal, you can pass a Metron Query Language statement to define the conditions under which you want to remove the fields. 
 
-Consider the following simple configuration which will remove `field1`
-unconditionally:
-```
-{
-...
-    "fieldTransformations" : [
-          {
-            "input" : "field1"
-          , "transformation" : "REMOVE"
-          }
-                      ]
-}
-```
+    Consider the following simple configuration which will remove `field1`
+    unconditionally:
 
-Consider the following simple sensor parser configuration which will remove `field1`
-whenever `field2` exists and whose corresponding equal to 'foo':
-```
-{
-...
-  "fieldTransformations" : [
-          {
-            "input" : "field1"
-          , "transformation" : "REMOVE"
-          , "config" : {
-              "condition" : "exists(field2) and field2 == 'foo'"
-                       }
-          }
-                      ]
-}
-```
+    ```
+    {
+    ...
+        "fieldTransformations" : [
+              {
+                "input" : "field1"
+              , "transformation" : "REMOVE"
+              }
+                          ]
+    }
+    ```
+
+    Consider the following simple sensor parser configuration which will remove `field1`
+    whenever `field2` exists and its corresponding value is equal to 'foo':
+
+    ```
+    {
+    ...
+      "fieldTransformations" : [
+              {
+                "input" : "field1"
+              , "transformation" : "REMOVE"
+              , "config" : {
+                  "condition" : "exists(field2) and field2 == 'foo'"
+                           }
+              }
+                          ]
+    }
+    ```
 
 * `SELECT`: This transformation filters the fields in the message to include only the configured output fields, and drops any not explicitly included. 
 
-For example: 
-```
-{
-...
-    "fieldTransformations" : [
-          {
-            "output" : ["field1", "field2" ] 
-          , "transformation" : "SELECT"
-          }
-                      ]
-}
-```
+    For example:
+
+    ```
+    {
+    ...
+        "fieldTransformations" : [
+              {
+                "output" : ["field1", "field2" ]
+              , "transformation" : "SELECT"
+              }
+                          ]
+    }
+    ```
 
-when applied to a message containing keys field1, field2 and field3, will only output the first two. It is also worth noting that two standard fields - timestamp and original_source - will always be passed along whether they are listed in output or not, since they are considered core required fields.
+    When applied to a message containing keys `field1`, `field2` and `field3`, this will only output the first two. It is also worth noting that two standard fields - `timestamp` and `original_source` - will always be passed along whether they are listed in the output or not, since they are considered core required fields.
 
 * `IP_PROTOCOL` : This transformation maps IANA protocol numbers to consistent string representations.
 
-Consider the following sensor parser config to map the `protocol` field
-to a textual representation of the protocol:
-```
-{
-...
-    "fieldTransformations" : [
-          {
-            "input" : "protocol"
-          , "transformation" : "IP_PROTOCOL"
-          }
-                      ]
-}
-```
+    Consider the following sensor parser config to map the `protocol` field
+    to a textual representation of the protocol:
+
+    ```
+    {
+    ...
+        "fieldTransformations" : [
+              {
+                "input" : "protocol"
+              , "transformation" : "IP_PROTOCOL"
+              }
+                          ]
+    }
+    ```
 
-This transformation would transform `{ "protocol" : 6, "source.type" : "bro", ... }` 
-into `{ "protocol" : "TCP", "source.type" : "bro", ...}`
+    This transformation would transform `{ "protocol" : 6, "source.type" : "bro", ... }`
+    into `{ "protocol" : "TCP", "source.type" : "bro", ...}`
 
-* `STELLAR` : This transformation executes a set of transformations
-  expressed as [Stellar Language](../../metron-common) statements.
+* `STELLAR` : This transformation executes a set of transformations expressed as [Stellar Language](../../metron-common) statements.
 
 * `RENAME` : This transformation allows users to rename a set of fields.  Specifically,
 the config is presumed to be the mapping.  The keys to the config are the existing field names
 and the values for the config map are the associated new field name.
 
-The following config will rename the fields `old_field` and `different_old_field` to
-`new_field` and `different_new_field` respectively:
-```
-{
-...
-    "fieldTransformations" : [
-          {
-            "transformation" : "RENAME",
-          , "config" : {
-            "old_field" : "new_field",
-            "different_old_field" : "different_new_field"
-                       }
-          }
-                      ]
-}
-```
+    The following config will rename the fields `old_field` and `different_old_field` to
+    `new_field` and `different_new_field` respectively:
+
+    ```
+    {
+    ...
+        "fieldTransformations" : [
+              {
+                "transformation" : "RENAME"
+              , "config" : {
+                "old_field" : "new_field",
+                "different_old_field" : "different_new_field"
+                           }
+              }
+                          ]
+    }
+    ```
+
* `REGEX_SELECT` : This transformation lets users set an output field to one of a set of possibilities based on matching regexes. This transformation is useful when the number of conditions is large enough to make a stellar language match statement unwieldy.
  
-The following config will set the field `logical_source_type` to one of the
-following, dependent upon the value of the `pix_type` field:
-* `cisco-6-302` if `pix_type` starts with either `6-302` or `06-302`
-* `cisco-5-304` if `pix_type` starts with `5-304`
-```
-{
-...
-  "fieldTransformations" : [
+    The following config will set the field `logical_source_type` to one of the
+    following, dependent upon the value of the `pix_type` field:
+    * `cisco-6-302` if `pix_type` starts with either `6-302` or `06-302`
+    * `cisco-5-304` if `pix_type` starts with `5-304`
+
+    ```
     {
-     "transformation" : "REGEX_ROUTING"
-    ,"input" :  "pix_type"
-    ,"output" :  "logical_source_type"
-    ,"config" : {
-      "cisco-6-302" : [ "^6-302.*", "^06-302.*"]
-      "cisco-5-304" : "^5-304.*"
-                }
+    ...
+      "fieldTransformations" : [
+        {
+         "transformation" : "REGEX_SELECT"
+        ,"input" :  "pix_type"
+        ,"output" :  "logical_source_type"
+        ,"config" : {
+          "cisco-6-302" : [ "^6-302.*", "^06-302.*"],
+          "cisco-5-304" : "^5-304.*"
+                    }
+        }
+                               ]
+    ...
     }
-                           ]
-...  
-}
-```
+    ```
 
 
 ### Assignment to `null`
diff --git a/metron-platform/metron-parsing/metron-parsers-common/parser_arch.png b/metron-platform/metron-parsing/parser_arch.png
similarity index 100%
rename from metron-platform/metron-parsing/metron-parsers-common/parser_arch.png
rename to metron-platform/metron-parsing/parser_arch.png
diff --git a/site-book/bin/generate-md.sh b/site-book/bin/generate-md.sh
index 60549f8eaa..7ebb5f6768 100755
--- a/site-book/bin/generate-md.sh
+++ b/site-book/bin/generate-md.sh
@@ -64,7 +64,7 @@ RESOURCE_LIST=(
     metron-deployment/readme-images/enable-kerberos-started.png
     metron-deployment/readme-images/enable-kerberos.png
     metron-platform/metron-job/metron-job_state_statechart_diagram.svg
-    metron-platform/metron-parsing/metron-parsers-common/parser_arch.png
+    metron-platform/metron-parsing/parser_arch.png
     metron-platform/metron-indexing/indexing_arch.png
     metron-platform/metron-enrichment/enrichment_arch.png
     metron-analytics/metron-maas-service/maas_arch.png
@@ -96,8 +96,8 @@ HREF_REWRITE_LIST=(
     metron-platform/metron-enrichment/README.md 's#(enrichment_arch.png)#(../../images/enrichment_arch.png)#g'
     metron-platform/metron-indexing/README.md 's#(indexing_arch.png)#(../../images/indexing_arch.png)#g'
     metron-platform/metron-job/README.md 's#(metron-job_state_statechart_diagram.svg)#(../../images/metron-job_state_statechart_diagram.svg)#g'
-    metron-platform/metron-parsing/metron-parsers-common/README.md 's#(parser_arch.png)#(../../images/parser_arch.png)#g'
-    metron-platform/metron-parsing/metron-parsers-common/ParserChaining.md 's#(../../use-cases/parser_chaining/message_routing_high_level.svg)#(../../images/message_routing_high_level.svg)#g'
+    metron-platform/metron-parsing/README.md 's#(parser_arch.png)#(../../images/parser_arch.png)#g'
+    metron-platform/metron-parsing/metron-parsers-common/ParserChaining.md 's#(../../../use-cases/parser_chaining/message_routing_high_level.svg)#(../../../images/message_routing_high_level.svg)#g'
     metron-analytics/metron-maas-service/README.md 's#(maas_arch.png)#(../../images/maas_arch.png)#g'
     metron-contrib/metron-performance/README.md 's#(performance_measurement.png)#(../../images/performance_measurement.png)#g'
     use-cases/forensic_clustering/README.md 's#(find_alerts.png)#(../../images/find_alerts.png)#g'


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services