Posted to issues@metron.apache.org by GitBox <gi...@apache.org> on 2019/05/29 18:36:25 UTC

[GitHub] [metron] mmiklavc edited a comment on issue #1409: METRON-2112 Normalize parser original_string handling

mmiklavc edited a comment on issue #1409: METRON-2112 Normalize parser original_string handling
URL: https://github.com/apache/metron/pull/1409#issuecomment-497049477
 
 
   ### Test Plan
   
   **Test the default original_string functionality**
   
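   * Create the Kafka topic. Attaching a console consumer creates it only if topic auto-creation is enabled on the brokers.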
   ```
   /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --zookeeper $ZOOKEEPER --topic jsonMapQuery --from-beginning
   ```
   
   * Pull configs down from ZK
   ```
   $METRON_HOME/bin/zk_load_configs.sh -m PULL -o ${METRON_HOME}/config/zookeeper -z $ZOOKEEPER -f
   ```
   
   * Create the indexing config. No parser config is needed because a default one is already provided.
   ```
   # /usr/metron/0.7.2/config/zookeeper/indexing/jsonMapQuery.json
   {
     "hdfs" : {
       "index": "json_map_query",
       "batchSize": 1,
       "enabled" : true
     },
     "elasticsearch" : {
       "index": "json_map_query",
       "batchSize": 1,
       "enabled" : true
     },
     "solr" : {
       "index": "json_map_query",
       "batchSize": 1,
       "enabled" : true
     }
   }
   ```
   
   * Push config back up to ZK
   ```
   $METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper/ -z $ZOOKEEPER
   ```
   
   * Start the topology 
   ```
   $METRON_HOME/bin/start_parser_topology.sh -z $ZOOKEEPER -s jsonMapQuery
   ```
   
   * Add some data to a file named json-data.json
   ```
   {"foo":[{ "string" : "bar", "number" : 1, "ignored" : [ "blah" ] },{ "number" : 2 },{ "number" : 3 },{ "number" : 4 },{ "number" : 5 },{ "number" : 6 },{ "number" : 7 },{ "number" : 8 },{ "number" : 9 },{ "number" : 10 }]}
   ```
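The console producer sends each input line as a separate Kafka message, so the test document has to stay on a single line. A minimal sketch, assuming Python 3 is available on the node, that generates an equivalent payload (same JSON content, different whitespace than the hand-written file above). The helper names `build_test_message` and `write_test_file` are made up for this example.

```python
import json


def build_test_message(count=10):
    """Build the test document: a 'foo' array holding `count` sub-messages."""
    first = {"string": "bar", "number": 1, "ignored": ["blah"]}
    rest = [{"number": n} for n in range(2, count + 1)]
    return {"foo": [first] + rest}


def write_test_file(path="json-data.json", count=10):
    """Write the document as exactly one line so it becomes one Kafka message."""
    line = json.dumps(build_test_message(count))
    with open(path, "w") as handle:
        handle.write(line + "\n")
    return line
```

Calling `write_test_file()` produces `json-data.json` in the current directory; changing `count` gives a quick way to test with more sub-messages.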
   
   * Send the data to Kafka
   ```
   cat json-data.json | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list $BROKERLIST --topic jsonMapQuery
   ```
   
   * Expect to see 10 new messages created in json_map_query_index
   ```
   curl -XGET "http://node1:9200/json_map_query*/_search?pretty=true"
   {
     "took" : 18,
     "timed_out" : false,
     "_shards" : {
       "total" : 5,
       "successful" : 5,
       "skipped" : 0,
       "failed" : 0
     },
     "hits" : {
       "total" : 10,
       "max_score" : 1.0,
       "hits" : [
         {
           "_index" : "json_map_query_index_2019.05.29.04",
           "_type" : "jsonMapQuery_doc",
           "_id" : "AWsB3SuUcGT3jmMsgLwb",
           "_score" : 1.0,
           "_source" : {
             "parallelenricher:splitter:begin:ts" : "1559104376560",
             "parallelenricher:enrich:end:ts" : "1559104376560",
             "number" : 6,
             "source:type" : "jsonMapQuery",
             "original_string" : "{\"foo\":[{ \"string\" : \"bar\", \"number\" : 1, \"ignored\" : [ \"blah\" ] },{ \"number\" : 2 },{ \"number\" : 3 },{ \"number\" : 4 },{ \"number\" : 5 },{ \"number\" : 6 },{ \"number\" : 7 },{ \"number\" : 8 },{ \"number\" : 9 },{ \"number\" : 10 }]}",
             "parallelenricher:enrich:begin:ts" : "1559104376560",
             "parallelenricher:splitter:end:ts" : "1559104376560",
             "guid" : "975a0761-4590-4290-a9d0-a98493eb1bb0",
             "timestamp" : 1559104359358
           }
         },
         {
           "_index" : "json_map_query_index_2019.05.29.04",
           "_type" : "jsonMapQuery_doc",
           "_id" : "AWsB3SvZcGT3jmMsgLwf",
           "_score" : 1.0,
           "_source" : {
             "parallelenricher:splitter:begin:ts" : "1559104376561",
             "parallelenricher:enrich:end:ts" : "1559104376561",
             "number" : 10,
             "source:type" : "jsonMapQuery",
             "original_string" : "{\"foo\":[{ \"string\" : \"bar\", \"number\" : 1, \"ignored\" : [ \"blah\" ] },{ \"number\" : 2 },{ \"number\" : 3 },{ \"number\" : 4 },{ \"number\" : 5 },{ \"number\" : 6 },{ \"number\" : 7 },{ \"number\" : 8 },{ \"number\" : 9 },{ \"number\" : 10 }]}",
             "parallelenricher:enrich:begin:ts" : "1559104376561",
             "parallelenricher:splitter:end:ts" : "1559104376561",
             "guid" : "3de3a1ab-11fc-42f0-8083-1349e512e113",
             "timestamp" : 1559104359358
           }
         },
         {
           "_index" : "json_map_query_index_2019.05.29.04",
           "_type" : "jsonMapQuery_doc",
           "_id" : "AWsB3StRcGT3jmMsgLwZ",
           "_score" : 1.0,
           "_source" : {
             "parallelenricher:splitter:begin:ts" : "1559104376559",
             "parallelenricher:enrich:end:ts" : "1559104376559",
             "number" : 4,
             "source:type" : "jsonMapQuery",
             "original_string" : "{\"foo\":[{ \"string\" : \"bar\", \"number\" : 1, \"ignored\" : [ \"blah\" ] },{ \"number\" : 2 },{ \"number\" : 3 },{ \"number\" : 4 },{ \"number\" : 5 },{ \"number\" : 6 },{ \"number\" : 7 },{ \"number\" : 8 },{ \"number\" : 9 },{ \"number\" : 10 }]}",
             "parallelenricher:enrich:begin:ts" : "1559104376559",
             "parallelenricher:splitter:end:ts" : "1559104376559",
             "guid" : "f170f724-8da8-48f4-8f33-62aaa5a5842d",
             "timestamp" : 1559104359358
           }
         },
         {
           "_index" : "json_map_query_index_2019.05.29.04",
           "_type" : "jsonMapQuery_doc",
           "_id" : "AWsB3SvCcGT3jmMsgLwd",
           "_score" : 1.0,
           "_source" : {
             "parallelenricher:splitter:begin:ts" : "1559104376561",
             "parallelenricher:enrich:end:ts" : "1559104376561",
             "number" : 8,
             "source:type" : "jsonMapQuery",
             "original_string" : "{\"foo\":[{ \"string\" : \"bar\", \"number\" : 1, \"ignored\" : [ \"blah\" ] },{ \"number\" : 2 },{ \"number\" : 3 },{ \"number\" : 4 },{ \"number\" : 5 },{ \"number\" : 6 },{ \"number\" : 7 },{ \"number\" : 8 },{ \"number\" : 9 },{ \"number\" : 10 }]}",
             "parallelenricher:enrich:begin:ts" : "1559104376561",
             "parallelenricher:splitter:end:ts" : "1559104376561",
             "guid" : "060f5142-73b6-4fc4-b3fa-ad17f5c4f7ce",
             "timestamp" : 1559104359358
           }
         },
         {
           "_index" : "json_map_query_index_2019.05.29.04",
           "_type" : "jsonMapQuery_doc",
           "_id" : "AWsB3SvLcGT3jmMsgLwe",
           "_score" : 1.0,
           "_source" : {
             "parallelenricher:splitter:begin:ts" : "1559104376561",
             "parallelenricher:enrich:end:ts" : "1559104376561",
             "number" : 9,
             "source:type" : "jsonMapQuery",
             "original_string" : "{\"foo\":[{ \"string\" : \"bar\", \"number\" : 1, \"ignored\" : [ \"blah\" ] },{ \"number\" : 2 },{ \"number\" : 3 },{ \"number\" : 4 },{ \"number\" : 5 },{ \"number\" : 6 },{ \"number\" : 7 },{ \"number\" : 8 },{ \"number\" : 9 },{ \"number\" : 10 }]}",
             "parallelenricher:enrich:begin:ts" : "1559104376561",
             "parallelenricher:splitter:end:ts" : "1559104376561",
             "guid" : "d805927d-8fef-4a70-8cf2-f4441eb1f70f",
             "timestamp" : 1559104359358
           }
         },
         {
           "_index" : "json_map_query_index_2019.05.29.04",
           "_type" : "jsonMapQuery_doc",
           "_id" : "AWsB3SurcGT3jmMsgLwc",
           "_score" : 1.0,
           "_source" : {
             "parallelenricher:splitter:begin:ts" : "1559104376560",
             "parallelenricher:enrich:end:ts" : "1559104376560",
             "number" : 7,
             "source:type" : "jsonMapQuery",
             "original_string" : "{\"foo\":[{ \"string\" : \"bar\", \"number\" : 1, \"ignored\" : [ \"blah\" ] },{ \"number\" : 2 },{ \"number\" : 3 },{ \"number\" : 4 },{ \"number\" : 5 },{ \"number\" : 6 },{ \"number\" : 7 },{ \"number\" : 8 },{ \"number\" : 9 },{ \"number\" : 10 }]}",
             "parallelenricher:enrich:begin:ts" : "1559104376560",
             "parallelenricher:splitter:end:ts" : "1559104376560",
             "guid" : "658f48b8-19a7-4506-aabd-133c54ddde89",
             "timestamp" : 1559104359358
           }
         },
         {
           "_index" : "json_map_query_index_2019.05.29.04",
           "_type" : "jsonMapQuery_doc",
           "_id" : "AWsB3SsKcGT3jmMsgLwW",
           "_score" : 1.0,
           "_source" : {
             "parallelenricher:splitter:begin:ts" : "1559104376559",
             "parallelenricher:enrich:end:ts" : "1559104376559",
             "number" : 1,
             "ignored" : [
               "blah"
             ],
             "source:type" : "jsonMapQuery",
             "original_string" : "{\"foo\":[{ \"string\" : \"bar\", \"number\" : 1, \"ignored\" : [ \"blah\" ] },{ \"number\" : 2 },{ \"number\" : 3 },{ \"number\" : 4 },{ \"number\" : 5 },{ \"number\" : 6 },{ \"number\" : 7 },{ \"number\" : 8 },{ \"number\" : 9 },{ \"number\" : 10 }]}",
             "string" : "bar",
             "parallelenricher:enrich:begin:ts" : "1559104376559",
             "parallelenricher:splitter:end:ts" : "1559104376559",
             "guid" : "b569df68-e621-46ed-8390-c2adc257c055",
             "timestamp" : 1559104359358
           }
         },
         {
           "_index" : "json_map_query_index_2019.05.29.04",
           "_type" : "jsonMapQuery_doc",
           "_id" : "AWsB3StHcGT3jmMsgLwY",
           "_score" : 1.0,
           "_source" : {
             "parallelenricher:splitter:begin:ts" : "1559104376559",
             "parallelenricher:enrich:end:ts" : "1559104376559",
             "number" : 3,
             "source:type" : "jsonMapQuery",
             "original_string" : "{\"foo\":[{ \"string\" : \"bar\", \"number\" : 1, \"ignored\" : [ \"blah\" ] },{ \"number\" : 2 },{ \"number\" : 3 },{ \"number\" : 4 },{ \"number\" : 5 },{ \"number\" : 6 },{ \"number\" : 7 },{ \"number\" : 8 },{ \"number\" : 9 },{ \"number\" : 10 }]}",
             "parallelenricher:enrich:begin:ts" : "1559104376559",
             "parallelenricher:splitter:end:ts" : "1559104376559",
             "guid" : "f37b4283-a445-47c3-a965-c17937e39755",
             "timestamp" : 1559104359358
           }
         },
         {
           "_index" : "json_map_query_index_2019.05.29.04",
           "_type" : "jsonMapQuery_doc",
           "_id" : "AWsB3SswcGT3jmMsgLwX",
           "_score" : 1.0,
           "_source" : {
             "parallelenricher:splitter:begin:ts" : "1559104376559",
             "parallelenricher:enrich:end:ts" : "1559104376559",
             "number" : 2,
             "source:type" : "jsonMapQuery",
             "original_string" : "{\"foo\":[{ \"string\" : \"bar\", \"number\" : 1, \"ignored\" : [ \"blah\" ] },{ \"number\" : 2 },{ \"number\" : 3 },{ \"number\" : 4 },{ \"number\" : 5 },{ \"number\" : 6 },{ \"number\" : 7 },{ \"number\" : 8 },{ \"number\" : 9 },{ \"number\" : 10 }]}",
             "parallelenricher:enrich:begin:ts" : "1559104376559",
             "parallelenricher:splitter:end:ts" : "1559104376559",
             "guid" : "d7c838e6-ee9f-4067-aa59-10f9ff8412d3",
             "timestamp" : 1559104359358
           }
         },
         {
           "_index" : "json_map_query_index_2019.05.29.04",
           "_type" : "jsonMapQuery_doc",
           "_id" : "AWsB3St4cGT3jmMsgLwa",
           "_score" : 1.0,
           "_source" : {
             "parallelenricher:splitter:begin:ts" : "1559104376560",
             "parallelenricher:enrich:end:ts" : "1559104376560",
             "number" : 5,
             "source:type" : "jsonMapQuery",
             "original_string" : "{\"foo\":[{ \"string\" : \"bar\", \"number\" : 1, \"ignored\" : [ \"blah\" ] },{ \"number\" : 2 },{ \"number\" : 3 },{ \"number\" : 4 },{ \"number\" : 5 },{ \"number\" : 6 },{ \"number\" : 7 },{ \"number\" : 8 },{ \"number\" : 9 },{ \"number\" : 10 }]}",
             "parallelenricher:enrich:begin:ts" : "1559104376560",
             "parallelenricher:splitter:end:ts" : "1559104376560",
             "guid" : "8ad4d1fc-8833-4976-9166-846d0f1d91bb",
             "timestamp" : 1559104359358
           }
         }
       ]
     }
   }
   ```
   Verify that every message's `original_string` contains the full, unmodified source message.
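Checking ten long `original_string` values by eye is error-prone, so the comparison can be scripted. This is only a sketch, assuming Python 3 and that the search response was saved to a file and loaded with `json.load`; the function name `check_default_original_string` is hypothetical and not part of Metron.

```python
def check_default_original_string(response, raw_message, expected=10):
    """Return a list of problems; an empty list means the check passed.

    response    -- parsed JSON body of the Elasticsearch _search call
    raw_message -- the exact line sent to Kafka, without the trailing newline
    """
    problems = []
    hits = response["hits"]["hits"]
    if len(hits) != expected:
        problems.append("expected %d hits, got %d" % (expected, len(hits)))
    for hit in hits:
        # Default behavior: every sub-message keeps the full raw input.
        if hit["_source"].get("original_string") != raw_message:
            problems.append("hit %s: original_string differs from input" % hit["_id"])
    return problems
```

Pass in the contents of `json-data.json` with the trailing newline stripped as `raw_message`. Note that `_search` returns at most 10 hits unless a `size` parameter is given, which happens to match this test exactly.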
   
   **Test JsonMapQuery parser override functionality**
   
   * Open the jsonMapQuery parser config and add the `overrideOriginalString` field, set to `true`, to `parserConfig`.
   ```
   {
     "parserClassName":"org.apache.metron.parsers.json.JSONMapParser",
     "sensorTopic":"jsonMapQuery",
     "parserConfig": {"jsonpQuery":"$.foo", "overrideOriginalString" : true}
   }
   ```
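To make the expected effect of this setting concrete, here is a small simulation of what the two modes should produce for one raw message. It is an illustration only, written in Python 3, and is not Metron's `JSONMapParser` code; the function `split_with_override` and its `key` parameter are invented for the example and stand in for the `$.foo` query.

```python
import json


def split_with_override(raw_message, key="foo", override=True):
    """Mimic the observable effect of jsonpQuery "$.<key>" on one raw message.

    Illustration only: this is not how Metron implements it.
    """
    results = []
    for sub in json.loads(raw_message)[key]:
        parsed = dict(sub)
        if override:
            # Each sub-message carries its own compact JSON serialization.
            parsed["original_string"] = json.dumps(sub, separators=(",", ":"))
        else:
            # Default behavior: every sub-message keeps the full raw input.
            parsed["original_string"] = raw_message
        results.append(parsed)
    return results
```

The key order inside the serialized sub-message may differ from what the parser emits, so compare parsed JSON rather than raw strings when checking real output.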
   
   * Push to ZK.
   ```
   $METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper/ -z $ZOOKEEPER
   # verify the change went through
   $METRON_HOME/bin/zk_load_configs.sh -m DUMP -z $ZOOKEEPER -c PARSER -n jsonMapQuery
   ```
   
   * Restart the topology (parserConfigs are not loaded dynamically)
   ```
   storm kill jsonMapQuery
   $METRON_HOME/bin/start_parser_topology.sh -z $ZOOKEEPER -s jsonMapQuery
   ```
   
   * Clear your json sensor index
   ```
   curl -XDELETE "http://node1:9200/json_map_query*"
   # verify it's empty
   curl -XGET "http://node1:9200/json_map_query*/_stats/docs?pretty=true"
   {
     "_shards" : {
       "total" : 0,
       "successful" : 0,
       "failed" : 0
     },
     "_all" : {
       "primaries" : { },
       "total" : { }
     },
     "indices" : { }
   }
   ```
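The emptiness check can also be scripted against the `_stats/docs` response. A minimal sketch, assuming Python 3 and that the response body has already been parsed into a dict; `index_is_empty` is a hypothetical helper name.

```python
def index_is_empty(stats_response):
    """True when the _stats response lists no indices, or only zero-document ones."""
    indices = stats_response.get("indices", {})
    return all(
        info.get("primaries", {}).get("docs", {}).get("count", 0) == 0
        for info in indices.values()
    )
```

For the response shown above, `indices` is an empty object, so the helper returns `True`.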
   
   * Run the data through again
   ```
   cat json-data.json | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list $BROKERLIST --topic jsonMapQuery
   ```
   
   * Expect to see 10 new messages created in json_map_query_index
   ```
   curl -XGET "http://node1:9200/json_map_query*/_search?pretty=true"
   {
     "took" : 2,
     "timed_out" : false,
     "_shards" : {
       "total" : 5,
       "successful" : 5,
       "skipped" : 0,
       "failed" : 0
     },
     "hits" : {
       "total" : 10,
       "max_score" : 1.0,
       "hits" : [
         {
           "_index" : "json_map_query_index_2019.05.29.17",
           "_type" : "jsonMapQuery_doc",
           "_id" : "AWsEtt80cGT3jmMsgLwz",
           "_score" : 1.0,
           "_source" : {
             "parallelenricher:splitter:begin:ts" : "1559152200444",
             "parallelenricher:enrich:end:ts" : "1559152200444",
             "number" : 1,
             "ignored" : [
               "blah"
             ],
             "source:type" : "jsonMapQuery",
             "original_string" : "{\"number\":1,\"ignored\":[\"blah\"],\"string\":\"bar\"}",
             "string" : "bar",
             "parallelenricher:enrich:begin:ts" : "1559152200444",
             "parallelenricher:splitter:end:ts" : "1559152200444",
             "guid" : "820c7c4d-14a5-4fca-a43f-2c2fa207c6e7",
             "timestamp" : 1559152177657
           }
         },
         {
           "_index" : "json_map_query_index_2019.05.29.17",
           "_type" : "jsonMapQuery_doc",
           "_id" : "AWsEtt9UcGT3jmMsgLw0",
           "_score" : 1.0,
           "_source" : {
             "parallelenricher:splitter:begin:ts" : "1559152200445",
             "parallelenricher:enrich:end:ts" : "1559152200445",
             "number" : 2,
             "source:type" : "jsonMapQuery",
             "original_string" : "{\"number\":2}",
             "parallelenricher:enrich:begin:ts" : "1559152200445",
             "parallelenricher:splitter:end:ts" : "1559152200445",
             "guid" : "270ec17b-a34f-4a8e-b2a1-32b4ad5891d7",
             "timestamp" : 1559152177657
           }
         },
         {
           "_index" : "json_map_query_index_2019.05.29.17",
           "_type" : "jsonMapQuery_doc",
           "_id" : "AWsEtt_ZcGT3jmMsgLw8",
           "_score" : 1.0,
           "_source" : {
             "parallelenricher:splitter:begin:ts" : "1559152200445",
             "parallelenricher:enrich:end:ts" : "1559152200445",
             "number" : 10,
             "source:type" : "jsonMapQuery",
             "original_string" : "{\"number\":10}",
             "parallelenricher:enrich:begin:ts" : "1559152200445",
             "parallelenricher:splitter:end:ts" : "1559152200445",
             "guid" : "8bc38d0c-0172-464f-8794-f633cac6e60d",
             "timestamp" : 1559152177657
           }
         },
         {
           "_index" : "json_map_query_index_2019.05.29.17",
           "_type" : "jsonMapQuery_doc",
           "_id" : "AWsEtt9lcGT3jmMsgLw1",
           "_score" : 1.0,
           "_source" : {
             "parallelenricher:splitter:begin:ts" : "1559152200445",
             "parallelenricher:enrich:end:ts" : "1559152200445",
             "number" : 3,
             "source:type" : "jsonMapQuery",
             "original_string" : "{\"number\":3}",
             "parallelenricher:enrich:begin:ts" : "1559152200445",
             "parallelenricher:splitter:end:ts" : "1559152200445",
             "guid" : "a9a89c94-922b-41fe-ad37-3f0cb758dc8e",
             "timestamp" : 1559152177657
           }
         },
         {
           "_index" : "json_map_query_index_2019.05.29.17",
           "_type" : "jsonMapQuery_doc",
           "_id" : "AWsEtt96cGT3jmMsgLw3",
           "_score" : 1.0,
           "_source" : {
             "parallelenricher:splitter:begin:ts" : "1559152200445",
             "parallelenricher:enrich:end:ts" : "1559152200445",
             "number" : 5,
             "source:type" : "jsonMapQuery",
             "original_string" : "{\"number\":5}",
             "parallelenricher:enrich:begin:ts" : "1559152200445",
             "parallelenricher:splitter:end:ts" : "1559152200445",
             "guid" : "7917038a-67ba-497c-b74e-47c9fe13a34e",
             "timestamp" : 1559152177657
           }
         },
         {
           "_index" : "json_map_query_index_2019.05.29.17",
           "_type" : "jsonMapQuery_doc",
           "_id" : "AWsEtt-qcGT3jmMsgLw5",
           "_score" : 1.0,
           "_source" : {
             "parallelenricher:splitter:begin:ts" : "1559152200445",
             "parallelenricher:enrich:end:ts" : "1559152200445",
             "number" : 7,
             "source:type" : "jsonMapQuery",
             "original_string" : "{\"number\":7}",
             "parallelenricher:enrich:begin:ts" : "1559152200445",
             "parallelenricher:splitter:end:ts" : "1559152200445",
             "guid" : "0fc44b9c-e218-42f5-816b-0b459088b80a",
             "timestamp" : 1559152177657
           }
         },
         {
           "_index" : "json_map_query_index_2019.05.29.17",
           "_type" : "jsonMapQuery_doc",
           "_id" : "AWsEtt9ucGT3jmMsgLw2",
           "_score" : 1.0,
           "_source" : {
             "parallelenricher:splitter:begin:ts" : "1559152200445",
             "parallelenricher:enrich:end:ts" : "1559152200445",
             "number" : 4,
             "source:type" : "jsonMapQuery",
             "original_string" : "{\"number\":4}",
             "parallelenricher:enrich:begin:ts" : "1559152200445",
             "parallelenricher:splitter:end:ts" : "1559152200445",
             "guid" : "8aa5943a-884d-4ce5-8816-a227075e2e8f",
             "timestamp" : 1559152177657
           }
         },
         {
           "_index" : "json_map_query_index_2019.05.29.17",
           "_type" : "jsonMapQuery_doc",
           "_id" : "AWsEtt-VcGT3jmMsgLw4",
           "_score" : 1.0,
           "_source" : {
             "parallelenricher:splitter:begin:ts" : "1559152200445",
             "parallelenricher:enrich:end:ts" : "1559152200445",
             "number" : 6,
             "source:type" : "jsonMapQuery",
             "original_string" : "{\"number\":6}",
             "parallelenricher:enrich:begin:ts" : "1559152200445",
             "parallelenricher:splitter:end:ts" : "1559152200445",
             "guid" : "0eb61762-435a-4dc6-b025-f82f761c75cc",
             "timestamp" : 1559152177657
           }
         },
         {
           "_index" : "json_map_query_index_2019.05.29.17",
           "_type" : "jsonMapQuery_doc",
           "_id" : "AWsEtt-3cGT3jmMsgLw6",
           "_score" : 1.0,
           "_source" : {
             "parallelenricher:splitter:begin:ts" : "1559152200445",
             "parallelenricher:enrich:end:ts" : "1559152200445",
             "number" : 8,
             "source:type" : "jsonMapQuery",
             "original_string" : "{\"number\":8}",
             "parallelenricher:enrich:begin:ts" : "1559152200445",
             "parallelenricher:splitter:end:ts" : "1559152200445",
             "guid" : "4763dfa1-91fd-4c21-ab48-28d564a5c878",
             "timestamp" : 1559152177657
           }
         },
         {
           "_index" : "json_map_query_index_2019.05.29.17",
           "_type" : "jsonMapQuery_doc",
           "_id" : "AWsEtt_LcGT3jmMsgLw7",
           "_score" : 1.0,
           "_source" : {
             "parallelenricher:splitter:begin:ts" : "1559152200445",
             "parallelenricher:enrich:end:ts" : "1559152200445",
             "number" : 9,
             "source:type" : "jsonMapQuery",
             "original_string" : "{\"number\":9}",
             "parallelenricher:enrich:begin:ts" : "1559152200445",
             "parallelenricher:splitter:end:ts" : "1559152200445",
             "guid" : "74c7c52a-9aa3-43c4-b7aa-d5a8edbf9409",
             "timestamp" : 1559152177657
           }
         }
       ]
     }
   }
   ```
   
   The `original_string` should now hold only the individual sub-message that the `jsonpQuery` extracted from the root message, not the full root message.
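This second check can be automated as well. The sketch below, assuming Python 3 and a search response already loaded with `json.load`, parses each `original_string` and confirms that its fields match the top-level fields of the same hit; a hit that still carries the root message fails because `foo` is not an indexed field. The function name `check_overridden_original_string` is hypothetical.

```python
import json


def check_overridden_original_string(response, expected=10):
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    hits = response["hits"]["hits"]
    if len(hits) != expected:
        problems.append("expected %d hits, got %d" % (expected, len(hits)))
    for hit in hits:
        source = hit["_source"]
        try:
            original = json.loads(source["original_string"])
        except (KeyError, ValueError):
            problems.append("hit %s: original_string missing or not JSON" % hit["_id"])
            continue
        if not isinstance(original, dict):
            problems.append("hit %s: original_string is not a JSON object" % hit["_id"])
            continue
        # Every field of the sub-message should also be a top-level field of the hit.
        for field, value in original.items():
            if source.get(field) != value:
                problems.append("hit %s: field %r does not match" % (hit["_id"], field))
    return problems
```

Comparing parsed values instead of strings keeps the check independent of key order and whitespace in the serialized sub-message.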
   
   **Check parser chaining and aggregation use case**
   
   https://github.com/apache/metron/tree/master/use-cases/parser_chaining
   
   This use case should still work as-is, with no changes required.
