Posted to issues@metron.apache.org by GitBox <gi...@apache.org> on 2019/10/10 18:56:11 UTC

[GitHub] [metron] mmiklavc edited a comment on issue #1523: METRON-2232 Upgrade to Hadoop 3.1.1

URL: https://github.com/apache/metron/pull/1523#issuecomment-540208150
 
 
   ## Testing
   
    Adapted from a few places:
   * https://gist.github.com/nickwallen/ed67fdc8b399f6db5fa4901b07fc3fff
   * https://cwiki.apache.org/confluence/display/METRON/2016/04/25/Metron+Tutorial+-+Fundamentals+Part+1%3A+Creating+a+New+Telemetry
   
   ### Preliminaries
   
   Test using the centos7 development environment.  
   
   * Start up the centos7 dev environment.
       ```
       cd metron-deployment/development/centos7
       vagrant destroy -f
       vagrant up
       # ssh into the box as root@node1, pwd=vagrant
       ```
   
    * Running as root is fine.
    * Set the environment variables:
   ```
   source /etc/default/metron
   ```
    * The root user needs a home directory in HDFS. Create it as follows:
   ```
   sudo -u hdfs hdfs dfs -mkdir /user/root
   sudo -u hdfs hdfs dfs -chown root:root /user/root
   ```
   * Download the Alexa top 1m data set
   ```
   cd ~/
   wget http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
   unzip top-1m.csv.zip
   ```
   
    * Stage the import file
   ```
   head -n 10000 top-1m.csv > top-10k.csv
   hdfs dfs -put top-10k.csv /tmp
   ```
   
    * Truncate the `enrichment` table in HBase
   ```
   echo "truncate 'enrichment'" | hbase shell
   ```
   
   ### Basic Indexing and Enrichment
   
    Ensure that we can continue to parse, enrich, and index telemetry.  Verify that data is flowing through the system, from parsing to indexing.
   
   1. Open Ambari and navigate to the Metron service http://node1:8080/#/main/services/METRON/summary
   
    1. Open the Alerts UI and click the search icon.  Verify alerts show up in the main UI (you may need to wait a moment for them to appear).
   
    1. In the Alerts UI, ensure that an ever-increasing volume of telemetry from Bro, Snort, and YAF is visible by watching the total alert count grow over time.
   
    1. Ensure that geoip enrichment is occurring.  The telemetry should contain fields like `enrichments:geo:ip_src_addr:location_point` (a query sketch for checking this follows the list).
   
   1. Head back to Ambari and select the Kibana service http://node1:8080/#/main/services/KIBANA/summary
   
   1. Open the Kibana dashboard via the "Metron UI" option in the quick links
   
   1. Verify the dashboard is populating
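
    A hedged way to spot-check the geo enrichment directly in Elasticsearch; this assumes the dev environment's Elasticsearch on node1:9200 and the default `bro_index_*` naming:
    ```
    # sample a document that carries the geo enrichment field
    curl -s -XPOST 'http://node1:9200/bro_index_*/_search?size=1' -H 'Content-Type: application/json' \
      -d '{"query":{"exists":{"field":"enrichments:geo:ip_src_addr:location_point"}}}'
    ```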
   
   ### Batch Indexing
   
   1. Use the Alerts UI to retrieve a rough count of the number of Bro messages that have been indexed.
   
   1. Retrieve the number of Bro messages that have been indexed in HDFS.
       ```
       [root@node1 0.7.2]# hdfs dfs -cat /apps/metron/indexing/indexed/bro/* | wc -l
       2785
       ```
   
    1. The number of messages indexed in HDFS should be close to the number indexed into the search indices; a sketch for retrieving the search-index count follows.
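
    A hedged way to pull the search-index side of that comparison (assuming Elasticsearch on node1:9200 and the default index naming):
        ```
        # documents indexed into Elasticsearch for the bro sensor
        curl -s 'http://node1:9200/bro_index_*/_count'
        ```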
   
   ###  Streaming Enrichments
   
   Adapted from the [Metron Tutorial Series](https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment).
   
     1. Launch the Stellar REPL.
         ```
         cd $METRON_HOME
         $METRON_HOME/bin/stellar -z $ZOOKEEPER
         ```
   
     1. Define the streaming enrichment and save it as a new source of telemetry.
   
         ```
         [Stellar]>>> conf := SHELL_EDIT(conf)
         {
           "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
           "writerClassName": "org.apache.metron.writer.hbase.SimpleHbaseEnrichmentWriter",
           "sensorTopic": "user",
           "parserConfig": {
             "shew.table": "enrichment",
             "shew.cf": "t",
             "shew.keyColumns": "ip",
             "shew.enrichmentType": "user",
             "columns": {
               "user": 0,
               "ip": 1
             }
           }
         }
         [Stellar]>>>
         [Stellar]>>> CONFIG_PUT("PARSER", conf, "user")
         ```
   
     1. Go to the Management UI and start the new parser called 'user'.
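
      If you prefer the command line, a hedged alternative (assuming the default script locations and that the /etc/default/metron environment variables have been sourced) is:
          ```
          # start the 'user' parser topology directly
          $METRON_HOME/bin/start_parser_topology.sh -k $BROKERLIST -z $ZOOKEEPER -s user
          ```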
   
     1. Create some test telemetry.
         ```
         [Stellar]>>> msgs := ["user1,192.168.1.1", "user2,192.168.1.2", "user3,192.168.1.3"]
         [user1,192.168.1.1, user2,192.168.1.2, user3,192.168.1.3]
         [Stellar]>>> KAFKA_PUT("user", msgs)
         3
         [Stellar]>>> KAFKA_PUT("user", msgs)
         3
         [Stellar]>>> KAFKA_PUT("user", msgs)
         3
         ```
   
     1. Ensure that the enrichments are persisted in HBase.
         ```
         [Stellar]>>> ENRICHMENT_GET('user', '192.168.1.1', 'enrichment', 't')
         {original_string=user1,192.168.1.1, guid=a6caf3c1-2506-4eb7-b33e-7c05b77cd72c, user=user1, timestamp=1551813589399, source.type=user}
   
         [Stellar]>>> ENRICHMENT_GET('user', '192.168.1.2', 'enrichment', 't')
         {original_string=user2,192.168.1.2, guid=49e4b8fa-c797-44f0-b041-cfb47983d54a, user=user2, timestamp=1551813589399, source.type=user}
   
         [Stellar]>>> ENRICHMENT_GET('user', '192.168.1.3', 'enrichment', 't')
         {original_string=user3,192.168.1.3, guid=324149fd-6c4c-42a3-b579-e218c032ea7f, user=user3, timestamp=1551813589402, source.type=user}
         ```
   
   ### Enrichment Coprocessor
   
     1. Confirm that the 'user' enrichment added in the previous section was 'found' by the coprocessor.
           * Go to Swagger. 
           * Click the `sensor-enrichment-config-controller` option.
           * Click the `GET /api/v1/sensor/enrichment/config/list/available/enrichments` option.
   
     1. Click the "Try it out!" button. You should see an array returned with the value of each enrichment type that you have loaded.
       ```
       [
         "user"
       ]
       ```
   
   ### Enrichment Stellar Functions in Storm
   
      Adapted from the [Metron Tutorial Series](https://cwiki.apache.org/confluence/display/METRON/2016/04/28/Metron+Tutorial+-+Fundamentals+Part+2%3A+Creating+a+New+Enrichment) to load the user data.
   
     1. Create a simple file called `user.csv`.
       ```
        jdoe,192.168.138.2
       moredoe,192.168.138.158
       ```
       
     1. Create a file called `user-extractor.json`.
         ```
         {
           "config": {
             "columns": {
               "user": 0,
               "ip": 1
             },
             "indicator_column": "ip",
             "separator": ",",
             "type": "user"
           },
           "extractor": "CSV"
         }
         ```
   
     1. Import the data.
         ```
         source /etc/default/metron
         $METRON_HOME/bin/flatfile_loader.sh -i ./user.csv -t enrichment -c t -e ./user-extractor.json
         ```
   
     1. Validate that the enrichment loaded successfully.
         ```
         [root@node1 0.7.2]# source /etc/default/metron
         [root@node1 0.7.2]# $METRON_HOME/bin/stellar -z $ZOOKEEPER
         
         [Stellar]>>> ip_src_addr := "192.168.138.158"
         192.168.138.158
         
         [Stellar]>>> ENRICHMENT_GET('user', ip_src_addr, 'enrichment', 't')
         {ip=192.168.138.158, user=moredoe}
   
         [Stellar]>>> ip_dst_addr := "192.168.138.2"
         192.168.138.2
         
         [Stellar]>>> ENRICHMENT_GET('user', ip_dst_addr, 'enrichment', 't')
         {ip=192.168.138.2, user=jdoe}
         ```
   
     1. Use the User data to enrich the telemetry.  Run the following commands in the REPL.
         ```
         [Stellar]>>> bro := SHELL_EDIT()
         {
          "enrichment" : {
            "fieldMap": {
              "stellar" : {
                "config" : {
                  "users" : "ENRICHMENT_GET('user', ip_dst_addr, 'enrichment', 't')",
                  "users2" : "ENRICHMENT_GET('user', ip_src_addr, 'enrichment', 't')"
                }
              }
            }
          },
          "threatIntel": {
            "fieldMap": {},
            "fieldToTypeMap": {}
          }
         }
         [Stellar]>>> CONFIG_PUT("ENRICHMENT", bro, "bro")
         ```
   
     1. Wait for the new configuration to be picked up by the running topology.
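
      A hedged way to confirm the change landed in ZooKeeper, using the standard config dump tool (the grep target is only illustrative):
          ```
          # dump configs from ZooKeeper and look for the new Stellar enrichment
          $METRON_HOME/bin/zk_load_configs.sh -m DUMP -z $ZOOKEEPER | grep -A 2 users
          ```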
   
      1. Review the Bro telemetry indexed into Elasticsearch.  Look for records where the `ip_dst_addr` is `192.168.138.2` and ensure that some of them contain the following fields created by the enrichment (a query sketch follows the sample record below).  Wait a few minutes longer and you should also start to see records with `"users2:user": "moredoe"`.
         * `users:user`
         * `users:ip`
         ```
         {
           "_index": "bro_index_2019.08.13.20",
           "_type": "bro_doc",
           "_id": "AWyMxSJFg1bv3MpSt284",
           ...
           "_source": {          
             "ip_dst_addr": "192.168.138.2",
             "ip_src_addr": "192.168.138.158",
             "timestamp": 1565729823979,
             "source:type": "bro",
             "guid": "6778beb4-569d-478f-b1c9-8faaf475ac2f"
             ...
             "users:user": "jdoe",
             "users:ip": "192.168.138.2",
             ...
           },
           ...
         }
         ```
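
      A hedged query for pulling such records directly, assuming Elasticsearch on node1:9200 and the default bro index naming:
          ```
          curl -s -XPOST 'http://node1:9200/bro_index_*/_search?size=3' -H 'Content-Type: application/json' \
            -d '{"query":{"term":{"ip_dst_addr":"192.168.138.2"}}}'
          ```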
   
   ### Loaders and Summarizers in MR mode
   
   #### Test the flatfile loader in MR mode
   
   * Create an extractor.json for the CSV data by editing `extractor.json` and pasting in these contents:
   ```
    {
      "config" : {
        "columns" : {
          "domain" : 1,
          "rank" : 0
        },
        "indicator_column" : "domain",
        "type" : "alexa",
        "separator" : ","
      },
      "extractor" : "CSV"
    }
   ```
   
   * Import from HDFS via MR
   ```
   # import data into hbase 
   $METRON_HOME/bin/flatfile_loader.sh -i /tmp/top-10k.csv -t enrichment -c t -e ./extractor.json -m MR
   # count data written and verify it's 10k
   echo "count 'enrichment'" | hbase shell
   ```
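
    * Optionally spot-check a few of the imported rows; a hedged sketch (row keys are hashed, so the output is illustrative only):
    ```
    echo "scan 'enrichment', {LIMIT => 5}" | hbase shell
    ```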
   
   #### Test the flatfile summarizer in MR mode
   
   * Create an extractor-count.json file and paste the following:
   ```
   {
     "config" : {
       "columns" : {
          "rank" : 0,
          "domain" : 1
       },
       "value_transform" : {
          "domain" : "DOMAIN_REMOVE_TLD(domain)"
       },
       "value_filter" : "LENGTH(domain) > 0",
       "state_init" : "0L",
       "state_update" : {
          "state" : "state + LENGTH( DOMAIN_TYPOSQUAT( domain ))"
        },
       "state_merge" : "REDUCE(states, (s, x) -> s + x, 0)",
       "separator" : ","
     },
     "extractor" : "CSV"
   }
   ```
   
   * Create the summary from HDFS via MR
   ```
    $METRON_HOME/bin/flatfile_summarizer.sh -i /tmp/top-10k.csv -e ~/extractor-count.json -p 5 -om CONSOLE -m MR
   ```
   * Verify you see a count in the output similar to the following:
   ```
   Processing /root/top-10k.csv
   19/10/03 21:19:56 WARN resolver.BaseFunctionResolver: Using System classloader
   Processed 9999 - \
   3478276
   ```
   
   ### Legacy HBase Adapter
   
    We will perform the same enrichment, this time using the legacy HBase adapter.
   
     1. Use the User data to enrich the telemetry.  Run the following commands in the REPL.
         ```
         [Stellar]>>> yaf := SHELL_EDIT()
         {
           "enrichment" : {
             "fieldMap" : {
               "hbaseEnrichment" : [ "ip_dst_addr" ]
             },
             "fieldToTypeMap" : {
                "ip_dst_addr" : [ "user" ]
             },
             "config" : {
               "typeToColumnFamily" : {
                 "user" : "t"
               }
             }
           },
           "threatIntel" : { },
           "configuration" : { }
         }
         [Stellar]>>> CONFIG_PUT("ENRICHMENT", yaf, "yaf")
         ```
       
     1. Wait for the new configuration to be picked up by the running topology.
   
     1. Review the YAF telemetry indexed into Elasticsearch.  Look for records where the `ip_dst_addr` is `192.168.138.2`. Ensure that some of the messages have the following fields created from the enrichment.
         * `enrichments:hbaseEnrichment:ip_dst_addr:user:ip`
         * `enrichments:hbaseEnrichment:ip_dst_addr:user:user`
         ```
         {
           "_index": "yaf_index_2019.08.15.03",
           "_type": "yaf_doc",
           "_id": "AWyTZAwEIFY9jxc2THLF",
           "_version": 1,
           "_score": null,
           "_source": {
             "source:type": "yaf",
             "ip_dst_addr": "192.168.138.2",
             "ip_src_addr": "192.168.138.158",
             "guid": "6c73c09d-f099-4646-b653-762adce121fe",
             ...
             "enrichments:hbaseEnrichment:ip_dst_addr:user:ip": "192.168.138.2",
             "enrichments:hbaseEnrichment:ip_dst_addr:user:user": "jdoe",
           }
         }
          ```

    ### Profiler
   
   #### Profiler in the REPL
   
   1. Test a profile in the REPL according to [these instructions](https://github.com/apache/metron/tree/master/metron-analytics/metron-profiler-repl#getting-started).
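
    For reference, a minimal sketch of those instructions, assuming the README's hello-world profile (the test message below is illustrative):

    ```
    [Stellar]>>> conf := SHELL_EDIT()
    {
      "profiles": [
        {
          "profile": "hello-world",
          "foreach": "ip_src_addr",
          "init":    { "count": "0" },
          "update":  { "count": "count + 1" },
          "result":  "count"
        }
      ]
    }
    [Stellar]>>> profiler := PROFILER_INIT(conf)
    [Stellar]>>> msg := '{"ip_src_addr": "192.168.138.158", "timestamp": 1550780530012, "source.type": "bro"}'
    [Stellar]>>> PROFILER_APPLY(msg, profiler)
    ```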
   
       ```
       [Stellar]>>> values := PROFILER_FLUSH(profiler)
       [{period={duration=900000, period=1723089, start=1550780100000, end=1550781000000}, profile=hello-world, groups=[], value=4, entity=192.168.138.158}]
       ```
   
   #### Streaming Profiler
    
   1. Deploy that profile to the Streaming Profiler in Storm.
   
       ```
       [Stellar]>>> CONFIG_PUT("PROFILER", conf)
       ```
   
   1. Wait for the Streaming Profiler in Storm to flush and retrieve the measurement from HBase.  
   
       For the impatient, you can reset the period duration to 1 minute. Alternatively, you can allow the Profiler topology to work for a minute or two and then kill the `profiler` topology which will force it to flush a profile measurement to HBase.
   
       Retrieve the measurement from HBase.  Prior to this PR, it was not possible to query HBase from the REPL.
       ```
       [Stellar]>>> PROFILE_GET("hello-world","192.168.138.158",PROFILE_FIXED(30,"DAYS"))
       [2979]
       ```
   
   #### Batch Profiler
   
   1. Install Spark using Ambari.
   
       1. Stop Storm, YARN, Elasticsearch, Kibana, and Kafka.
   
       1. Install Spark2 using Ambari.
   
       1. Ensure that Spark can talk with HBase.
           ```
           cp /etc/hbase/conf/hbase-site.xml /etc/spark2/conf/
           ```
   
    1. Use the Batch Profiler to back-fill your profile.  To do this, follow the directions [provided here](https://github.com/apache/metron/tree/master/metron-analytics/metron-profiler-spark#getting-started).
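
    A hedged sketch of the key step from that README, assuming `$METRON_HOME/config/batch-profiler.properties` has already been pointed at the archived telemetry:

    ```
    # run the Batch Profiler over the telemetry archived in HDFS
    source /etc/default/metron
    $METRON_HOME/bin/start_batch_profiler.sh
    ```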
   
   1. Retrieve the entire profile, including the back-filled data.
   
       ```
       [Stellar]>>> PROFILE_GET("hello-world","192.168.138.158",PROFILE_FIXED(30,"DAYS"))
       [1203, 2849, 2900, 1944, 1054, 1241, 1721]
       ```
   
   ### PCAP
   
   Pulled from https://github.com/apache/metron/pull/1157#issuecomment-412972370
   
   Get PCAP data into Metron: 
    1. Install and set up pycapa (this has been updated in master recently): https://github.com/apache/metron/blob/master/metron-sensors/pycapa/README.md#centos-6
   2. (if using singlenode vagrant) Kill the enrichment, profiler, indexing, and sensor topologies via `for i in bro enrichment random_access_indexing batch_indexing yaf snort;do storm kill $i;done`
    3. Start the pcap topology via `$METRON_HOME/bin/start_pcap_topology.sh`
   4. Start the pycapa packet capture producer on eth1
   ```
    cd /opt/pycapa/pycapa-venv/bin
    ./pycapa --producer --kafka-topic pcap --interface eth1 --kafka-broker $BROKERLIST
   ```
   5. Watch the topology in the Storm UI and kill the packet capture utility started earlier when the number of packets ingested is over 3k.
   6. You can leave your virtualenv session now via `deactivate`
    7. Ensure that at least 3 files exist on HDFS by running `hdfs dfs -ls /apps/metron/pcap/input`
   8. Choose a file (denoted by $FILE) and dump a few of the contents using the pcap_inspector utility
   ```
   FILE=<file path in hdfs>
   $METRON_HOME/bin/pcap_inspector.sh -i $FILE -n 5
   ```
    9. Choose one of the lines in your output and note the protocol, e.g.:
   ```
   TS: October 9, 2019 8:43:39 PM UTC,ip_src_addr: 192.168.66.1,ip_src_port: 60911,ip_dst_addr: 192.168.66.121,ip_dst_port: 8080,protocol: 6
   TS: October 9, 2019 8:43:39 PM UTC,ip_src_addr: 192.168.66.121,ip_src_port: 8080,ip_dst_addr: 192.168.66.1,ip_dst_port: 60911,protocol: 6
   TS: October 9, 2019 8:43:39 PM UTC,ip_src_addr: 192.168.66.121,ip_src_port: 8080,ip_dst_addr: 192.168.66.1,ip_dst_port: 60911,protocol: 6
   TS: October 9, 2019 8:43:39 PM UTC,ip_src_addr: 192.168.66.121,ip_src_port: 8080,ip_dst_addr: 192.168.66.1,ip_dst_port: 60911,protocol: 6
   TS: October 9, 2019 8:43:39 PM UTC,ip_src_addr: 192.168.66.1,ip_src_port: 60911,ip_dst_addr: 192.168.66.121,ip_dst_port: 8080,protocol: 6
   ```
   
    **Note:** when you run the fixed and query filter commands below, the resulting files are placed in the directory from which you launched the job.
   
   #### Fixed filter
   
   1. Run a fixed filter query by executing the following command with the values noted above (match your start_time format to the date format provided - default is to use millis since epoch)
   2. `cd ~/; $METRON_HOME/bin/pcap_query.sh fixed -st <start_time> -df "yyyyMMdd" -p <protocol_num> -rpf 500`
    3. Verify the MR job finishes successfully. Upon completion, you should see multiple files named with relatively current datestamps in your current directory, e.g. `pcap-data-20160617160549737+0000.pcap`
    4. Copy the files to your local machine and verify you can open them in Wireshark. I chose a middle file and the last file. The middle file should have 500 records (per the records-per-file option, `-rpf`), and the last one will likely have 500 or fewer.
   
   #### Query filter
   
   1. Run a Stellar query filter query by executing a command similar to the following, with the values noted above (match your start_time format to the date format provided - default is to use millis since epoch)
   2. `$METRON_HOME/bin/pcap_query.sh query -st "20160617" -df "yyyyMMdd" -query "protocol == '6'"  -rpf 500`
    3. Verify the MR job finishes successfully. Upon completion, you should see multiple files named with relatively current datestamps in your current directory, e.g. `pcap-data-20160617160549737+0000.pcap`
    4. Copy the files to your local machine and verify you can open them in Wireshark. I chose a middle file and the last file. The middle file should have 500 records (per the records-per-file option, `-rpf`), and the last one will likely have 500 or fewer.
   
   ### MaaS
   
   Follow the Example from this README - https://github.com/apache/metron/tree/master/metron-analytics/metron-maas-service#example
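
    For orientation, that example boils down to starting the MaaS service and deploying a model; a hedged sketch where the paths, model name, and sizes are the README's illustrative values, not requirements:
    ```
    # start the Model-as-a-Service runner against ZooKeeper
    $METRON_HOME/bin/maas_service.sh -zq node1:2181
    # deploy the README's mock DGA model with 512 MB of memory and one instance
    $METRON_HOME/bin/maas_deploy.sh -zq node1:2181 -lmp /root/mock_dga -hmp /user/root/models -mo ADD -m 512 -n dga -v 1.0 -ni 1
    ```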
