Posted to issues@metron.apache.org by GitBox <gi...@apache.org> on 2019/09/03 20:15:44 UTC

[GitHub] [metron] mmiklavc edited a comment on issue #1483: METRON-2217 Migrate current HBase client from HTableInterface to Table

URL: https://github.com/apache/metron/pull/1483#issuecomment-525102320
 
 
   ## Test Plan
   
   ### Enrichments
   
   This will cover enrichments, threat intel, and the bulk loading utilities that write data to HBase.
   
   #### Test basic enrichment
   
   Spin up full dev
   
   Optional - free up resources. We're going to be spinning up some additional topologies. The resources in full dev are limited, so you'll probably want to stop non-critical topologies in order to have enough Storm slots.
   
   ```
   for parser in bro snort yaf profiler pcap batch_indexing; do storm kill $parser; done
   ```
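
   If you want to confirm which topologies remain (and that slots have been freed), `storm list` shows what is still running:
   ```
   # List the running topologies; only the ones you still need should remain
   storm list
   ```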
   
   Follow the [updated] blog series steps below to get some data into Metron using Squid along with an enrichment:
   
   1. https://cwiki.apache.org/confluence/display/METRON/2016/04/25/Metron+Tutorial+-+Fundamentals+Part+1%3A+Creating+a+New+Telemetry
   2. https://cwiki.apache.org/confluence/display/METRON/2016/04/28/Metron+Tutorial+-+Fundamentals+Part+2%3A+Creating+a+New+Enrichment
   
   #### Test threat intel
   
   1. https://cwiki.apache.org/confluence/display/METRON/2016/05/02/Metron+Tutorial+-+Fundamentals+Part+4%3A+Pluggable+Threat+Intelligence
   
   #### Test multi-threading
   
   For the final step, we'll deviate from the blog a bit so we can verify that the thread pool doesn't cause any deadlocking/threading issues with the new HBase connection approach. This is taken from https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment. Follow the steps in that blog tutorial for setting up the user streaming enrichment, but instead of modifying/using bro as suggested at the end, follow the instructions below.
   
   Let's load the original whois list from step 1 as threat intel for added fun. This way we can run multiple enrichments and also trigger threat intel from the same messages. Create a file `blocklist2.csv` with the following contents:
   ```
   [root@node1: ~]
   # cat blocklist2.csv
   aliexpress.com,squidblacklist.org
   pravda.ru,squidblacklist.org
   google.com,squidblacklist.org
   brightsideofthesun.com,squidblacklist.org
   microsoftstore.com,squidblacklist.org
   autonews.com,squidblacklist.org
   facebook.com,squidblacklist.org
   ebay.com,squidblacklist.org
   recruit.jp,squidblacklist.org
   lada.ru,squidblacklist.org
   aliexpress.com,squidblacklist.org
   ```
   
   Load the threat intel into HBase
   `${METRON_HOME}/bin/flatfile_loader.sh -i blocklist2.csv -t threatintel -c t -e threatintel_extractor_config.json`
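
   For reference, `threatintel_extractor_config.json` here follows the CSV extractor format from the threat intel tutorial linked above; this is a minimal sketch, assuming a two-column layout matching `blocklist2.csv` and the `squidBlacklist` type used in squid.json below:
   ```
   # cat threatintel_extractor_config.json
   # (assumed contents - sketch based on the CSV extractor format; adjust if your
   # tutorial config differs)
   {
     "config" : {
       "columns" : {
         "domain" : 0,
         "source" : 1
       },
       "indicator_column" : "domain",
       "type" : "squidBlacklist",
       "separator" : ","
     },
     "extractor" : "CSV"
   }
   ```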
   
   Clear the squid logs
   ```
   rm /var/log/squid/access.log
   touch /var/log/squid/access.log
   chown squid:squid /var/log/squid/access.log
   service squid restart
   ```
   
   Re-run new squidclient commands similar to step 1. Rather than only a fraction of the records matching on domain for the whois enrichment, we'll have them all match for this test.
   ```
   squidclient "https://www.google.com/maps/place/Waterford,+WI/@42.7639877,-88.2867248,12z/data=!4m5!3m4!1s0x88059e67de9a3861:0x2d24f51aad34c80b!8m2!3d42.7630722!4d-88.2142563"
   squidclient "http://www.help.1and1.co.uk/domains-c40986/transfer-domains-c79878"
   squidclient "https://community.cisco.com/t5/technology-and-support/ct-p/technology-support"
   squidclient "https://www.capitalone.com/support-center"
   squidclient "https://www.cnn.com/about"
   squidclient "https://contact.nba.com/"
   squidclient "https://www.espn.com/nfl/team/_/name/cle/cleveland-browns"
   ```
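
   As a quick sanity check, the access log should now contain seven entries (one per squidclient request above); the record counts later in this plan assume this:
   ```
   # Expect 7 lines, one per squidclient request
   wc -l /var/log/squid/access.log
   ```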
   
   Update your squid.json enrichment to include Stellar enrichments. We're going to duplicate the `whois` enrichment multiple times for the sake of simplicity.
   
   ```
   # cat $METRON_HOME/config/zookeeper/enrichments/squid.json
   {
     "enrichment" : {
       "fieldMap" : {
         "hbaseEnrichment" : [ "domain_without_subdomains" ],
         "stellar" : {
          "config" : {
            "e1" : {
              "user" : "ENRICHMENT_GET('user', ip_src_addr, 'enrichment', 't')"
            },
            "e2" : {
              "dws1" : "ENRICHMENT_GET('whois', domain_without_subdomains, 'enrichment', 't')"
            },
            "e3" : {
              "dws2" : "ENRICHMENT_GET('whois', domain_without_subdomains, 'enrichment', 't')"
            },
            "e4" : {
              "dws3" : "ENRICHMENT_GET('whois', domain_without_subdomains, 'enrichment', 't')"
            },
            "e5" : {
              "dws4" : "ENRICHMENT_GET('whois', domain_without_subdomains, 'enrichment', 't')"
            },
            "e6" : {
              "dws5" : "ENRICHMENT_GET('whois', domain_without_subdomains, 'enrichment', 't')"
            }
          }
        }
       },
       "fieldToTypeMap" : {
         "domain_without_subdomains" : [ "whois" ]
       },
       "config" : { }
     },
     "threatIntel" : {
       "fieldMap" : {
         "hbaseThreatIntel" : [ "domain_without_subdomains" ]
       },
       "fieldToTypeMap" : {
         "domain_without_subdomains" : [ "squidBlacklist" ]
       },
       "config" : { },
       "triageConfig" : {
         "riskLevelRules" : [ ],
         "aggregator" : "MAX",
         "aggregationConfig" : { }
       }
     },
     "configuration" : { }
   }
   ```
   
   Load the changed enrichment
   ```
   ${METRON_HOME}/bin/zk_load_configs.sh -m PUSH -z $ZOOKEEPER -i ${METRON_HOME}/config/zookeeper
   # verify it loaded
   ${METRON_HOME}/bin/zk_load_configs.sh -m DUMP -z $ZOOKEEPER -c ENRICHMENT -n squid
   ```
   
   Wipe your squid indexes in ES
   ```
   curl -XDELETE "http://node1:9200/squid*"
   ```
   
   Stop the enrichment topology
   
   In Ambari, navigate to Metron > Configs > Enrichment. Make the following config adjustments:
   1. Set Unified Enrichment Parallelism to 3
   2. Set Unified Threat Intel Parallelism to 3
   3. Set Unified Enrichment Cache Size to 0 (force cache misses so we hit HBase)
   4. Set Unified Threat Intel Cache Size to 0 (force cache misses so we hit HBase)
   5. Set Unified Enrichment Thread Pool Size to 5
   
   Restart the enrichment topology. You should see a log message in the storm worker logs similar to the following:
   ```
   2019-08-26 17:52:40.162 o.a.m.e.b.UnifiedEnrichmentBolt Thread-8-threatIntelBolt-executor[7 7] [INFO] Creating new threadpool of size 5
   ```
   
   Import the squid access data into Kafka. Run the import multiple times with the following:
   ```
   for i in {1..30}; do cat /var/log/squid/access.log | ${HDP_HOME}/kafka-broker/bin/kafka-console-producer.sh --broker-list $BROKERLIST --topic squid; done
   ```
   
   After a bit of time, you should see new records in the squid index that have the new enrichment and threat intel fields (note the dws1-dws5 fields). You should get 210 records in your squid index, assuming you set up your squid access log with 7 records during the earlier squidclient setup.
   ```
   {
   "_index": "squid_index_2019.08.24.00",
   "_type": "squid_doc",
   "_id": "AWzBEZ7MZrHsl7xo6X-6",
   "_version": 1,
   "_score": 1,
   "_source": {
   "enrichments:hbaseEnrichment:domain_without_subdomains:whois:owner": "ESPN, Inc.",
   "full_hostname": "www.espn.com",
   "dws1:home_country": "US",
   "dws1:domain": "espn.com",
   "dws2:domain": "espn.com",
   "dws3:home_country": "US",
   "dws1:domain_created_timestamp": "781268400000",
   "enrichments:hbaseEnrichment:domain_without_subdomains:whois:home_country": "US",
   "enrichments:hbaseEnrichment:domain_without_subdomains:whois:domain_created_timestamp": "781268400000",
   "dws5:home_country": "US",
   "parallelenricher:enrich:end:ts": "1566607252930",
   "adapter:threatinteladapter:end:ts": "1566607252930",
   "original_string": "1566604971.782 732 127.0.0.1 TCP_MISS/200 331562 GET https://www.espn.com/nfl/team/_/name/cle/cleveland-browns - DIRECT/54.152.255.68 text/html",
   "dws3:registrar": "ESPN, Inc.",
   "dws4:owner": "ESPN, Inc.",
   "action": "TCP_MISS",
   "dws4:domain": "espn.com",
   "dws5:domain": "espn.com",
   "dws3:domain": "espn.com",
   "enrichments:hbaseEnrichment:domain_without_subdomains:whois:registrar": "ESPN, Inc.",
   "dws5:domain_created_timestamp": "781268400000",
   "method": "GET",
   "parallelenricher:enrich:begin:ts": "1566607252928",
   "user:user": "mmiklavcic",
   "adapter:simplehbaseadapter:end:ts": "1566607252925",
   "dws3:domain_created_timestamp": "781268400000",
   "dws2:domain_created_timestamp": "781268400000",
   "user:timestamp": 1566598784187,
   "dws2:registrar": "ESPN, Inc.",
   "user:source:type": "user",
   "dws4:domain_created_timestamp": "781268400000",
   "adapter:threatinteladapter:begin:ts": "1566607252928",
   "guid": "919b421a-b2ec-4e82-951e-3ee031c5a394",
   "dws3:owner": "ESPN, Inc.",
   "dws2:owner": "ESPN, Inc.",
   "code": 200,
   "adapter:stellaradapter:end:ts": "1566607252922",
   "enrichments:hbaseEnrichment:domain_without_subdomains:whois:domain": "espn.com",
   "dws2:home_country": "US",
   "dws4:home_country": "US",
   "dws1:registrar": "ESPN, Inc.",
   "elapsed": 732,
   "source:type": "squid",
   "ip_dst_addr": "54.152.255.68",
   "dws5:registrar": "ESPN, Inc.",
   "domain_without_subdomains": "espn.com",
   "ip_src_addr": "127.0.0.1",
   "timestamp": 1566604971782,
   "adapter:stellaradapter:begin:ts": "1566607252906",
   "url": "https://www.espn.com/nfl/team/_/name/cle/cleveland-browns",
   "dws1:owner": "ESPN, Inc.",
   "parallelenricher:splitter:begin:ts": "1566607252928",
   "dws5:owner": "ESPN, Inc.",
   "user:guid": "d8fb60b7-1670-4f96-a413-cb185afbe0de",
   "bytes": 331562,
   "parallelenricher:splitter:end:ts": "1566607252928",
   "user:original_string": "mmiklavcic,127.0.0.1",
   "dws4:registrar": "ESPN, Inc.",
   "adapter:simplehbaseadapter:begin:ts": "1566607252906"
   }
   }
   ```
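
   To spot-check the total, a quick count against the index (a sketch reusing the node1:9200 endpoint from the other curl commands in this plan):
   ```
   # Should report "count": 210 once all 30 imports have been processed
   curl -XGET "http://node1:9200/squid*/_count?pretty"
   ```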
   
   #### Test recoverability with HBase down
   
   Now, clear your squid index again.
   ```
   curl -XDELETE "http://node1:9200/squid*"
   ```
   
   Stop HBase and wait a few moments. Import the squid data again:
   ```
   cat /var/log/squid/access.log | ${HDP_HOME}/kafka-broker/bin/kafka-console-producer.sh --broker-list $BROKERLIST --topic squid
   ```
   
   Wait about a minute and check your squid index. You should not see any new data in the index. Now restart HBase in Ambari. After HBase has restarted, check the squid index again. After some time, the data should flow through enrichments and make it to the squid index.
   
   After completing the above steps, you should not see any HBase exceptions or errors in the enrichment topology logs.
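
   One way to scan for them, sketched here assuming an HDP full-dev layout where Storm worker logs live under `/var/log/storm/workers-artifacts` (adjust the path for your install):
   ```
   # Search the enrichment topology worker logs for HBase-related errors
   # (log directory is an assumption for full dev; adjust as needed)
   grep -iE "hbase.*(exception|error)" /var/log/storm/workers-artifacts/enrichment*/*/worker.log
   ```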
   
   ### Profiler
   
   Stop the profiler. In Ambari, set the profiler period duration to 1 minute via the Profiler config section.
   Then edit `$METRON_HOME/config/zookeeper/global.json` to set the matching client period duration:
   
   ```
   vim ${METRON_HOME}/config/zookeeper/global.json
   "profiler.client.period.duration" : "1",
   "profiler.client.period.duration.units" : "MINUTES",
   ```
   
   Create `$METRON_HOME/config/zookeeper/profiler.json` and save the following contents:
   ```
   {
     "profiles": [
       {
         "profile": "hello-world",
         "onlyif":  "exists(ip_dst_addr)",
         "foreach": "ip_dst_addr",
         "init":    { "count": "0" },
         "update":  { "count": "count + 1" },
         "result":  "count"
       }
     ]
   }
   ```
   
   Modify `${METRON_HOME}/config/zookeeper/enrichments/squid.json` so it pulls values from the profiler. Update our previous example to add the following Stellar enrichment "e7":
   ```
            "e6" : {
              "dws5" : "ENRICHMENT_GET('whois', domain_without_subdomains, 'enrichment', 't')"
            },
            "e7" : {
              "profile_for_ip_dst_addr" : "PROFILE_GET( 'hello-world', ip_dst_addr, PROFILE_FIXED(2, 'MINUTES'))"
            }
   ```
   
   
   Push your changes to Zookeeper
   ```
   ${METRON_HOME}/bin/zk_load_configs.sh -m PUSH -z $ZOOKEEPER -i ${METRON_HOME}/config/zookeeper
   ```
   
   Restart the profiler.
   
   Clear your squid data
   ```
   curl -XDELETE "http://node1:9200/squid*"
   ```
   
   Then publish some squid data to the squid topic for roughly 500 seconds. The duration is somewhat arbitrary, but we want to give the profiles enough time to flush so that the enrichments start picking up the profile data from HBase.
   ```
   for i in {1..100}; do cat /var/log/squid/access.log | ${HDP_HOME}/kafka-broker/bin/kafka-console-producer.sh --broker-list $BROKERLIST --topic squid; sleep 5; done
   ```
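
   Optionally, while this runs you can confirm the profiler is actually writing rows to HBase (a sketch; `profiler` is the default profiler table name in full dev):
   ```
   # The row count in the profiler HBase table should grow as profile periods flush
   echo "count 'profiler'" | hbase shell
   ```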
   
   Once this process completes, you should note the following:
   1. No errors/exceptions in the profiler or enrichment Storm logs
   2. 700 records are written to the squid index in ES
   3. Many of the records (though not all, especially the early ones) have non-empty values for the `profile_for_ip_dst_addr` field, e.g.
       ```
       curl -XGET "http://node1:9200/squid*/_search?size=700&pretty=true" | grep -A 2 profile_for_ip_dst_addr
       ```
   
   
   
   
