Posted to dev@metron.apache.org by cestella <gi...@git.apache.org> on 2017/01/18 14:20:53 UTC

[GitHub] incubator-metron pull request #419: METRON-664: Make the index configuration...

GitHub user cestella opened a pull request:

    https://github.com/apache/incubator-metron/pull/419

    METRON-664: Make the index configuration per-writer with enabled/disabled

    Currently the index configuration is per-sensor and the properties specified are identical for every writer.  Also, the ability to turn off a given writer for a given sensor is not available.
    
    This JIRA seeks to remedy that by:
    * Making the per-sensor indexing config have per-writer sections for the properties available to configure
    * Adding a new per-writer property `enabled` to indicate whether the writer is turned on or off (default on).
    
    Please see the `metron-indexing` documentation for examples of the new configs.
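
To illustrate the per-writer layout and the `enabled` default described above, here is a minimal Python sketch. It is not Metron code: the helper names `is_writer_enabled` and `enabled_writers` are hypothetical, the section and key names are taken from the example configs later in this thread, and treating a missing section as "on" is an assumption based on the stated default.

```python
import json

# Per-writer indexing config in the shape shown later in this thread.
# "enabled" is omitted for elasticsearch to exercise the default.
SQUID_INDEXING_CONFIG = json.loads("""
{
  "hdfs": {"index": "squid", "batchSize": 1, "enabled": false},
  "elasticsearch": {"index": "squid", "batchSize": 5}
}
""")


def is_writer_enabled(sensor_config, writer):
    """Return True unless the writer's section sets "enabled" to false.

    Assumption: a missing section or a missing "enabled" key means "on".
    """
    return bool(sensor_config.get(writer, {}).get("enabled", True))


def enabled_writers(sensor_config, writers=("hdfs", "elasticsearch")):
    """List the writers that would receive data for this sensor."""
    return [w for w in writers if is_writer_enabled(sensor_config, w)]


print(enabled_writers(SQUID_INDEXING_CONFIG))  # prints ['elasticsearch']
print(enabled_writers({}))                     # prints ['hdfs', 'elasticsearch']
```

Keeping the default inside one lookup helper means a sensor with no indexing config at all behaves the same as one that lists every writer as enabled.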

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cestella/incubator-metron METRON-664

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-metron/pull/419.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #419
    
----
commit 5ee9295657a218235f4f38d5475693bdebab44f3
Author: cstella <ce...@gmail.com>
Date:   2017-01-17T00:12:59Z

    First cut.

commit 58af93ea50f1ac20ef61e232e125becaf4756a29
Author: cstella <ce...@gmail.com>
Date:   2017-01-17T16:21:56Z

    Updating to add warning message.

commit 46f806061e0a64da78c3539cec418abb78978875
Author: cstella <ce...@gmail.com>
Date:   2017-01-17T21:54:18Z

    Updating stellar management functions

commit 2c3d8dcd0b6c8a537a1d8439483f794cdd7400ca
Author: cstella <ce...@gmail.com>
Date:   2017-01-18T14:07:59Z

    Updated readme.

commit d0eea5db097701aff83e9df5dcee87da2b0724d8
Author: cstella <ce...@gmail.com>
Date:   2017-01-18T14:17:53Z

    Updating indexing functions and docs.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #419: METRON-664: Make the index configuration...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-metron/pull/419



[GitHub] incubator-metron issue #419: METRON-664: Make the index configuration per-wr...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on the issue:

    https://github.com/apache/incubator-metron/pull/419
  
    Tested this in quick-dev. +1



[GitHub] incubator-metron issue #419: METRON-664: Make the index configuration per-wr...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/419
  
    Testing Instructions beyond the normal smoke test (i.e. letting data
    flow through to the indices and checking them).
    
    ## Preliminaries
    
    Since I will use the squid topology to pass data through in a controlled
    way, we must install squid and generate one point of data:
    * `yum install -y squid`
    * `service squid start`
    * `squidclient http://www.yahoo.com`
    
    Also, set an environment variable to indicate `METRON_HOME`:
    * `export METRON_HOME=/usr/metron/0.3.0` 
    
    ## Free Up Space on the virtual machine
    
    First, let's free up some headroom on the virtual machine.  If you are running this on a
    multinode cluster, you would not have to do this.
    * Kill monit via `service monit stop`
    * Kill tcpreplay via `for i in $(ps -ef | grep tcpreplay | awk '{print $2}');do kill -9 $i;done`
    * Kill existing parser topologies via 
       * `storm kill snort`
       * `storm kill bro`
    * Kill flume via `for i in $(ps -ef | grep flume | awk '{print $2}');do kill -9 $i;done`
    * Kill yaf via `for i in $(ps -ef | grep yaf | awk '{print $2}');do kill -9 $i;done`
    * Kill bro via `for i in $(ps -ef | grep bro | awk '{print $2}');do kill -9 $i;done`
    
    ## Deploy the squid parser
    * Create the squid kafka topic: `/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper node1:2181 --create --topic squid --partitions 1 --replication-factor 1`
    * Start via `$METRON_HOME/bin/start_parser_topology.sh -k node1:6667 -z node1:2181 -s squid`
    
    ### Test Case 0: Base Case Test
    * Delete any squid index that currently exists (if any do) via `curl -XDELETE "http://localhost:9200/squid*"`
    * Send 1 data point through and ensure that exactly one data point lands in the index:
      * `cat /var/log/squid/access.log | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic squid`
      * `curl "http://localhost:9200/squid*/_search?pretty=true&q=*:*" 2> /dev/null| grep "full_hostname" | wc -l` should yield  `1`
    * Validate that the Storm UI for the indexing topology indicates a warning in the console for both the "hdfsIndexingBolt" and "indexingBolt" to the effect of `java.lang.Exception: WARNING: Default and (likely) unoptimized writer config used for hdfs writer and sensor squid` and `java.lang.Exception: WARNING: Default and (likely) unoptimized writer config used for elasticsearch writer and sensor squid` respectively 
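
The `grep "full_hostname" | wc -l` check above counts matching lines in the pretty-printed output. As an alternative sketch, the total can be read from the `hits.total` field of the search response. The helper name `count_hits` and the trimmed sample response are hypothetical and only serve to exercise the parsing.

```python
import json


def count_hits(search_response_text):
    """Return the total hit count reported by an Elasticsearch _search response."""
    total = json.loads(search_response_text)["hits"]["total"]
    # Newer Elasticsearch versions report {"value": N, "relation": ...};
    # older ones report a plain integer.
    if isinstance(total, dict):
        return total["value"]
    return total


# Hypothetical, heavily trimmed response used only to exercise the helper.
sample = '{"hits": {"total": 1, "hits": [{"_source": {"full_hostname": "www.yahoo.com"}}]}}'
print(count_hits(sample))  # prints 1
```

In practice the response text would come from the same `curl` call used in the test steps, for example by piping its output into a script that calls `count_hits`.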
    
    ### Test Case 1: Adjusting batch sizes independently
    * Delete any squid index that currently exists (if any do) via `curl -XDELETE "http://localhost:9200/squid*"`
    * Create a file at `$METRON_HOME/config/zookeeper/indexing/squid.json` with the following contents:
    ```
    {
      "hdfs" : {
        "index": "squid",
        "batchSize": 1,
        "enabled" : true
      },
      "elasticsearch" : {
        "index": "squid",
        "batchSize": 5,
        "enabled" : true
      }
    }
    ```
    * Push the configs via `$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper -z node1:2181`
    * Send 4 data points through and ensure:
      * `cat /var/log/squid/access.log /var/log/squid/access.log /var/log/squid/access.log /var/log/squid/access.log | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic squid`
      * `curl "http://localhost:9200/squid*/_search?pretty=true&q=*:*" 2> /dev/null| grep "full_hostname" | wc -l` should yield  `0` 
      * `hadoop fs -cat /apps/metron/indexing/indexed/squid/enrichment-null* | wc -l` should yield `4`
    * Send a final data point through and ensure:
      * `cat /var/log/squid/access.log | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic squid`
      * `curl "http://localhost:9200/squid*/_search?pretty=true&q=*:*" 2> /dev/null| grep "full_hostname" | wc -l` should yield  `5` 
      * `hadoop fs -cat /apps/metron/indexing/indexed/squid/enrichment-null* | wc -l` should yield `5`
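
The expected counts in this test case follow from each writer flushing only when its own batch fills up. A minimal sketch of that arithmetic, assuming no time-based flush is involved (the function name `flushed_count` is hypothetical and this is a simplified model, not the writer implementation):

```python
def flushed_count(messages_sent, batch_size):
    """Messages written so far if the writer flushes only on full batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return (messages_sent // batch_size) * batch_size


# Batch sizes from the squid.json used in this test case.
batch_sizes = {"hdfs": 1, "elasticsearch": 5}
for sent in (4, 5):
    written = {w: flushed_count(sent, size) for w, size in batch_sizes.items()}
    print(sent, written)
# prints:
# 4 {'hdfs': 4, 'elasticsearch': 0}
# 5 {'hdfs': 5, 'elasticsearch': 5}
```

With 4 messages the elasticsearch batch of 5 is still incomplete, so nothing is indexed yet, while hdfs with a batch of 1 has written all 4.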
     
    ### Test Case 2: Turn off HDFS writer
    * Delete any squid index that currently exists (if any do) via `curl -XDELETE "http://localhost:9200/squid*"`
    * Edit the file at `$METRON_HOME/config/zookeeper/indexing/squid.json` to the following contents:
    ```
    {
      "hdfs" : {
        "index": "squid",
        "batchSize": 1,
        "enabled" : false 
      },
      "elasticsearch" : {
        "index": "squid",
        "batchSize": 1,
        "enabled" : true
      }
    }
    ```
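    * Push the configs via `$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper -z node1:2181`
    * Send 1 data point through and ensure: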
      * `cat /var/log/squid/access.log | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic squid`
      * `curl "http://localhost:9200/squid*/_search?pretty=true&q=*:*" 2> /dev/null| grep "full_hostname" | wc -l` should yield  `1`
      * `hadoop fs -cat /apps/metron/indexing/indexed/squid/enrichment-null* | wc -l` should yield `0`
    
    ### Test Case 3: Stellar Management Functions
    * Execute the following in the stellar shell:
    ```
    Stellar, Go!
    Please note that functions are loading lazily in the background and will be unavailable until loaded fully.
    {es.clustername=metron, es.ip=node1, es.port=9300, es.date.format=yyyy.MM.dd.HH}
    [Stellar]>>> # Grab the indexing config
    [Stellar]>>> squid_config := CONFIG_GET('INDEXING', 'squid', true)
    [Stellar]>>>
    [Stellar]>>> # Update the index and batch size
    [Stellar]>>> squid_config := INDEXING_SET_BATCH( INDEXING_SET_INDEX(squid_config, 'hdfs', 'squid'), 'hdfs', 2)
    [Stellar]>>> # Push the config to zookeeper
    [Stellar]>>> CONFIG_PUT('INDEXING', squid_config, 'squid')
    [Stellar]>>> # Grab the updated config from zookeeper
    [Stellar]>>> CONFIG_GET('INDEXING', 'squid')
    {
      "hdfs" : {
        "index" : "squid",
        "batchSize" : 2,
        "enabled" : false
      },
      "elasticsearch" : {
        "index" : "squid",
        "batchSize" : 1,
        "enabled" : true
      }
    }
    ```
    * Confirm that the dump command from `$METRON_HOME/bin/zk_load_configs.sh -m DUMP -z node1:2181` shows the squid indexing config with an "hdfs" batch size of `2` (the "elasticsearch" batch size remains `1`)
    * Now pull the configs locally via `$METRON_HOME/bin/zk_load_configs.sh -m PULL -z node1:2181 -o $METRON_HOME/config/zookeeper -f`
    * Check that the "hdfs" config at `$METRON_HOME/config/zookeeper/indexing/squid.json` is indeed:
    ```
    {
      "index" : "squid",
      "batchSize" : 2,
      "enabled" : false
    }
    ```
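
To make the expected "hdfs" section above easier to reason about, here is a plain-Python analogue of the two Stellar calls, starting from the Test Case 2 config. It is only an illustration of the intended effect on the config, not how the Stellar functions are implemented; `set_writer_property` is a hypothetical helper.

```python
import copy

# Config from Test Case 2, used as the starting point.
test_case_2_config = {
    "hdfs": {"index": "squid", "batchSize": 1, "enabled": False},
    "elasticsearch": {"index": "squid", "batchSize": 1, "enabled": True},
}


def set_writer_property(sensor_config, writer, key, value):
    """Return a copy of the config with one property of one writer replaced."""
    updated = copy.deepcopy(sensor_config)
    updated.setdefault(writer, {})[key] = value
    return updated


# Mirrors INDEXING_SET_BATCH(INDEXING_SET_INDEX(config, 'hdfs', 'squid'), 'hdfs', 2).
updated = set_writer_property(test_case_2_config, "hdfs", "index", "squid")
updated = set_writer_property(updated, "hdfs", "batchSize", 2)

print(updated["hdfs"])
# prints {'index': 'squid', 'batchSize': 2, 'enabled': False}
```

Only the targeted keys change, which is why `enabled` stays `false` for "hdfs" and the "elasticsearch" section is untouched.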


