Posted to issues@metron.apache.org by nickwallen <gi...@git.apache.org> on 2018/04/20 20:27:35 UTC

[GitHub] metron pull request #1000: METRON-1533 Create KAFKA_FIND Stellar Function

GitHub user nickwallen opened a pull request:

    https://github.com/apache/metron/pull/1000

    METRON-1533 Create KAFKA_FIND Stellar Function

    I created a `KAFKA_FIND` function that allows you to provide a filter expression so that only messages satisfying a condition are returned.   For example...
    
    - Find a message that has been enriched with geolocation data.
        ```
        KAFKA_FIND('indexing', m -> MAP_EXISTS('geo.city', m))
        ```
    
    - Find a Bro message.
        ```
        KAFKA_FIND('indexing', m -> MAP_GET('source.type', m) == 'bro')
        ```
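    
    For anyone who thinks in Java rather than Stellar, the two filters above behave roughly like predicates over the message's field map. This is only an illustrative sketch; `FilterSketch`, `find`, `HAS_GEO`, and `IS_BRO` are hypothetical names and are not the code in this PR.
    
    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.function.Predicate;

    // Hypothetical stand-in for the filtering step; not the implementation in this PR.
    class FilterSketch {

        // Rough equivalent of: m -> MAP_EXISTS('geo.city', m)
        static final Predicate<Map<String, Object>> HAS_GEO =
                m -> m.containsKey("geo.city");

        // Rough equivalent of: m -> MAP_GET('source.type', m) == 'bro'
        static final Predicate<Map<String, Object>> IS_BRO =
                m -> "bro".equals(m.get("source.type"));

        // Keep only the messages that satisfy the filter, preserving their order.
        static List<Map<String, Object>> find(List<Map<String, Object>> messages,
                                              Predicate<Map<String, Object>> filter) {
            List<Map<String, Object>> matches = new ArrayList<>();
            for (Map<String, Object> message : messages) {
                if (filter.test(message)) {
                    matches.add(message);
                }
            }
            return matches;
        }
    }
    ```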
    
    ## Use Case
    
    When creating enrichments, I often find that I want to validate that the enrichment I just created was successful on the live, incoming stream of telemetry. My workflow looks something like this.
    
    1. Create and test the enrichment.
        ```
        [Stellar]>>> ip_src_addr := "72.34.49.86"
        72.34.49.86
    
        [Stellar]>>> geo := GEO_GET(ip_src_addr)
        {country=US, dmaCode=803, city=Los Angeles, postalCode=90014, latitude=34.0438, location_point=34.0438,-118.2512, locID=5368361, longitude=-118.2512}
        ```
    
    2. That looks good to me. Now let's add that to my Bro telemetry.
        ```
        [Stellar]>>> conf := SHELL_EDIT(conf)
        {
          "enrichment" : {
            "fieldMap": {
              "stellar": {
                "config": [
                   "geo := GEO_GET(ip_src_addr)"
                ]
              }
            }
          },
          "threatIntel": {
          }
        }
    
        [Stellar]>>> CONFIG_PUT("ENRICHMENTS", conf, "bro")
        ```
    
    3.  It looks like that worked, but did that really work?
    
        At this point, I would run `KAFKA_GET` as many times as it takes to retrieve a Bro message. I would have to get lucky twice: first that the enrichment worked, and second that I happened to pull down a Bro message rather than one from a different sensor.
    
        I would rather have a function that pulls back only the messages I care about. In this case, I could retrieve only Bro messages.
        ```
        KAFKA_FIND('indexing', m -> MAP_GET('source.type', m) == 'bro')
        ```
    
        Or I could look for messages that contain geolocation data.
        ```
        KAFKA_FIND('indexing', m -> MAP_EXISTS('geo.city', m))
        ```
    
    ### Changes
    
    * Created the `KAFKA_FIND` function along with unit tests.
    
    * Defined the global property `bootstrap.servers` by default during the MPack install.  This allows all of the `KAFKA_*` functions to work out-of-the-box.  Previously, a user would have to manually define this value before using any of the `KAFKA_*` functions.
    
    ###  Pull Request Checklist
    - [ ] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
    - [ ] Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
    - [ ] Have you included steps to reproduce the behavior or problem that is being changed or addressed?
    - [ ] Have you included steps or a guide to how the change may be verified and tested manually?
    - [ ] Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:
    - [ ] Have you written or updated unit tests and or integration tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
    - [ ] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nickwallen/metron METRON-1533

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metron/pull/1000.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1000
    
----
commit 5cc3cc5d541e95a5fcb9a49eed6b291a48e6cb59
Author: Nick Allen <ni...@...>
Date:   2018-04-20T20:18:16Z

    METRON-1533 Create KAFKA_FIND Stellar Function

----


---

[GitHub] metron issue #1000: METRON-1533 Create KAFKA_FIND Stellar Function

Posted by nickwallen <gi...@git.apache.org>.
Github user nickwallen commented on the issue:

    https://github.com/apache/metron/pull/1000
  
    I made a bunch of enhancements based on the feedback I outlined above.  I am in the process of breaking that work out into multiple PRs so that it can be reviewed more easily.


---

[GitHub] metron issue #1000: METRON-1533 Create KAFKA_FIND Stellar Function

Posted by nickwallen <gi...@git.apache.org>.
Github user nickwallen commented on the issue:

    https://github.com/apache/metron/pull/1000
  
    Thanks for taking it for a test drive.  I think all your observations are explainable, but they all point out usability issues that I think I can improve on.
    
    #### 1. Offsets
    
    `KAFKA_FIND` 'sticks' on its consumer offset.  It operates more like `KAFKA_GET` than `KAFKA_TAIL`.  This is how I described it in the docs.
    
    > Finds messages that satisfy a given filter expression. Subsequent calls will continue retrieving messages sequentially from the original offset.
    
    When you first run `KAFKA_FIND`, its consumer offset will not be set, so it will pick up from the end of the topic.  When you run it again in the same session, it will continue filtering from those same offsets, rather than going to the end of the topic.
    
    The `kafka-console-consumer` tool always seeks to the end when it is run.  In your test, it's likely that `kafka-console-consumer` and `KAFKA_FIND` were at completely different offsets when you tried to compare the two.
    
    I had actually already been working on a version of this that always seeks to the end and so behaves more like `KAFKA_TAIL` and `kafka-console-consumer`.
    
    Per the use case I described in the PR, I think 'seek to end' makes more sense.  You make a change on a live stream and want to see the immediate results.  If `KAFKA_FIND` 'sticks' on an earlier offset, you're not going to see the most recent messages, which can be confusing for the user.
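    
    To make the difference between the two behaviors concrete, here is a toy Java model of a topic and a single consumer. It does not use the Kafka client at all; `OffsetSketch` and its methods are made-up names for illustration, and the `arrivingDuringCall` argument simply stands in for messages that land on the topic while the call is polling.
    
    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Toy model of one topic and one consumer; hypothetical, not the Kafka client API.
    class OffsetSketch {
        private final List<String> topic = new ArrayList<>();
        private int offset = -1; // -1 means this consumer has no offset yet

        // A message that arrives between calls.
        void publish(String message) {
            topic.add(message);
        }

        // 'Sticky' behavior: only the first call starts at the end of the topic;
        // later calls resume from wherever the previous call stopped.
        List<String> findSticky(List<String> arrivingDuringCall) {
            if (offset < 0) {
                offset = topic.size();
            }
            return poll(arrivingDuringCall);
        }

        // 'Seek to end' behavior: every call skips whatever is already on the topic.
        List<String> findFromEnd(List<String> arrivingDuringCall) {
            offset = topic.size();
            return poll(arrivingDuringCall);
        }

        // Read everything from the current offset to the end, then advance the offset.
        private List<String> poll(List<String> arrivingDuringCall) {
            topic.addAll(arrivingDuringCall);
            List<String> read = new ArrayList<>(topic.subList(offset, topic.size()));
            offset = topic.size();
            return read;
        }
    }
    ```
    
    In this model, if messages are published between two calls, the second 'sticky' call returns those older messages along with the new ones, while the second 'seek to end' call returns only what arrived during that call.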
    
    #### 2. Timeouts
    
    > How long will this command listen until it times out (or is it based on number of messages read)? ...  Is this configurable?
    
    The command will poll for up to 5 seconds, by default.  This can be adjusted by defining a global property `stellar.kafka.max.wait`.
    
    > Sometimes it returned an empty array immediately. 
    
    In this case, it probably pulled in messages from the topic, none of those messages matched your filter, and so returned an empty array to you.
    
    I probably need to look at the timeout logic under these conditions.  It should probably 'try harder' to find matching messages and not return immediately.  I'll take a look at this and see if it can be improved.
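    
    As a rough idea of what 'try harder' could mean, the sketch below keeps polling until either a match turns up or the maximum wait has elapsed, so a batch with no matches no longer ends the call early. This is an assumption about one possible approach, not the current or final implementation; `FindLoopSketch`, `find`, and the `poll` supplier are hypothetical, and the supplier stands in for a consumer poll that would normally block briefly while waiting for messages.
    
    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Predicate;
    import java.util.function.Supplier;

    // Hypothetical sketch of a 'poll until match or timeout' loop; not the code in this PR.
    class FindLoopSketch {

        // poll: returns the next batch of messages (possibly empty).
        // filter: the condition a message must satisfy.
        // maxWaitMillis: upper bound on how long to keep looking.
        static <T> List<T> find(Supplier<List<T>> poll, Predicate<T> filter, long maxWaitMillis) {
            long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
            List<T> matches = new ArrayList<>();
            do {
                for (T message : poll.get()) {
                    if (filter.test(message)) {
                        matches.add(message);
                    }
                }
                // Stop as soon as one batch produced a match; otherwise retry until the deadline.
            } while (matches.isEmpty() && System.nanoTime() < deadline);
            return matches;
        }
    }
    ```
    
    With a loop like this, an empty result would mean that nothing matched within the wait period, rather than that the first batch happened to contain no matches.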
    

---

[GitHub] metron issue #1000: METRON-1533 Create KAFKA_FIND Stellar Function

Posted by merrimanr <gi...@git.apache.org>.
Github user merrimanr commented on the issue:

    https://github.com/apache/metron/pull/1000
  
    I should add that I have both bro and snort parser topologies running.


---

[GitHub] metron issue #1000: METRON-1533 Create KAFKA_FIND Stellar Function

Posted by merrimanr <gi...@git.apache.org>.
Github user merrimanr commented on the issue:

    https://github.com/apache/metron/pull/1000
  
    I tested this in full dev and the results were somewhat inconsistent.  I listened on the enrichments topic with the kafka-console-consumer tool in one window:
    ```
    /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh -z node1:2181 --topic enrichments
    ```
    While repeatedly running this command in another:
    ```
    KAFKA_FIND('enrichments', m -> MAP_GET('source.type', m) == 'snort')
    ```
    About 25-50% of the time the Stellar shell returned `[]` and the other times it would return a snort message as expected.
    
    How long will this command listen until it times out (or is it based on number of messages read)?  Sometimes it returned an empty array immediately.  Is this configurable?  


---

[GitHub] metron pull request #1000: METRON-1533 Create KAFKA_FIND Stellar Function

Posted by nickwallen <gi...@git.apache.org>.
Github user nickwallen closed the pull request at:

    https://github.com/apache/metron/pull/1000


---