Posted to dev@metron.apache.org by nickwallen <gi...@git.apache.org> on 2018/03/15 20:22:37 UTC

[GitHub] metron pull request #965: METRON-590 Enable Use of Event Time in Profiler

GitHub user nickwallen opened a pull request:

    https://github.com/apache/metron/pull/965

    METRON-590 Enable Use of Event Time in Profiler

    This enables the use of event time processing in the Profiler.
    
    By default, the Profiler still uses processing time.  If the Profiler is configured with a `timestampField`, it instead extracts the timestamp from that field in each incoming telemetry message.
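    
    As a rough illustration of the behavior described above, the choice between the two clocks can be sketched in shell.  This is not the Profiler's implementation; the function name `pick_timestamp`, the sed-based field extraction, and the use of epoch milliseconds for processing time are all assumptions made for this example.
    
    ```shell
    # Hypothetical sketch only; pick_timestamp is an invented name, not Metron code.
    #
    #   $1 - a single-line JSON message
    #   $2 - name of the timestamp field; empty means "use processing time"
    pick_timestamp() {
      if [ -z "$2" ]; then
        # processing time: current wall-clock time as epoch milliseconds
        echo "$(date +%s)000"
      else
        # event time: print the integer value that follows "<field>": in the message
        echo "$1" | sed -n "s/.*\"$2\":[[:space:]]*\([0-9][0-9]*\).*/\1/p"
      fi
    }
    
    # event time, taken from the message itself
    pick_timestamp '{"timestamp":1520000000000,"source.type":"bro"}' timestamp
    
    # processing time, because no timestamp field was named
    pick_timestamp '{"timestamp":1520000000000,"source.type":"bro"}' ""
    ```
    
    With a field name the function echoes the value carried in the message; with an empty field name it falls back to the system clock, mirroring the default described above.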
    
    ## Manual Testing
    
    
    
    1. Launch a development environment.  Shut down Indexing, Elasticsearch, Kibana, YARN, and MapReduce2 to avoid any resource issues.
    
    1. Using Ambari, change the following settings and restart the Profiler.
    
        Set the "Period Duration" to 1 minute.
        Set the "Window Duration" to 15 seconds.
        Set the "Window Lag" to 30 seconds.
    
    1. Replace `/opt/sensor-stubs/bin/start-bro-stub` with the following.
    
        Instead of adding the current time into each Bro message, this will add a timestamp from 1 day ago.
        ```
        #
        # how long to delay between each 'batch' in seconds.
        #
        DELAY=${1:-2}
    
        #
        # how many messages to send in each 'batch'.  the messages are drawn randomly
        # from the entire set of canned data.
        #
        COUNT=${2:-10}
    
        INPUT="/opt/sensor-stubs/data/bro.out"
        PRODUCER="/usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh"
        TOPIC="bro"
    
        while true; do
    
          # transform the bro timestamp and push to kafka
          SEARCH='"ts":[0-9]\+\.'
          REPLACE="\"ts\":$(date -d '1 day ago' +'%s')."
          shuf -n "$COUNT" "$INPUT" | sed -e "s/$SEARCH/$REPLACE/g" | "$PRODUCER" --broker-list node1:6667 --topic "$TOPIC"
    
          sleep $DELAY
        done
        ```
    
    1. Restart the Bro Sensor Stub.
    
        ```
        service sensor-stubs stop
        service sensor-stubs start bro
        ```
    
    1. Open up the REPL and configure the Profiler like so.
    
        Notice that we are setting the 'timestampField' within the Profiler configuration.  This will tell the Profiler to extract the timestamp from this field rather than using system time.
        ```
        [Stellar]>>> conf := SHELL_EDIT(conf)
        {
          "profiles": [
            {
              "profile": "hello-world",
              "onlyif": "source.type == 'bro'",
              "foreach": "'global'",
              "init":    { "count": "0" },
              "update":  { "count": "count + 1" },
              "result":  "count"
            }
          ],
          "timestampField": "timestamp"
        }
        [Stellar]>>>
        [Stellar]>>>
        [Stellar]>>> CONFIG_PUT("PROFILER",conf)
        ```
    
    1. Query the Profiler data store.  It can take a minute or so before a value is written.
    
        ```
        [Stellar]>>> PROFILE_GET("hello-world", "global", PROFILE_FIXED(2, "DAYS"))
        []
        [Stellar]>>> PROFILE_GET("hello-world", "global", PROFILE_FIXED(2, "DAYS"))
        [200]
        ```
    
    1. Now query back only a couple of hours instead.  You should get no results, which indicates that the Profiler used the day-old timestamps carried in the Bro data rather than system time.
    
        ```
        [Stellar]>>> PROFILE_GET("hello-world", "global", PROFILE_FIXED(2, "HOURS"))
        []
        ```
    
    1. Now change the Profiler configuration to remove the "timestampField" setting.  This will switch the Profiler back to using system time, also known as processing time.
    
        ```
        [Stellar]>>> conf := SHELL_EDIT(conf)
        {
          "profiles": [
            {
              "profile": "hello-world",
              "onlyif": "source.type == 'bro'",
              "foreach": "'global'",
              "init":    { "count": "0" },
              "update":  { "count": "count + 1" },
              "result":  "count"
            }
          ]
        }
        [Stellar]>>>
        [Stellar]>>> CONFIG_PUT("PROFILER",conf)
        ```
    
    1. The Profiler will pick up the change after the next flush event.  Query for profile data in the past few minutes.  This shows that the Profiler has switched back to using system time, also known as processing time.
    
        ```
        [Stellar]>>> PROFILE_GET("hello-world", "global", PROFILE_FIXED(2, "MINUTES"))
        [180, 190]
        ```
    
    1. In Storm you can also set logging to DEBUG for "org.apache.metron.profiler".  This will output detailed worker logs that allow you to verify that the Profiler is using the correct timestamps.
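    
    The reasoning behind the two lookback queries in the steps above can be checked with a little arithmetic.  The sketch below is only an illustration: `in_lookback` is a hypothetical helper, not a Stellar or Metron function, and it assumes a lookback window simply spans from "now minus the lookback" up to "now".
    
    ```shell
    # Hypothetical helper: is an event timestamp inside a lookback window that
    # ends at "now"?  All values are epoch seconds.
    #   $1 - event time, $2 - now, $3 - lookback length in seconds
    in_lookback() {
      [ "$1" -ge $(( $2 - $3 )) ] && [ "$1" -le "$2" ]
    }
    
    now=$(date +%s)
    event=$(( now - 86400 ))   # 24 hours back, approximating the stub's day-old timestamp
    
    # a two day lookback covers the day-old event
    in_lookback "$event" "$now" $(( 2 * 86400 )) && echo "2 DAYS: visible"
    
    # a two hour lookback does not
    in_lookback "$event" "$now" $(( 2 * 3600 )) || echo "2 HOURS: not visible"
    ```
    
    An event stamped one day ago falls inside a two day lookback but outside a two hour one, which is why the first query returns a value and the second returns an empty list when event time is in use.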
    
    
    
    ## Pull Request Checklist
    
    - [ ] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
    - [ ] Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
    - [ ] Have you included steps to reproduce the behavior or problem that is being changed or addressed?
    - [ ] Have you included steps or a guide to how the change may be verified and tested manually?
    - [ ] Have you ensured that the full suite of tests and checks have been executed in the root metron 
    - [ ] Have you written or updated unit tests and/or integration tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
    - [ ] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nickwallen/metron METRON-590-2018

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metron/pull/965.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #965
    
----
commit 64343bc0d99880ac8bb17137a9226c3f44417da7
Author: Nick Allen <ni...@...>
Date:   2018-02-13T14:52:54Z

    METRON-590 Enable Use of Event Time in Profiler

----


---

[GitHub] metron pull request #965: METRON-590 Enable Use of Event Time in Profiler

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/metron/pull/965


---

[GitHub] metron pull request #965: METRON-590 Enable Use of Event Time in Profiler

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on a diff in the pull request:

    https://github.com/apache/metron/pull/965#discussion_r175553846
  
    --- Diff: metron-analytics/metron-profiler/README.md ---
    @@ -328,6 +328,62 @@ Continuing the previous running example, at this point, you have seen how your p
     
     ## Anatomy of a Profile
     
    +### Profiler
    +
    +The Profiler configuration contains only two fields; only one of which is required.
    +
    +```
    +{
    +    "profiles": [
    +        { "profile": "one", ... },
    +        { "profile": "two", ... }
    +    ],
    +    "timestampField": "timestamp"
    +}
    +```
    +
    +| Name                              |               | Description
    +|---                                |---            |---
    +| [profiles](#profiles)             | Required      | A list of zero or more Profile definitions.
    +| [timestampField](#timestampfield) | Optional      | Indicates whether processing time or event time should be used.
    --- End diff --
    
    Can we indicate the default here in the description (for quicker reference)?


---

[GitHub] metron pull request #965: METRON-590 Enable Use of Event Time in Profiler

Posted by nickwallen <gi...@git.apache.org>.
Github user nickwallen commented on a diff in the pull request:

    https://github.com/apache/metron/pull/965#discussion_r175586013
  
    --- Diff: metron-analytics/metron-profiler/README.md ---
    @@ -328,6 +328,62 @@ Continuing the previous running example, at this point, you have seen how your p
     
     ## Anatomy of a Profile
     
    +### Profiler
    +
    +The Profiler configuration contains only two fields; only one of which is required.
    +
    +```
    +{
    +    "profiles": [
    +        { "profile": "one", ... },
    +        { "profile": "two", ... }
    +    ],
    +    "timestampField": "timestamp"
    +}
    +```
    +
    +| Name                              |               | Description
    +|---                                |---            |---
    +| [profiles](#profiles)             | Required      | A list of zero or more Profile definitions.
    +| [timestampField](#timestampfield) | Optional      | Indicates whether processing time or event time should be used.
    --- End diff --
    
    Sure. Update made.


---

[GitHub] metron issue #965: METRON-590 Enable Use of Event Time in Profiler

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/metron/pull/965
  
    +1, thanks!


---