Posted to issues@metron.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/08/18 19:45:01 UTC

[jira] [Commented] (METRON-1120) Profile's 'groupBy' Expression Has No Reference to Time

    [ https://issues.apache.org/jira/browse/METRON-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16133543#comment-16133543 ] 

ASF GitHub Bot commented on METRON-1120:
----------------------------------------

GitHub user nickwallen opened a pull request:

    https://github.com/apache/metron/pull/708

    Metron 1120

    [METRON-1120](https://issues.apache.org/jira/browse/METRON-1120)
    
    - [ ] This is built on top of METRON-1121, so it should not be committed before METRON-1121.
    
    The `groupBy` expression can now reference any of these variables.
    * `profile` The name of the profile.
    * `entity` The name of the entity being profiled.
    * `start` The start time of the profile period in epoch milliseconds.
    * `end` The end time of the profile period in epoch milliseconds.
    * `duration` The duration of the profile period in milliseconds.
    * `result` The result of executing the `result` expression.
    
    Unit tests have been added to validate this functionality. The README has also been updated to describe the fields available to the `groupBy` expression.
    
    This can also be tested manually, either in a live Profiler or with the Profiler debugging functions. The following shows how this change can be used to implement the problematic profile described in the JIRA issue.
    
    Create a profile that references the start of the profile period in the `groupBy` expression.
    ```
    [Stellar]>>> conf := SHELL_EDIT()
    [Stellar]>>> conf
    {
      "profiles": [
        {
          "profile": "calender-effects",
          "onlyif":  "exists(ip_src_addr) and exists(timestamp)",
          "foreach": "ip_src_addr",
          "init":    { "count": 0 },
          "update":  { "count": "count + 1" },
          "result":  "count",
          "groupBy": ["DAY_OF_WEEK(start)"]
        }
      ]
    }
    ```
    
    Create a message to exercise the profiler.
    ```
    [Stellar]>>> msg := SHELL_EDIT()
    [Stellar]>>> msg
    {
    	"ip_src_addr":"10.0.0.1",
    	"timestamp":"2017-08-18 09:00:00"
    }
    ```
    
    Create a Profiler and apply the messages to it.
    ```
    [Stellar]>>> p := PROFILER_INIT(conf)
    [Stellar]>>> PROFILER_APPLY(msg, p)
    org.apache.metron.profiler.StandAloneProfiler@4572b5b4
    [Stellar]>>> PROFILER_APPLY(msg, p)
    org.apache.metron.profiler.StandAloneProfiler@4572b5b4
    [Stellar]>>> PROFILER_APPLY(msg, p)
    org.apache.metron.profiler.StandAloneProfiler@4572b5b4
    ```
    
    Flush the profile and validate the result of executing the `groupBy` expression. The `groups` value is 6, which indicates Friday and is correct in this case.
    ```
    [Stellar]>>> PROFILER_FLUSH(p)
    [{period={duration=900000, period=1670094, start=1503084600000, end=1503085500000}, profile=calender-effects, groups=[6], value=3, entity=10.0.0.1}]
    ```
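
The numbers in the flush output above can be cross-checked by hand. The following Python sketch is only an illustration, not Metron code: `period_for` and `day_of_week` are hypothetical helpers, the rule that the period id equals the timestamp divided by the duration is inferred from the values in the output, and the Sunday=1 numbering and the use of UTC are assumptions chosen to agree with the "6 indicates Friday" statement.

```python
from datetime import datetime, timezone

# 15 minutes in milliseconds; matches duration=900000 in the flush output.
PERIOD_MS = 15 * 60 * 1000


def period_for(timestamp_ms, duration_ms=PERIOD_MS):
    """Hypothetical helper: the fixed-width period that contains timestamp_ms."""
    period_id = timestamp_ms // duration_ms
    start = period_id * duration_ms
    return {
        "period": period_id,
        "start": start,
        "end": start + duration_ms,
        "duration": duration_ms,
    }


def day_of_week(epoch_ms):
    """Day of week in UTC, numbered Sunday=1 .. Saturday=7 (assumed numbering)."""
    iso = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).isoweekday()
    return iso % 7 + 1  # ISO has Monday=1 .. Sunday=7; shift so that Sunday=1


# Values taken from the flush output above.
period = period_for(1503084600000)
group = day_of_week(period["start"])
print(period)
print(group)
```

Under these assumptions the computed period id, start, and end agree with the flush output, and the day-of-week value for `start` is 6. A deployment whose JVM runs in a different time zone could see a different day near midnight.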


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nickwallen/metron METRON-1120

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metron/pull/708.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #708
    
----
commit 5579748ad4336a7c1a15c319d59fd6cbdeb6531e
Author: Nick Allen <ni...@nickallen.org>
Date:   2017-08-18T17:37:01Z

    METRON-1121 Ignore Profile with Bad 'init', 'update' or 'groupBy'

commit 893b7db84f155ea6af975ee51338f39b763eaedb
Author: Nick Allen <ni...@nickallen.org>
Date:   2017-08-18T17:45:50Z

    Rm errant comment

commit da365c8b546678bbe07011e10ab3cd222faa8297
Author: Nick Allen <ni...@nickallen.org>
Date:   2017-08-18T19:01:26Z

    METRON-1120 Profile's 'groupBy' Expression Has No Reference to Time

commit 5d8a7a06096d5aa725a0ce3b47fef36a8e14ac72
Author: Nick Allen <ni...@nickallen.org>
Date:   2017-08-18T19:04:35Z

    Rm artifacts that should not be in Git

commit c52dce2be6146127eed9af0d2b311ff65f0de551
Author: Nick Allen <ni...@nickallen.org>
Date:   2017-08-18T19:32:07Z

    Updated README

commit 54f1c5969268032e0841f0dd4b5e76449b8b3b6f
Author: Nick Allen <ni...@nickallen.org>
Date:   2017-08-18T19:35:31Z

    Fix README

----


> Profile's 'groupBy' Expression Has No Reference to Time
> -------------------------------------------------------
>
>                 Key: METRON-1120
>                 URL: https://issues.apache.org/jira/browse/METRON-1120
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Nick Allen
>            Assignee: Nick Allen
>
> It is often the case that patterns and behaviors will differ based on calendar effects like day of week. For example, activity on a weekday can be very different from a weekend. The Profiler's "Group By" functionality is one way to account for calendar effects.
> This profile definition operates over any incoming telemetry that has an `ip_src_addr` and a `timestamp` field. It produces a profile that segments the data by day of week. It does so by using a `groupBy` expression to extract the day of week from the telemetry's `timestamp` field.
> {code}
> {
>   "profiles": [
>     {
>       "profile": "calender-effects",
>       "onlyif":  "exists(ip_src_addr) and exists(timestamp)",
>       "foreach": "ip_src_addr",
>       "init":    { "count": 0 },
>       "update":  { "count": "count + 1" },
>       "result":  "count",
>       "groupBy": ["DAY_OF_WEEK(TO_EPOCH_TIMESTAMP(timestamp, 'yyyy-MM-dd HH:mm:ss', 'GMT'))"]
>     }
>   ]
> }
> {code}
> When retrieving profile data using the Profiler Client API, I only want to retrieve data from the same day of week to account for any calendar effects. The following example retrieves profile data only for Thursdays over the past 60 days.
> {code}
> >>> thursday := 5
> >>> PROFILE_GET("calender-effects", "10.0.0.1", PROFILE_FIXED(60, "DAYS"), [thursday])
> {code}
> h3. The Problem
> The `groupBy` expression only has access to the Profile's `result` value.  It does not have any way to reference the current tick time in the Profiler.  Here is an example showing the problem.
> Define the profile and a message.
> {code}
> [Stellar]>>> conf
> {
>   "profiles": [
>     {
>       "profile": "calender-effects",
>       "onlyif":  "exists(ip_src_addr) and exists(timestamp)",
>       "foreach": "ip_src_addr",
>       "init":    { "count": "0" },
>       "update":  { "count": "count + 1" },
>       "result":  "count",
>       "groupBy": ["DAY_OF_WEEK(TO_EPOCH_TIMESTAMP(timestamp, 'yyyy-MM-dd HH:mm:ss', 'GMT'))"]
>     }
>   ]
> }
> [Stellar]>>> msg
> {
>      "ip_src_addr": "10.0.0.1",
>      "protocol": "HTTPS",
>      "length": "10",
>      "bytes_in": 234,
>      "timestamp": "2017-08-17 09:00:00"
> }
> {code}
> Initialize the Profiler and apply the message a few times.
> {code}
> [Stellar]>>> p := PROFILER_INIT(conf)
> [Stellar]>>> PROFILER_APPLY(msg, p)
> org.apache.metron.profiler.StandAloneProfiler@9472c85
> [Stellar]>>> PROFILER_APPLY(msg, p)
> org.apache.metron.profiler.StandAloneProfiler@9472c85
> [Stellar]>>> PROFILER_APPLY(msg, p)
> org.apache.metron.profiler.StandAloneProfiler@9472c85
> {code}
> Flush the profile, which will trigger execution of the `groupBy` expression.
> {code}
> [Stellar]>>> PROFILER_FLUSH(p)
> [!] Bad 'groupBy' expression: Unexpected type: expected=Object, actual=null, expression=DAY_OF_WEEK(TO_EPOCH_TIMESTAMP(timestamp, 'yyyy-MM-dd HH:mm:ss', 'GMT')), profile=calender-effects, entity=10.0.0.1
> org.apache.metron.stellar.dsl.ParseException: Bad 'groupBy' expression: Unexpected type: expected=Object, actual=null, expression=DAY_OF_WEEK(TO_EPOCH_TIMESTAMP(timestamp, 'yyyy-MM-dd HH:mm:ss', 'GMT')), profile=calender-effects, entity=10.0.0.1
> 	at org.apache.metron.profiler.DefaultProfileBuilder.execute(DefaultProfileBuilder.java:257)
> 	at org.apache.metron.profiler.DefaultProfileBuilder.flush(DefaultProfileBuilder.java:159)
> 	at org.apache.metron.profiler.DefaultMessageDistributor.lambda$flush$0(DefaultMessageDistributor.java:101)
> 	at java.util.concurrent.ConcurrentMap.forEach(ConcurrentMap.java:114)
> 	at org.apache.metron.profiler.DefaultMessageDistributor.flush(DefaultMessageDistributor.java:99)
> 	at org.apache.metron.profiler.StandAloneProfiler.flush(StandAloneProfiler.java:82)
> 	at org.apache.metron.profiler.client.stellar.ProfilerFunctions$ProfilerFlush.apply(ProfilerFunctions.java:191)
> 	at org.apache.metron.stellar.common.StellarCompiler.lambda$exitTransformationFunc$13(StellarCompiler.java:556)
> 	at org.apache.metron.stellar.common.StellarCompiler$Expression.apply(StellarCompiler.java:160)
> 	at org.apache.metron.stellar.common.BaseStellarProcessor.parse(BaseStellarProcessor.java:152)
> 	at org.apache.metron.stellar.common.shell.StellarExecutor.execute(StellarExecutor.java:287)
> 	at org.apache.metron.stellar.common.shell.StellarShell.handleStellar(StellarShell.java:270)
> 	at org.apache.metron.stellar.common.shell.StellarShell.execute(StellarShell.java:409)
> 	at org.jboss.aesh.console.AeshProcess.run(AeshProcess.java:53)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Unexpected type: expected=Object, actual=null, expression=DAY_OF_WEEK(TO_EPOCH_TIMESTAMP(timestamp, 'yyyy-MM-dd HH:mm:ss', 'GMT'))
> 	at org.apache.metron.stellar.common.DefaultStellarStatefulExecutor.execute(DefaultStellarStatefulExecutor.java:128)
> 	at org.apache.metron.profiler.DefaultProfileBuilder.lambda$execute$3(DefaultProfileBuilder.java:253)
> 	at java.util.ArrayList.forEach(ArrayList.java:1249)
> 	at org.apache.metron.profiler.DefaultProfileBuilder.execute(DefaultProfileBuilder.java:253)
> 	... 16 more
> {code}
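
The failure in the quoted issue can be pictured with a toy model. This is a sketch under stated assumptions, not Metron's implementation: `to_epoch_ms`, `day_of_week`, and `evaluate_group_by` are hypothetical stand-ins for the Stellar functions, and the model assumes only what the issue states, namely that at flush time the expression sees `result` but no message fields, so `timestamp` resolves to null.

```python
from datetime import datetime, timezone


def to_epoch_ms(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Stand-in for TO_EPOCH_TIMESTAMP with a GMT zone; None passes through."""
    if text is None:
        return None
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


def day_of_week(epoch_ms):
    """Stand-in for DAY_OF_WEEK, Sunday=1 .. Saturday=7; None passes through."""
    if epoch_ms is None:
        return None
    iso = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).isoweekday()
    return iso % 7 + 1


def evaluate_group_by(scope):
    """Evaluate the issue's groupBy expression against the variables in scope."""
    value = day_of_week(to_epoch_ms(scope.get("timestamp")))
    if value is None:
        # Modeled on the error text in the stack trace above.
        raise ValueError("Unexpected type: expected=Object, actual=null")
    return value


# While a message is being applied, its fields are in scope.
message_scope = {"ip_src_addr": "10.0.0.1", "timestamp": "2017-08-17 09:00:00"}
# At flush time (before this change) only the result is in scope.
flush_scope = {"result": 3}

print(evaluate_group_by(message_scope))
try:
    evaluate_group_by(flush_scope)
except ValueError as err:
    print("flush failed:", err)
```

With the message in scope the expression yields 5 (Thursday), matching the `thursday := 5` example in the issue. With only `result` in scope it fails, which is why exposing `start`, `end`, and `duration` to `groupBy` gives the expression a usable time reference.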



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)