Posted to issues@metron.apache.org by nickwallen <gi...@git.apache.org> on 2018/12/06 19:10:26 UTC

[GitHub] metron pull request #1292: METRON-1925 Provide Verbose View of Profile Resul...

GitHub user nickwallen opened a pull request:

    https://github.com/apache/metron/pull/1292

    METRON-1925 Provide Verbose View of Profile Results in REPL

    ## Motivation
    
    When viewing profile measurements in the REPL using PROFILE_GET, you simply get back a list of values. It is not easy to determine from which time period the measurements were taken.
    
    For example, are the following values all sequential?  Are there any gaps in the measurements taken over the past 30 minutes?  When was the first measurement taken?
    ```
    [Stellar]>>> PROFILE_GET("hello-world","global",PROFILE_FIXED(30, "MINUTES"))
    [2655, 1170, 1185, 1170, 1185, 1215, 1200, 1170]
    ```
    
    The `PROFILE_GET` function was designed to return values that serve as input to other functions.  It was not designed to return values in a human-readable form that can be easily understood. We need another way to query for profile measurements in the REPL that provides a user with a better understanding of the profile measurements.
    
    ## Solution
    
    This PR provides a new function called `PROFILE_VIEW`.  It is effectively a "verbose mode" for `PROFILE_GET`.  
    
    For lack of a better name, I just called it `PROFILE_VIEW`.  I would be open to alternatives.  I did not want to add additional options to the already complex `PROFILE_GET`.
    
    * Description: Retrieves a series of measurements from a stored profile. Provides a more verbose view of each measurement than PROFILE_GET. Returns a map containing the profile name, entity, period id, period start, and period end for each profile measurement.
    
    * Arguments:
        profile - The name of the profile.
        entity - The name of the entity.
        periods - The list of profile periods to fetch. Use PROFILE_WINDOW or PROFILE_FIXED.
        groups - Optional. The groups to retrieve. Must correspond to the 'groupBy' list used during profile creation. Defaults to an empty list, meaning no groups.
    
    * Returns: A list of maps, one per profile measurement, each containing the profile name, entity, period, period start, period end, groups, and value.
    
    ## Test Drive
    
    1. Spin-up Full Dev and create a profile.  Follow the Profiler README.  Reduce the profile period if you are impatient.
    
    1. Open up the REPL and retrieve the values using `PROFILE_GET`.  Notice that I have no idea when the first measurement was taken, whether the values are sequential, or whether there are gaps in the values and how big they are.
        ```
        [Stellar]>>> PROFILE_GET("hello-world","global",PROFILE_FIXED(30, "MINUTES"))
        [1185, 1170, 1185, 1215, 1200, 1170, 5425, 1155, 1215, 1200]
        ```
    
    1. Now use `PROFILE_VIEW` to retrieve the same results.
        ```
        [Stellar]>>> results := PROFILE_VIEW("hello-world","global",PROFILE_FIXED(30, "MINUTES"))
        [{period.start=1544119560000, period=12867663, profile=hello-world, period.end=1544119680000, groups=[], value=1185, entity=global}, {period.start=1544119680000, period=12867664, profile=hello-world, period.end=1544119800000, groups=[], value=1170, entity=global}, {period.start=1544119800000, period=12867665, profile=hello-world, period.end=1544119920000, groups=[], value=1185, entity=global}, {period.start=1544119920000, period=12867666, profile=hello-world, period.end=1544120040000, groups=[], value=1215, entity=global}, {period.start=1544120040000, period=12867667, profile=hello-world, period.end=1544120160000, groups=[], value=1200, entity=global}, {period.start=1544120160000, period=12867668, profile=hello-world, period.end=1544120280000, groups=[], value=1170, entity=global}, {period.start=1544120880000, period=12867674, profile=hello-world, period.end=1544121000000, groups=[], value=5425, entity=global}, {period.start=1544121000000, period=12867675, profile=hello-world, period.end=1544121120000, groups=[], value=1155, entity=global}, {period.start=1544121120000, period=12867676, profile=hello-world, period.end=1544121240000, groups=[], value=1215, entity=global}, {period.start=1544121240000, period=12867677, profile=hello-world, period.end=1544121360000, groups=[], value=1200, entity=global}]
        ```
    
    1. For each measurement, I have a map containing the period, period start, period end, profile name, entity, groups, and value.  With this I can better answer some of the questions above.
        ```
        { 
          profile=hello-world, 
          entity=global,
          period=12867663,  
          period.start=1544119560000, 
          period.end=1544119680000, 
          groups=[], 
          value=1185
        }
        ```
    
    1. When was the first measurement taken?
        ```
        [Stellar]>>> GET(results, 0)
        {period.start=1544119560000, period=12867663, profile=hello-world, period.end=1544119680000, groups=[], value=1185, entity=global}
        ```
        I can see that the first period started at 1544119560000 or Thu Dec 06 2018 18:06:00 UTC.
    
    1. Are these measurements sequential?  Are there any gaps?
    
        I can use `MAP` on the list of results to extract just the periods from each profile measurement. The period is a monotonically increasing value. 
        ```
        [Stellar]>>> MAP(results, m -> MAP_GET("period", m))
        [12867663, 12867664, 12867665, 12867666, 12867667, 12867668, 12867674, 12867675, 12867676, 12867677]
        ```
        From this I can tell that there is a gap in the measurements.  
        * The first period here is 12867663 and the last is 12867677. 
        * I can see that the first 6 measurements are sequential up to period 12867668.  
        * Then there is a gap of 5 periods (12867669 - 12867673) before the measurements resume at period 12867674.
    
    1. How big was that gap?
    
        We know that we missed 5 periods, but how big is that gap?
    
        Here is when each period starts in epoch milliseconds.
        ```
        [Stellar]>>> MAP(results, m -> MAP_GET("period.start", m))
        [1544119560000, 1544119680000, 1544119800000, 1544119920000, 1544120040000, 1544120160000, 1544120880000, 1544121000000, 1544121120000, 1544121240000]
        ```
    
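        I can find the difference between the start times of the measurements at index 5 and index 6, which sit on either side of the gap.  The difference is 720000 milliseconds, which is 12 minutes.  With 2-minute periods, that is six periods: the period at index 5 itself plus the 5 missing ones.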
        ```
        [Stellar]>>> MAP_GET("period.start", GET(results, 5))
        1544120160000
    
        [Stellar]>>> MAP_GET("period.start", GET(results, 6))
        1544120880000
    
        [Stellar]>>> MAP_GET("period.start", GET(results, 6)) - MAP_GET("period.start", GET(results, 5))
        720000
        ```
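
The same checks can also be reproduced outside the REPL. The following is a minimal Python sketch, not part of this PR: it copies the `period` and `period.start` values from the output above, and the helper names `to_utc` and `find_gaps` are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timezone

# Period ids and period start times (epoch milliseconds), copied from the
# PROFILE_VIEW output shown in the test drive.
periods = [12867663, 12867664, 12867665, 12867666, 12867667,
           12867668, 12867674, 12867675, 12867676, 12867677]
starts = [1544119560000, 1544119680000, 1544119800000, 1544119920000,
          1544120040000, 1544120160000, 1544120880000, 1544121000000,
          1544121120000, 1544121240000]


def to_utc(epoch_millis):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)


def find_gaps(period_ids):
    """Return a (first_missing, last_missing) pair for each run of missing ids."""
    gaps = []
    for prev, curr in zip(period_ids, period_ids[1:]):
        if curr - prev > 1:
            gaps.append((prev + 1, curr - 1))
    return gaps


# When was the first measurement taken?
first_start = to_utc(starts[0])

# Are there any gaps in the period ids?
gaps = find_gaps(periods)

# How far apart are the period starts on either side of the gap?
delta_ms = starts[6] - starts[5]

print(first_start.isoformat())
print(gaps)
print(delta_ms / 60000)
```

Running it prints the first period start as 2018-12-06T18:06:00+00:00, a single gap covering periods 12867669 through 12867673, and 12.0 minutes between the two period starts on either side of the gap.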
    
    ## Changes
    
    * Altered the ProfilerClient so that it returns the `ProfileMeasurement` values instead of just a list of the raw values stored in HBase. 
    
    * Removed an unused method from the ProfilerClient to simplify things.
    
    * Updated `PROFILE_GET` to work with the altered ProfilerClient.
    
    * Added `PROFILE_VIEW`.
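
A rough way to picture this change, as a Python sketch rather than the actual Java code: once the client hands back whole measurements, `PROFILE_GET` only has to pull out each value, while `PROFILE_VIEW` exposes every field. The `Measurement` class and the `profile_get` and `profile_view` functions are hypothetical stand-ins, and the field names are assumed from the map keys in the REPL output above.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Measurement:
    """Hypothetical stand-in for a ProfileMeasurement returned by the client."""
    profile: str
    entity: str
    period: int
    period_start: int
    period_end: int
    value: Any
    groups: List[Any] = field(default_factory=list)


def profile_get(measurements):
    """Terse form: only the raw values, suitable as input to other functions."""
    return [m.value for m in measurements]


def profile_view(measurements):
    """Verbose form: one map per measurement with all of its fields."""
    return [
        {
            "profile": m.profile,
            "entity": m.entity,
            "period": m.period,
            "period.start": m.period_start,
            "period.end": m.period_end,
            "groups": m.groups,
            "value": m.value,
        }
        for m in measurements
    ]


# Two measurements taken from the test drive output, as if fetched by the client.
fetched = [
    Measurement("hello-world", "global", 12867663, 1544119560000, 1544119680000, 1185),
    Measurement("hello-world", "global", 12867664, 1544119680000, 1544119800000, 1170),
]

print(profile_get(fetched))
print(profile_view(fetched)[0])
```

Both functions work from the same fetched measurements, so the values inside the verbose maps always line up with what the terse form returns.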
    
    ## Pull Request Checklist
    
    - [ ] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
    - [ ] Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
    - [ ] Have you included steps to reproduce the behavior or problem that is being changed or addressed?
    - [ ] Have you included steps or a guide to how the change may be verified and tested manually?
    - [ ] Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:
    - [ ] Have you written or updated unit tests and or integration tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
    - [ ] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nickwallen/metron METRON-1925

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metron/pull/1292.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1292
    
----
commit 8a058e6274d712ed046325ccb408063f8ac9edbf
Author: Nick Allen <ni...@...>
Date:   2018-12-04T13:54:45Z

    METRON-1925 Provide Verbose View of Profile Results in REPL

----


---

[GitHub] metron issue #1292: METRON-1925 Provide Verbose View of Profile Results in R...

Posted by nickwallen <gi...@git.apache.org>.
Github user nickwallen commented on the issue:

    https://github.com/apache/metron/pull/1292
  
    > The underlying logic seems like it should be nearly identical. Is there any common functionality that could be pulled out and shared between the two?
    
    All of the heavy lifting was already done by the HBaseProfilerClient.  So they already share a common implementation through that.  And that is why you don't see a ton of change needed in `PROFILE_GET`.


---

[GitHub] metron pull request #1292: METRON-1925 Provide Verbose View of Profile Resul...

Posted by nickwallen <gi...@git.apache.org>.
Github user nickwallen closed the pull request at:

    https://github.com/apache/metron/pull/1292


---

[GitHub] metron issue #1292: METRON-1925 Provide Verbose View of Profile Results in R...

Posted by nickwallen <gi...@git.apache.org>.
Github user nickwallen commented on the issue:

    https://github.com/apache/metron/pull/1292
  
    Not sure where this is coming from:
    ```
    Failed tests: 
      RestFunctionsTest.restGetShouldTimeout:516 expected null, but was:<{get=success}>
    ```


---

[GitHub] metron issue #1292: METRON-1925 Provide Verbose View of Profile Results in R...

Posted by nickwallen <gi...@git.apache.org>.
Github user nickwallen commented on the issue:

    https://github.com/apache/metron/pull/1292
  
    > Could the return be a full json document, that includes the query parameters? 
    
    What do you mean by query parameters?


---

[GitHub] metron issue #1292: METRON-1925 Provide Verbose View of Profile Results in R...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on the issue:

    https://github.com/apache/metron/pull/1292
  
    @ottobackwards Maybe another optional array could be used, similar to the geoget functions, that would allow you to specify a list of desired fields to return.
    
    https://github.com/apache/metron/blob/c08cd07f36cd9bf2608a586a209bf809130a069a/metron-platform/metron-enrichment/src/test/java/org/apache/metron/enrichment/stellar/GeoEnrichmentFunctionsTest.java#L140
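
One possible reading of this suggestion, sketched in Python with a hypothetical `select_fields` helper (nothing like it exists in this PR): an optional list of field names narrows each measurement map to just the requested keys, and omitting the list keeps every field.

```python
def select_fields(views, fields=None):
    """Keep only the requested keys from each measurement map.

    Passing fields=None returns a copy of every map unchanged.
    """
    if fields is None:
        return [dict(v) for v in views]
    return [{k: v[k] for k in fields if k in v} for v in views]


# Two measurement maps in the shape that PROFILE_VIEW prints in the REPL.
views = [
    {"profile": "hello-world", "entity": "global", "period": 12867663,
     "period.start": 1544119560000, "period.end": 1544119680000,
     "groups": [], "value": 1185},
    {"profile": "hello-world", "entity": "global", "period": 12867664,
     "period.start": 1544119680000, "period.end": 1544119800000,
     "groups": [], "value": 1170},
]

print(select_fields(views, ["period", "value"]))
```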


---

[GitHub] metron issue #1292: METRON-1925 Provide Verbose View of Profile Results in R...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on the issue:

    https://github.com/apache/metron/pull/1292
  
    Could the return be a full JSON document that includes the query parameters? I can see running these queries and writing the results to a file, and wanting not just the data but the metadata (the query) as well.



---

[GitHub] metron pull request #1292: METRON-1925 Provide Verbose View of Profile Resul...

Posted by nickwallen <gi...@git.apache.org>.
GitHub user nickwallen reopened a pull request:

    https://github.com/apache/metron/pull/1292

    METRON-1925 Provide Verbose View of Profile Results in REPL

    ## Pull Request Checklist
    
    - [x] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
    - [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    - [x] Has your PR been rebased against the latest commit within the target branch (typically master)?
    - [x] Have you included steps to reproduce the behavior or problem that is being changed or addressed?
    - [x] Have you included steps or a guide to how the change may be verified and tested manually?
    - [x] Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:
    - [x] Have you written or updated unit tests and or integration tests to verify your changes?
    - [x] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
    - [x] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?


----
commit 8a058e6274d712ed046325ccb408063f8ac9edbf
Author: Nick Allen <ni...@...>
Date:   2018-12-04T13:54:45Z

    METRON-1925 Provide Verbose View of Profile Results in REPL

commit 8e4813d7404f894ca5f06a57a0aba015d6296536
Author: Nick Allen <ni...@...>
Date:   2018-12-08T16:22:17Z

    Merge remote-tracking branch 'apache/master' into METRON-1925

----


---

[GitHub] metron issue #1292: METRON-1925 Provide Verbose View of Profile Results in R...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on the issue:

    https://github.com/apache/metron/pull/1292
  
    This seems pretty useful, thanks for the submission @nickwallen.
    
    > For lack of a better name, I just called it PROFILE_VIEW. I would be open to alternatives. I did not want to add additional options to the already complex PROFILE_GET.
    
    Heh, that was going to be my first question. I think simplifying the client-facing function signatures, as you've done, makes sense. I don't see any modifications to the existing PROFILE_GET function, though. The underlying logic seems like it should be nearly identical. Is there any common functionality that could be pulled out and shared between the two?


---

[GitHub] metron issue #1292: METRON-1925 Provide Verbose View of Profile Results in R...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on the issue:

    https://github.com/apache/metron/pull/1292
  
    @nickwallen What I mean is that the returned value has the query parameters in it, so you have the data and the query you used for it.  Please excuse me if that is already the case.


---

[GitHub] metron issue #1292: METRON-1925 Provide Verbose View of Profile Results in R...

Posted by JonZeolla <gi...@git.apache.org>.
Github user JonZeolla commented on the issue:

    https://github.com/apache/metron/pull/1292
  
    This should get added to the README.


---