You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@metron.apache.org by mmiklavc <gi...@git.apache.org> on 2016/12/15 21:05:29 UTC

[GitHub] incubator-metron pull request #397: METRON-627: Add HyperLogLogPlus implemen...

GitHub user mmiklavc opened a pull request:

    https://github.com/apache/incubator-metron/pull/397

    METRON-627: Add HyperLogLogPlus implementation to Stellar

    This PR addresses https://issues.apache.org/jira/browse/METRON-627
    
    Leverages the HLLP implementation from https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/cardinality/HyperLogLogPlus.java
    
    4 new Stellar functions have been added that allow a user to initialize a cardinality estimator, add items, merge estimators, and calculate cardinality estimates.
    
    ### `HLLP_CARDINALITY`
      * Description: Returns HyperLogLogPlus-estimated cardinality for this set
      * Input:
        * hyperLogLogPlus - the hllp set
      * Returns: Long value representing the cardinality for this set
    
    ### `HLLP_INIT`
      * Description: Initializes the set
      * Input:
        * p (required) - the precision value for the normal set
        * sp - the precision value for the sparse set. If sp is not specified the sparse set will be disabled.
      * Returns: A new HyperLogLogPlus set
    
    ### `HLLP_MERGE`
      * Description: Merge hllp sets together
      * Input:
        * hllp1 - first hllp set
        * hllp2 - second hllp set
        * hllpn - additional sets to merge
      * Returns: A new merged HyperLogLogPlus estimator set
    
    ### `HLLP_OFFER`
      * Description: Add value to the set
      * Input:
        * hyperLogLogPlus - the hllp set
        * o - Object to add to the set
      * Returns: The HyperLogLogPlus set with a new object added
    
    **Note:** Added new library to metron-common pom and added 3 new items to dependencies_with_url.csv.
    
    **Testing**
    
    Spun up the Stellar REPL in quick-dev. And verified that the function composition is working as expected and returning correct cardinality estimates for simple sparse set cases. For example:
    ```
    [Stellar]>>> HLLP_CARDINALITY(HLLP_MERGE( HLLP_OFFER(HLLP_OFFER(HLLP_INIT(5, 6), "runnings"), "cool"), HLLP_OFFER(HLLP_OFFER(HLLP_INIT(5, 6), "bobsled"), "team")))
    4
    ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mmiklavc/incubator-metron hyperloglog

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-metron/pull/397.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #397
    
----
commit afce30539f6996a607e85d3fd35aac5fcb5c19aa
Author: Michael Miklavcic <mi...@gmail.com>
Date:   2016-12-15T20:55:39Z

    METRON-627: Add HyperLogLogPlus implementation to Stellar

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #397: METRON-627: Add HyperLogLogPlus implemen...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/397#discussion_r92847439
  
    --- Diff: metron-platform/metron-common/README.md ---
    @@ -229,21 +249,33 @@ Using parens such as: "foo" : "\<ok\>" requires escaping; "foo": "\'\<ok\>\'"
         * input - List
       * Returns: Last element of the list
     
    -### `FILL_LEFT`
    -  * Description: Fills or pads a given string with a given character, to a given length on the left
    +### `HLLP_CARDINALITY`
    +  * Description: Returns HyperLogLogPlus-estimated cardinality for this set
       * Input:
    -    * input - string
    -    * fill - the fill character
    -    * len - the required length
    -  * Returns: the filled string
    +    * hyperLogLogPlus - the hllp set
    +  * Returns: Long value representing the cardinality for this set
     
    -### `FILL_RIGHT`
    -  * Description: Fills or pads a given string with a given character, to a given length on the right
    +### `HLLP_INIT`
    +  * Description: Initializes the set
       * Input:
    -    * input - string
    -    * fill - the fill character string
    -    * len - the required length
    -  * Returns: Last element of the list
    +    * p (required) - the precision value for the normal set
    +    * sp - the precision value for the sparse set. If sp is not specified the sparse set will be disabled.
    +  * Returns: A new HyperLogLogPlus set
    +
    +### `HLLP_MERGE`
    +  * Description: Merge hllp sets together
    +  * Input:
    +    * hllp1 - first hllp set
    +    * hllp2 - second hllp set
    +    * hllpn - additional sets to merge
    +  * Returns: A new merged HyperLogLogPlus estimator set
    +
    +### `HLLP_OFFER`
    --- End diff --
    
    I like that idea!  We should probably make `BLOOM_ADD` function that way too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by jjmeyer0 <gi...@git.apache.org>.
Github user jjmeyer0 commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    @mmiklavc have you rebased lately? If not, I'm sure the PR I submitted for Stellar comparisons is the culprit.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #397: METRON-627: Add HyperLogLogPlus implemen...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/397#discussion_r92847494
  
    --- Diff: metron-platform/metron-common/README.md ---
    @@ -229,21 +249,33 @@ Using parens such as: "foo" : "\<ok\>" requires escaping; "foo": "\'\<ok\>\'"
         * input - List
       * Returns: Last element of the list
     
    -### `FILL_LEFT`
    -  * Description: Fills or pads a given string with a given character, to a given length on the left
    +### `HLLP_CARDINALITY`
    +  * Description: Returns HyperLogLogPlus-estimated cardinality for this set
       * Input:
    -    * input - string
    -    * fill - the fill character
    -    * len - the required length
    -  * Returns: the filled string
    +    * hyperLogLogPlus - the hllp set
    +  * Returns: Long value representing the cardinality for this set
     
    -### `FILL_RIGHT`
    -  * Description: Fills or pads a given string with a given character, to a given length on the right
    +### `HLLP_INIT`
    +  * Description: Initializes the set
       * Input:
    -    * input - string
    -    * fill - the fill character string
    -    * len - the required length
    -  * Returns: Last element of the list
    +    * p (required) - the precision value for the normal set
    +    * sp - the precision value for the sparse set. If sp is not specified the sparse set will be disabled.
    +  * Returns: A new HyperLogLogPlus set
    +
    +### `HLLP_MERGE`
    +  * Description: Merge hllp sets together
    +  * Input:
    +    * hllp1 - first hllp set
    +    * hllp2 - second hllp set
    +    * hllpn - additional sets to merge
    +  * Returns: A new merged HyperLogLogPlus estimator set
    +
    +### `HLLP_OFFER`
    +  * Description: Add value to the set
    +  * Input:
    +    * hyperLogLogPlus - the hllp set
    +    * o - Object to add to the set
    --- End diff --
    
    Sounds good


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    @jjmeyer0 Makes sense and that explains things.  @mmiklavc despite it being out of the scope of this ticket, it should just be a 1-liner and we should probably correct it as part of this IMO.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by nickwallen <gi...@git.apache.org>.
Github user nickwallen commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    @mmiklavc Meant to ask previously, does this look right?  Am I using your API correctly?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    I also often find myself wondering if we need peer jiras for documentation/wiki work on this stuff created


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #397: METRON-627: Add HyperLogLogPlus implemen...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/397#discussion_r92833706
  
    --- Diff: metron-platform/metron-common/README.md ---
    @@ -229,21 +249,33 @@ Using parens such as: "foo" : "\<ok\>" requires escaping; "foo": "\'\<ok\>\'"
         * input - List
       * Returns: Last element of the list
     
    -### `FILL_LEFT`
    -  * Description: Fills or pads a given string with a given character, to a given length on the left
    +### `HLLP_CARDINALITY`
    +  * Description: Returns HyperLogLogPlus-estimated cardinality for this set
       * Input:
    -    * input - string
    -    * fill - the fill character
    -    * len - the required length
    -  * Returns: the filled string
    +    * hyperLogLogPlus - the hllp set
    +  * Returns: Long value representing the cardinality for this set
     
    -### `FILL_RIGHT`
    -  * Description: Fills or pads a given string with a given character, to a given length on the right
    +### `HLLP_INIT`
    +  * Description: Initializes the set
       * Input:
    -    * input - string
    -    * fill - the fill character string
    -    * len - the required length
    -  * Returns: Last element of the list
    +    * p (required) - the precision value for the normal set
    --- End diff --
    
    What are the bounds for p and sp?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    In that case, we should have multiple concise README's with more detail and a top level overview readme


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    Kick Travis


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by dlyle65535 <gi...@git.apache.org>.
Github user dlyle65535 commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    I was able to duplicate the failures in my environment by doing the following (based on what Travis does).
    
    ```
    mkdir mike_test
    cd mike_test
    git clone --depth=50 https://github.com/apache/incubator-metron.git apache/incubator-metron
    cd apache/incubator-metron/
    git fetch origin +refs/pull/397/merge
    git checkout -qf FETCH_HEAD
    mvn -q integration-test install && build_utils/verify_licenses.sh
    ```
    
    It bombs out with: 
    ```
    cardinality_gives_distinct_value_estimate_with_precisions_set(org.apache.metron.statistics.approximation.HyperLogLogPlusFunctionsIntegrationTest)  Time elapsed: 0.004 sec  <<< ERROR!
    org.apache.metron.common.dsl.ParseException: Unable to parse HLLP_CARDINALITY(
      HLLP_ADD(
        HLLP_ADD(
          HLLP_INIT(5, 6),
          val1
        ),
        val2
      )
    )
    : Syntax error @ 8:3 no viable alternative at input 'HLLP_CARDINALITY(HLLP_ADD(HLLP_ADD(HLLP_INIT(5,6),val1),val2)\n'
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #397: METRON-627: Add HyperLogLogPlus implemen...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/397#discussion_r92870511
  
    --- Diff: dependencies_with_url.csv ---
    @@ -21,6 +21,8 @@ com.jcraft:jsch:jar:0.1.42:compile,BSD,http://www.jcraft.com/jsch/
     com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile,CDDL,http://jaxb.java.net/
     com.sun.xml.bind:jaxb-impl:jar:2.2.5-2:compile,CDDL,http://jaxb.java.net/
     com.twitter:jsr166e:jar:1.1.0:compile,CC0 1.0 Universal,http://github.com/twitter/jsr166e
    +it.unimi.dsi:fastutil:jar:6.5.11:compile,ASLv2,https://github.com/vigna/fastutil
    +it.unimi.dsi:fastutil:jar:7.0.6:compile,ASLv2,https://github.com/vigna/fastutil
    --- End diff --
    
    This is part of the transitive test dependencies for Solr. I excluded fastutil and ran the integration tests for metron-solr and nothing broke. Worst case scenario, we can later add 7.0.6 as a test dep if needed, or simply keep the original. But seeing as it works without it, I'm inclined to keep the exclusion unless anyone has any objections. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    @jjmeyer0 @mmiklavc It appears that a regression was added relatively recently that affects whitespace at the end of stellar rules (the whitespace isn't sent to the skip channel in the lexer in that situation, for some reason..not sure why).  A stopgap is what JJ said, a slightly more permanant fix considering whitespace has no meaning in Stellar is to trim the rule in BaseStellarProcessor (line 111).  I'd still consider it important to investigate what is confusing the grammar suddenly, mind you.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #397: METRON-627: Add HyperLogLogPlus implemen...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/397#discussion_r92868034
  
    --- Diff: dependencies_with_url.csv ---
    @@ -21,6 +21,8 @@ com.jcraft:jsch:jar:0.1.42:compile,BSD,http://www.jcraft.com/jsch/
     com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile,CDDL,http://jaxb.java.net/
     com.sun.xml.bind:jaxb-impl:jar:2.2.5-2:compile,CDDL,http://jaxb.java.net/
     com.twitter:jsr166e:jar:1.1.0:compile,CC0 1.0 Universal,http://github.com/twitter/jsr166e
    +it.unimi.dsi:fastutil:jar:6.5.11:compile,ASLv2,https://github.com/vigna/fastutil
    +it.unimi.dsi:fastutil:jar:7.0.6:compile,ASLv2,https://github.com/vigna/fastutil
    --- End diff --
    
    Update - this 6.5.11 dep was added somewhere else along the way: org.apache.solr:solr-test-framework:jar:5.2.1:test



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    Maybe we could build a readme-wiki, like a section of our wiki which is just the readme's from the current build?  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    @ottobackwards Good point, there is some overlap between end-user docs and the stuff in the README.md's.  Things like the stellar language and function docs should be part of the wiki, I suspect.  We probably need something more high level than the examples in the stellar-statistics README that I cited earlier.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #397: METRON-627: Add HyperLogLogPlus implemen...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/397#discussion_r92834036
  
    --- Diff: metron-platform/metron-common/README.md ---
    @@ -229,21 +249,33 @@ Using parens such as: "foo" : "\<ok\>" requires escaping; "foo": "\'\<ok\>\'"
         * input - List
       * Returns: Last element of the list
     
    -### `FILL_LEFT`
    -  * Description: Fills or pads a given string with a given character, to a given length on the left
    +### `HLLP_CARDINALITY`
    +  * Description: Returns HyperLogLogPlus-estimated cardinality for this set
       * Input:
    -    * input - string
    -    * fill - the fill character
    -    * len - the required length
    -  * Returns: the filled string
    +    * hyperLogLogPlus - the hllp set
    +  * Returns: Long value representing the cardinality for this set
     
    -### `FILL_RIGHT`
    -  * Description: Fills or pads a given string with a given character, to a given length on the right
    +### `HLLP_INIT`
    +  * Description: Initializes the set
       * Input:
    -    * input - string
    -    * fill - the fill character string
    -    * len - the required length
    -  * Returns: Last element of the list
    +    * p (required) - the precision value for the normal set
    +    * sp - the precision value for the sparse set. If sp is not specified the sparse set will be disabled.
    +  * Returns: A new HyperLogLogPlus set
    +
    +### `HLLP_MERGE`
    +  * Description: Merge hllp sets together
    +  * Input:
    +    * hllp1 - first hllp set
    +    * hllp2 - second hllp set
    +    * hllpn - additional sets to merge
    +  * Returns: A new merged HyperLogLogPlus estimator set
    +
    +### `HLLP_OFFER`
    +  * Description: Add value to the set
    +  * Input:
    +    * hyperLogLogPlus - the hllp set
    +    * o - Object to add to the set
    --- End diff --
    
    Can we make `o` be an object *or* a list?  This way you can call `HLLP_OFFER(hllp, [ field1, field2])` or `HLLP_OFFER(hllp, field1)`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    what we need is a plan, whatever it is ;)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    Should functions like this have a more expansive description, that perhaps includes a set of sample data and the outcome?  Similar to what the Profiler functions have?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    Ok, this is making more sense now. GitHub has merged this PR out of band of the build. That fetch is not doing a merge. It's fetching what has already been merged out of band from the build - `git fetch origin +refs/pull/397/merge`. This also explains how GitHub is able to tell us if a PR has merge conflicts long before Travis runs its build. So yes, the grammar corrections should work. On it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #397: METRON-627: Add HyperLogLogPlus implemen...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/397#discussion_r92846025
  
    --- Diff: metron-platform/metron-common/README.md ---
    @@ -229,21 +249,33 @@ Using parens such as: "foo" : "\<ok\>" requires escaping; "foo": "\'\<ok\>\'"
         * input - List
       * Returns: Last element of the list
     
    -### `FILL_LEFT`
    -  * Description: Fills or pads a given string with a given character, to a given length on the left
    +### `HLLP_CARDINALITY`
    +  * Description: Returns HyperLogLogPlus-estimated cardinality for this set
       * Input:
    -    * input - string
    -    * fill - the fill character
    -    * len - the required length
    -  * Returns: the filled string
    +    * hyperLogLogPlus - the hllp set
    +  * Returns: Long value representing the cardinality for this set
     
    -### `FILL_RIGHT`
    -  * Description: Fills or pads a given string with a given character, to a given length on the right
    +### `HLLP_INIT`
    +  * Description: Initializes the set
       * Input:
    -    * input - string
    -    * fill - the fill character string
    -    * len - the required length
    -  * Returns: Last element of the list
    +    * p (required) - the precision value for the normal set
    +    * sp - the precision value for the sparse set. If sp is not specified the sparse set will be disabled.
    +  * Returns: A new HyperLogLogPlus set
    +
    +### `HLLP_MERGE`
    +  * Description: Merge hllp sets together
    +  * Input:
    +    * hllp1 - first hllp set
    +    * hllp2 - second hllp set
    +    * hllpn - additional sets to merge
    +  * Returns: A new merged HyperLogLogPlus estimator set
    +
    +### `HLLP_OFFER`
    +  * Description: Add value to the set
    +  * Input:
    +    * hyperLogLogPlus - the hllp set
    +    * o - Object to add to the set
    --- End diff --
    
    Heh, I just made the same comment above. Incidentally, our BLOOM_ADD currently works like a vararg. I may open a separate PR to change that as well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    @cestella Verified the above - modifying the testing plan now


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #397: METRON-627: Add HyperLogLogPlus implemen...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/397#discussion_r92858410
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/dsl/functions/DataStructureFunctions.java ---
    @@ -137,6 +138,105 @@ public Object apply(List<Object> args) {
         }
       }
     
    +  @Stellar( namespace="HLLP"
    +          , name="CARDINALITY"
    +          , description="Returns HyperLogLogPlus-estimated cardinality for this set"
    +          , params = { "hyperLogLogPlus - the hllp set" }
    +          , returns = "Long value representing the cardinality for this set"
    +  )
    +  public static class HLLPCardinality extends BaseStellarFunction {
    +
    +    @Override
    +    public Object apply(List<Object> args) {
    +      if (args.size() < 1) {
    +        throw new IllegalArgumentException("Must pass an hllp set to get the cardinality for");
    +      }
    +      return ((HyperLogLogPlus) args.get(0)).cardinality();
    +    }
    +  }
    +
    +  @Stellar( namespace="HLLP"
    +          , name="INIT"
    +          , description="Initializes the set"
    +          , params = {
    +                      "p (required) - the precision value for the normal set"
    --- End diff --
    
    I think a link and any description or guidance you can provide in the description will help.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #397: METRON-627: Add HyperLogLogPlus implemen...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/397#discussion_r92848090
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/utils/HyperLogLogPlus.java ---
    @@ -0,0 +1,102 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.metron.common.utils;
    +
    +import com.clearspring.analytics.stream.cardinality.CardinalityMergeException;
    +import com.clearspring.analytics.stream.cardinality.ICardinality;
    +import com.google.common.collect.Lists;
    +
    +import java.io.Serializable;
    +import java.util.List;
    +
    --- End diff --
    
    Yep


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    But, I just download the binaries or install through ambari?  You mean I have to look through the codebase?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #397: METRON-627: Add HyperLogLogPlus implemen...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/397#discussion_r92835407
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/dsl/functions/DataStructureFunctions.java ---
    @@ -137,6 +138,105 @@ public Object apply(List<Object> args) {
         }
       }
     
    +  @Stellar( namespace="HLLP"
    +          , name="CARDINALITY"
    +          , description="Returns HyperLogLogPlus-estimated cardinality for this set"
    +          , params = { "hyperLogLogPlus - the hllp set" }
    +          , returns = "Long value representing the cardinality for this set"
    +  )
    +  public static class HLLPCardinality extends BaseStellarFunction {
    +
    +    @Override
    +    public Object apply(List<Object> args) {
    +      if (args.size() < 1) {
    +        throw new IllegalArgumentException("Must pass an hllp set to get the cardinality for");
    +      }
    +      return ((HyperLogLogPlus) args.get(0)).cardinality();
    +    }
    +  }
    +
    +  @Stellar( namespace="HLLP"
    +          , name="INIT"
    +          , description="Initializes the set"
    +          , params = {
    +                      "p (required) - the precision value for the normal set"
    +                     ,"sp - the precision value for the sparse set. If sp is not specified the sparse set will be disabled."
    +                     }
    +          , returns = "A new HyperLogLogPlus set"
    +  )
    +  public static class HLLPInit extends BaseStellarFunction {
    +
    +    @Override
    +    public Object apply(List<Object> args) {
    +      if (args.size() == 0) {
    +        throw new IllegalArgumentException("Normal set precision is required");
    +      } else if (args.size() == 1) {
    +        int p = ConversionUtils.convert(args.get(0), Integer.class);
    +        return new HyperLogLogPlus(p);
    +      } else {
    +        int p = ConversionUtils.convert(args.get(0), Integer.class);
    +        int sp = ConversionUtils.convert(args.get(1), Integer.class);
    +        return new HyperLogLogPlus(p, sp);
    +      }
    +    }
    +  }
    +
    +  @Stellar( namespace="HLLP"
    +          , name="MERGE"
    +          , description="Merge hllp sets together"
    +          , params = {
    +                      "hllp1 - first hllp set"
    --- End diff --
    
    Can we make this fit the pattern from `BLOOM_MERGE` to just take a list?  That was done because interactions with the profiler, which is largely where this will be used, will be made easier with that arrangement (i.e. `PROFILE_GET` returns a list of these, so you'd call this 99% of the time via `HLLP_MERGE(PROFILE_GET(....))`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    @ottobackwards +1 on some working examples. We currently plop the whole lot of Stellar documentation in the metron-common README and I wonder if we should start splitting any of this out into separate md files. This set of functions really works as a small suite. Same for the Bloom filter functions. And some deeper-dive might prove useful. I like @cestella 's recommendation of using the profiler for an end-to-end test. This really gets at illustrating the value from a user/customer perspective. 
    
    I think we want to consider how we choose to use the Metron wiki vs the README files. It feels like we could keep the README's lean and mean, dealing specifically with relevant options and links to external papers and documentation. And then on the wiki we can go into greater technical detail and maybe even start assembling a cookbook of examples. Thoughts?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #397: METRON-627: Add HyperLogLogPlus implemen...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/397#discussion_r93529216
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/utils/HyperLogLogPlus.java ---
    @@ -0,0 +1,102 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.metron.common.utils;
    +
    +import com.clearspring.analytics.stream.cardinality.CardinalityMergeException;
    +import com.clearspring.analytics.stream.cardinality.ICardinality;
    +import com.google.common.collect.Lists;
    +
    +import java.io.Serializable;
    +import java.util.List;
    +
    --- End diff --
    
    Heads up, I searched arxiv and  it doesn't appear that this paper is there.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    @nickwallen That does look right, but you probably want the result to just be `out` so you can decouple the in-degree and out-degree calculation from the window size in the profiler.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Re: [GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by Ryan Merriman <me...@gmail.com>.
I agree that stellar should be separate from common.

On Fri, Dec 16, 2016 at 1:55 PM, cestella <gi...@git.apache.org> wrote:

> Github user cestella commented on the issue:
>
>     https://github.com/apache/incubator-metron/pull/397
>
>     A couple things.  First off, these functions (Bloom and HLL) really
> should exist in `stellar-statistics`. rather than being in common IMO
> (thoughts?).  Second, *stellar* should exist in its own project with a bare
> set of common functions and the stellar infrastructure.
>
>
> ---
> If your project is set up for it, you can reply to this email and have your
> reply appear on GitHub as well. If your project does not have this feature
> enabled and wishes so, or if the feature is enabled but not working, please
> contact infrastructure at infrastructure@apache.org or file a JIRA ticket
> with INFRA.
> ---
>

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    A couple things.  First off, these functions (Bloom and HLL) really should exist in `stellar-statistics`. rather than being in common IMO (thoughts?).  Second, *stellar* should exist in its own project with a bare set of common functions and the stellar infrastructure.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #397: METRON-627: Add HyperLogLogPlus implemen...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/397#discussion_r92847280
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/dsl/functions/DataStructureFunctions.java ---
    @@ -137,6 +138,105 @@ public Object apply(List<Object> args) {
         }
       }
     
    +  @Stellar( namespace="HLLP"
    +          , name="CARDINALITY"
    +          , description="Returns HyperLogLogPlus-estimated cardinality for this set"
    +          , params = { "hyperLogLogPlus - the hllp set" }
    +          , returns = "Long value representing the cardinality for this set"
    +  )
    +  public static class HLLPCardinality extends BaseStellarFunction {
    +
    +    @Override
    +    public Object apply(List<Object> args) {
    +      if (args.size() < 1) {
    +        throw new IllegalArgumentException("Must pass an hllp set to get the cardinality for");
    +      }
    +      return ((HyperLogLogPlus) args.get(0)).cardinality();
    +    }
    +  }
    +
    +  @Stellar( namespace="HLLP"
    +          , name="INIT"
    +          , description="Initializes the set"
    +          , params = {
    +                      "p (required) - the precision value for the normal set"
    +                     ,"sp - the precision value for the sparse set. If sp is not specified the sparse set will be disabled."
    +                     }
    +          , returns = "A new HyperLogLogPlus set"
    +  )
    +  public static class HLLPInit extends BaseStellarFunction {
    +
    +    @Override
    +    public Object apply(List<Object> args) {
    +      if (args.size() == 0) {
    +        throw new IllegalArgumentException("Normal set precision is required");
    +      } else if (args.size() == 1) {
    +        int p = ConversionUtils.convert(args.get(0), Integer.class);
    +        return new HyperLogLogPlus(p);
    +      } else {
    +        int p = ConversionUtils.convert(args.get(0), Integer.class);
    +        int sp = ConversionUtils.convert(args.get(1), Integer.class);
    +        return new HyperLogLogPlus(p, sp);
    +      }
    +    }
    +  }
    +
    +  @Stellar( namespace="HLLP"
    +          , name="MERGE"
    +          , description="Merge hllp sets together"
    +          , params = {
    +                      "hllp1 - first hllp set"
    --- End diff --
    
    Agreed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    Did you verify that these functions run on:
    * the profiler
    * the parser topology via field transformations
    * the enrichment topology via stellar enrichments
    * the Stellar REPL
    
    If so, then I'm +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    @nickwallen Thanks for the feedback! - I'm currently working on finishing up some performance metrics (p/sp vals vs accuracy, cardinality, serialized mem consumption) in addition to an example utilizing the profiler. Your example above is exactly in line with what I was thinking might make for a good demo of this functionality.
    
    Also, agreed on the single-line example - that was meant to make cut-and-paste easy. The unit and integration tests are much easier to read. I'll provide similar examples here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #397: METRON-627: Add HyperLogLogPlus implemen...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/397#discussion_r92845799
  
    --- Diff: metron-platform/metron-common/README.md ---
    @@ -229,21 +249,33 @@ Using parens such as: "foo" : "\<ok\>" requires escaping; "foo": "\'\<ok\>\'"
         * input - List
       * Returns: Last element of the list
     
    -### `FILL_LEFT`
    -  * Description: Fills or pads a given string with a given character, to a given length on the left
    +### `HLLP_CARDINALITY`
    +  * Description: Returns HyperLogLogPlus-estimated cardinality for this set
       * Input:
    -    * input - string
    -    * fill - the fill character
    -    * len - the required length
    -  * Returns: the filled string
    +    * hyperLogLogPlus - the hllp set
    +  * Returns: Long value representing the cardinality for this set
     
    -### `FILL_RIGHT`
    -  * Description: Fills or pads a given string with a given character, to a given length on the right
    +### `HLLP_INIT`
    +  * Description: Initializes the set
       * Input:
    -    * input - string
    -    * fill - the fill character string
    -    * len - the required length
    -  * Returns: Last element of the list
    +    * p (required) - the precision value for the normal set
    +    * sp - the precision value for the sparse set. If sp is not specified the sparse set will be disabled.
    +  * Returns: A new HyperLogLogPlus set
    +
    +### `HLLP_MERGE`
    +  * Description: Merge hllp sets together
    +  * Input:
    +    * hllp1 - first hllp set
    +    * hllp2 - second hllp set
    +    * hllpn - additional sets to merge
    +  * Returns: A new merged HyperLogLogPlus estimator set
    +
    +### `HLLP_OFFER`
    --- End diff --
    
    In addition to this terminology change, what do you think of changing the implementation to accept a single item or array?
    `HLLP_ADD(hllp, "value1")`
    or
    `HLLP_ADD(hllp, [ "value1", "value2", "value3" ] )`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by nickwallen <gi...@git.apache.org>.
Github user nickwallen commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    I was thinking how I could use your HLLP functionality with the Profiler.  I think I could use this functionality to track the in-degree and out-degree of a host over time.  This might be an interesting motivating example.
    
    For example, calculating the in-degree would look like the following.
    ```
    {
      "profile": "in-degree",
      "onlyif": "source.type == 'yaf'"
      "foreach": "ip_dst_addr",
      "init": {
        "in": "HLLP_INIT(5, 6)"
      }
      "update": {
        "in": "HLLP_ADD(in, ip_src_addr)"
      }
      "result": {
        "HLLP_CARDINALITY(in)"
      }
    }
    ```
    
    Calculating the out-degree would look like this.
    ```
    {
      "profile": "out-degree",
      "onlyif": "source.type == 'yaf'"
      "foreach": "ip_src_addr",
      "init": {
        "out": "HLLP_INIT(5, 6)"
      }
      "update": {
        "out": "HLLP_ADD(out, ip_dst_addr)"
      }
      "result": {
        "HLLP_CARDINALITY(out)"
      }
    }
    ```
    
    
    Of course, with your `HLLP_MERGE` it might be more interesting to store the HLLP object itself and call `HLLP_CARDINALITY` after-the-fact, but I just want to make sure I'm using the API correctly.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by jjmeyer0 <gi...@git.apache.org>.
Github user jjmeyer0 commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    Okay, I took a look. It's because EOL isn't defined as a fragment.
    
    If `EOL : '\n';` is changed to `fragment EOL : '\n';` it should work. Without the `fragment` keyword antlr4 makes a token for `EOL` and it gets confused on how to parse it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    I imagine the wiki is for things like process docs (bylaws, etc.) and high level coverage of end-to-end usecases aimed at multiple features working in concert to achieve an end (e.g. the blogs).  These get updated per release.  The README examples are more targeted and aimed at demonstrating the feature without necessarily being tied to a specific usecase (e.g. the MAD example just generates random data, not security data).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    Because this is a function that will 99% be used from the profiler, could you please design a simple test around using this with the profiler and give us quick rundown on the test procedure as a comment on this PR.  I'm thinking something similar to [here](https://github.com/apache/incubator-metron/tree/master/metron-analytics/metron-statistics#median-absolute-deviation).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #397: METRON-627: Add HyperLogLogPlus implemen...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/397#discussion_r92834479
  
    --- Diff: metron-platform/metron-common/README.md ---
    @@ -229,21 +249,33 @@ Using parens such as: "foo" : "\<ok\>" requires escaping; "foo": "\'\<ok\>\'"
         * input - List
       * Returns: Last element of the list
     
    -### `FILL_LEFT`
    -  * Description: Fills or pads a given string with a given character, to a given length on the left
    +### `HLLP_CARDINALITY`
    +  * Description: Returns HyperLogLogPlus-estimated cardinality for this set
       * Input:
    -    * input - string
    -    * fill - the fill character
    -    * len - the required length
    -  * Returns: the filled string
    +    * hyperLogLogPlus - the hllp set
    +  * Returns: Long value representing the cardinality for this set
     
    -### `FILL_RIGHT`
    -  * Description: Fills or pads a given string with a given character, to a given length on the right
    +### `HLLP_INIT`
    +  * Description: Initializes the set
       * Input:
    -    * input - string
    -    * fill - the fill character string
    -    * len - the required length
    -  * Returns: Last element of the list
    +    * p (required) - the precision value for the normal set
    +    * sp - the precision value for the sparse set. If sp is not specified the sparse set will be disabled.
    +  * Returns: A new HyperLogLogPlus set
    +
    +### `HLLP_MERGE`
    +  * Description: Merge hllp sets together
    +  * Input:
    +    * hllp1 - first hllp set
    +    * hllp2 - second hllp set
    +    * hllpn - additional sets to merge
    +  * Returns: A new merged HyperLogLogPlus estimator set
    +
    +### `HLLP_OFFER`
    --- End diff --
    
    I'm not in love with `HLLP_OFFER`..I'd prefer something like `HLLP_ADD` to match `BLOOM_ADD`.  I understand that HLLP is probabalistic, so they're probably being cagey about having their API calls make checks that their maths can't check, but I think we can educate sufficiently to make it safe to call this `ADD` IMO.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #397: METRON-627: Add HyperLogLogPlus implemen...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/397#discussion_r92836313
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/utils/HyperLogLogPlus.java ---
    @@ -0,0 +1,102 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.metron.common.utils;
    +
    +import com.clearspring.analytics.stream.cardinality.CardinalityMergeException;
    +import com.clearspring.analytics.stream.cardinality.ICardinality;
    +import com.google.common.collect.Lists;
    +
    +import java.io.Serializable;
    +import java.util.List;
    +
    --- End diff --
    
    Can you link the google paper from arxiv on which this is based in the Javadoc comments and in the Stellar description?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    @dlyle65535 the thing I'm not fully understanding is how this is working in my local branch. Doing a git fetch and checkout shouldn't merge anything from master, so the most recent master shouldn't be in this test run. Doing a git pull does a fetch and merge, which I would anticipate causing problems.
    
    @jjmeyer0 Yeah, that's what I thought as well. Again, unclear yet why this is not a problem locally. Furthermore, it wasn't a problem in the last commit I had that I believe had the newline. I'll give it a shot anyhow.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #397: METRON-627: Add HyperLogLogPlus implemen...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-metron/pull/397


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    @nickwallen Fair point; I missed that last sentence.  Sorry for the confusion.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #397: METRON-627: Add HyperLogLogPlus implemen...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/397#discussion_r92866062
  
    --- Diff: metron-platform/metron-common/README.md ---
    @@ -229,21 +249,33 @@ Using parens such as: "foo" : "\<ok\>" requires escaping; "foo": "\'\<ok\>\'"
         * input - List
       * Returns: Last element of the list
     
    -### `FILL_LEFT`
    -  * Description: Fills or pads a given string with a given character, to a given length on the left
    +### `HLLP_CARDINALITY`
    +  * Description: Returns HyperLogLogPlus-estimated cardinality for this set
       * Input:
    -    * input - string
    -    * fill - the fill character
    -    * len - the required length
    -  * Returns: the filled string
    +    * hyperLogLogPlus - the hllp set
    +  * Returns: Long value representing the cardinality for this set
     
    -### `FILL_RIGHT`
    -  * Description: Fills or pads a given string with a given character, to a given length on the right
    +### `HLLP_INIT`
    +  * Description: Initializes the set
       * Input:
    -    * input - string
    -    * fill - the fill character string
    -    * len - the required length
    -  * Returns: Last element of the list
    +    * p (required) - the precision value for the normal set
    +    * sp - the precision value for the sparse set. If sp is not specified the sparse set will be disabled.
    +  * Returns: A new HyperLogLogPlus set
    +
    +### `HLLP_MERGE`
    +  * Description: Merge hllp sets together
    +  * Input:
    +    * hllp1 - first hllp set
    +    * hllp2 - second hllp set
    +    * hllpn - additional sets to merge
    +  * Returns: A new merged HyperLogLogPlus estimator set
    +
    +### `HLLP_OFFER`
    --- End diff --
    
    That is a great idea



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by jjmeyer0 <gi...@git.apache.org>.
Github user jjmeyer0 commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    This looks like an issue with the grammar. My guess is it isn't properly handling white space at the end. If you change the variable `hllpDefaultConstructorRule` to the below, it should work. I know this isn't a permanent solution, but it narrows the problem down.
    ```
      /**
       *HLLP_CARDINALITY(
       *  HLLP_ADD(
       *    HLLP_ADD(
       *      HLLP_INIT(),
       *      val1
       *    ),
       *    val2
       *  )
       *  )*/
      @Multiline
      private static String hllpDefaultConstructorRule;
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #397: METRON-627: Add HyperLogLogPlus implemen...

Posted by mmiklavc <gi...@git.apache.org>.
GitHub user mmiklavc reopened a pull request:

    https://github.com/apache/incubator-metron/pull/397

    METRON-627: Add HyperLogLogPlus implementation to Stellar

    This PR addresses https://issues.apache.org/jira/browse/METRON-627
    
    Leverages the HLLP implementation from https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/cardinality/HyperLogLogPlus.java
    
    4 new Stellar functions have been added that allow a user to initialize a cardinality estimator, add items, merge estimators, and calculate cardinality estimates.
    
    ### `HLLP_CARDINALITY`
      * Description: Returns HyperLogLogPlus-estimated cardinality for this set
      * Input:
        * hyperLogLogPlus - the hllp set
      * Returns: Long value representing the cardinality for this set
    
    ### `HLLP_INIT`
      * Description: Initializes the set
      * Input:
        * p (required) - the precision value for the normal set
        * sp - the precision value for the sparse set. If sp is not specified the sparse set will be disabled.
      * Returns: A new HyperLogLogPlus set
    
    ### `HLLP_MERGE`
      * Description: Merge hllp sets together
      * Input:
        * hllp1 - first hllp set
        * hllp2 - second hllp set
        * hllpn - additional sets to merge
      * Returns: A new merged HyperLogLogPlus estimator set
    
    ### `HLLP_OFFER`
      * Description: Add value to the set
      * Input:
        * hyperLogLogPlus - the hllp set
        * o - Object to add to the set
      * Returns: The HyperLogLogPlus set with a new object added
    
    **Note:** Added new library to metron-common pom and added 3 new items to dependencies_with_url.csv.
    
    **Testing**
    
    Spun up the Stellar REPL in quick-dev. And verified that the function composition is working as expected and returning correct cardinality estimates for simple sparse set cases. For example:
    ```
    [Stellar]>>> HLLP_CARDINALITY(HLLP_MERGE( HLLP_OFFER(HLLP_OFFER(HLLP_INIT(5, 6), "runnings"), "cool"), HLLP_OFFER(HLLP_OFFER(HLLP_INIT(5, 6), "bobsled"), "team")))
    4
    ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mmiklavc/incubator-metron hyperloglog

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-metron/pull/397.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #397
    
----
commit afce30539f6996a607e85d3fd35aac5fcb5c19aa
Author: Michael Miklavcic <mi...@gmail.com>
Date:   2016-12-15T20:55:39Z

    METRON-627: Add HyperLogLogPlus implementation to Stellar

commit 414a3a98976b98a253ab9921720f02c8a7431da2
Author: Michael Miklavcic <mi...@gmail.com>
Date:   2017-01-09T17:00:08Z

    work in progress commit

commit c7f57a4acbb0ef357c1af9eaa263afea7bc83d9a
Author: Michael Miklavcic <mi...@gmail.com>
Date:   2017-01-11T16:58:58Z

    Merge with master

commit 90d9659f415404c6c4682289c7bde669c352f517
Author: Michael Miklavcic <mi...@gmail.com>
Date:   2017-01-12T20:33:10Z

    Refactor, fix statistics output

commit 261e69651d4ae0b99e88e0e4a2c4e7568aa23fcb
Author: Michael Miklavcic <mi...@gmail.com>
Date:   2017-01-12T23:17:13Z

    METRON-627: Updated with sensible default precision values

commit 9078094dd720d89f64ecf45506ab0c5077aa58a7
Author: Michael Miklavcic <mi...@gmail.com>
Date:   2017-01-13T19:17:37Z

    METRON-627: Add default init for HLLP_ADD(null, 'val')

commit 9e1ff937fe51841ac2fa3235bf87964cba8a1ae8
Author: Michael Miklavcic <mi...@gmail.com>
Date:   2017-01-17T20:09:26Z

    Merge branch 'master' into hyperloglog

commit d392f044e330fe273cb0f0b4ff820b4ef1a3595d
Author: Michael Miklavcic <mi...@gmail.com>
Date:   2017-01-17T20:33:11Z

    METRON-627: Fix Stellar lexer to handle newline at end

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #397: METRON-627: Add HyperLogLogPlus implemen...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/397#discussion_r92836067
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/utils/ConversionUtils.java ---
    @@ -18,23 +18,41 @@
     
     package org.apache.metron.common.utils;
     
    +import com.google.common.collect.Lists;
     import org.apache.commons.beanutils.BeanUtilsBean2;
     import org.apache.commons.beanutils.ConvertUtilsBean;
     
    +import java.util.List;
    +
     public class ConversionUtils {
    -  private static ThreadLocal<ConvertUtilsBean> UTILS_BEAN  = new ThreadLocal<ConvertUtilsBean>() {
    +  private static ThreadLocal<ConvertUtilsBean> UTILS_BEAN = new ThreadLocal<ConvertUtilsBean>() {
         @Override
         protected ConvertUtilsBean initialValue() {
           ConvertUtilsBean ret = BeanUtilsBean2.getInstance().getConvertUtils();
           ret.deregister();
    -      ret.register(false,true, 1);
    +      ret.register(false, true, 1);
           return ret;
         }
       };
    +
       public static <T> T convert(Object o, Class<T> clazz) {
    -    if(o == null) {
    +    if (o == null) {
           return null;
         }
         return clazz.cast(UTILS_BEAN.get().convert(o, clazz));
       }
    +
    +  /**
    +   * Performs naive List type conversion.
    +   *
    +   * @param from Source list
    +   * @param clazz Class type to cast the List elements to
    +   * @param <T> Source element type
    +   * @param <U> Desired element type
    +   * @return New List with the elements cast to the desired type
    +   */
    +  public static <T, U> List<U> convertList(List<T> from, Class<U> clazz) {
    +    return Lists.transform(from, s -> clazz.cast(s));
    --- End diff --
    
    My issue with this is that above we are calling `convert` and is a coercive operation (e.g. convert("1", Integer.class) == 1) rather than just a simple cast.  Can you make `convertList` call `convert` on the elements?  If you really want a naive version of this, feel free to make a `castList` method.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by nickwallen <gi...@git.apache.org>.
Github user nickwallen commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    Very cool stuff, @mmiklavc. Can't wait to see it hit master.
    
    Just to lend my $.02, I do agree with @ottobackwards that a more detailed example would be helpful.  Maybe something in a REPL session that uses multiple lines.  Kind of hard for me to grok the single-line example in the PR description.
    
    Also, per @cestella, I agree that this makes sense in `metron-analytics/metron-statistics`.  He mentioned `stellar-statistics`, but I think he meant `metron-statistics`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    @mmiklavc Can you please merge master in and commit this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    @nickwallen @cestella Moved the HLLP implementation wrapper and Stellar functions to the metron-statistics project. Bloomfilter Stellar functions should probably be moved to the stats project as well, but that's for a separate PR, I feel. 
    
    I also added a utility for generating performance metrics that includes a README (HLLP.md) that is linked to in the main metron-statistics README as well as the metron-common README. There is a link provided to the Google paper, an exposition of performance results, and a new default precision for the sparse and dense (normal) sets.
    
    Demo sample soon to follow using the profiler.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    Merged with latest master and modified Stellar.g4 to handle EOL fragments. Test went from failing to passing locally.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    Now, regarding the wiki vs the Readme.  The reason I am hesitant to move things to the wiki is that the wiki doesn't really track with the code well.  What happens when we change things around and want examples from an older version?  We don't have a solution for that at the moment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    Hm, something fishy going on here with Travis. Running all tests fine from Maven (on OSX) locally.
    ```
    [INFO] metron-analytics ................................... SUCCESS [  1.079 s]
    [INFO] metron-maas-common ................................. SUCCESS [ 10.757 s]
    [INFO] metron-maas-service ................................ SUCCESS [01:25 min]
    [INFO] metron-statistics .................................. SUCCESS [ 46.321 s]
    [INFO] metron-profiler-common ............................. SUCCESS [ 13.036 s]
    [INFO] metron-profiler-client ............................. SUCCESS [01:21 min]
    [INFO] metron-profiler .................................... SUCCESS [04:34 min]
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by nickwallen <gi...@git.apache.org>.
Github user nickwallen commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    Yes, understood.  I think that's what I wrote at the bottom of my last post.  More clear for a motivating example to actually do the calculation as the `result`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    Ok, I propose a [discuss] around docs.  These are great questions, but they are probably not going to result in getting us HLL functionality soon ;)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #397: METRON-627: Add HyperLogLogPlus implemen...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/397#discussion_r92846304
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/dsl/functions/DataStructureFunctions.java ---
    @@ -137,6 +138,105 @@ public Object apply(List<Object> args) {
         }
       }
     
    +  @Stellar( namespace="HLLP"
    +          , name="CARDINALITY"
    +          , description="Returns HyperLogLogPlus-estimated cardinality for this set"
    +          , params = { "hyperLogLogPlus - the hllp set" }
    +          , returns = "Long value representing the cardinality for this set"
    +  )
    +  public static class HLLPCardinality extends BaseStellarFunction {
    +
    +    @Override
    +    public Object apply(List<Object> args) {
    +      if (args.size() < 1) {
    +        throw new IllegalArgumentException("Must pass an hllp set to get the cardinality for");
    +      }
    +      return ((HyperLogLogPlus) args.get(0)).cardinality();
    +    }
    +  }
    +
    +  @Stellar( namespace="HLLP"
    +          , name="INIT"
    +          , description="Initializes the set"
    +          , params = {
    +                      "p (required) - the precision value for the normal set"
    --- End diff --
    
    There are some diagrams. Think it's sufficient to link in the description, or should we consider embedding these images right into the README?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #397: METRON-627: Add HyperLogLogPlus implemen...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc closed the pull request at:

    https://github.com/apache/incubator-metron/pull/397


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #397: METRON-627: Add HyperLogLogPlus implementation ...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on the issue:

    https://github.com/apache/incubator-metron/pull/397
  
    I agree with keeping the README's lean.  What I'm struggling with ( and this may just be as an FOSS noob ) is that there seems to be no process around calling out documentation.   On internal company projects, we would require jira and confluence entries for product documentation and internal documentation needs - to put in the backlog.  I don't see where we do that here.  There is a general understanding that "even if you don't contribute code, contributing documentation is good", but where are we tracking what needs it etc?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #397: METRON-627: Add HyperLogLogPlus implemen...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/397#discussion_r92833549
  
    --- Diff: dependencies_with_url.csv ---
    @@ -21,6 +21,8 @@ com.jcraft:jsch:jar:0.1.42:compile,BSD,http://www.jcraft.com/jsch/
     com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile,CDDL,http://jaxb.java.net/
     com.sun.xml.bind:jaxb-impl:jar:2.2.5-2:compile,CDDL,http://jaxb.java.net/
     com.twitter:jsr166e:jar:1.1.0:compile,CC0 1.0 Universal,http://github.com/twitter/jsr166e
    +it.unimi.dsi:fastutil:jar:6.5.11:compile,ASLv2,https://github.com/vigna/fastutil
    +it.unimi.dsi:fastutil:jar:7.0.6:compile,ASLv2,https://github.com/vigna/fastutil
    --- End diff --
    
    Any idea if we can rely on just 7.0.6?  rather than 6.5.11 and 7.0.6?  Is streamlib depending on both?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #397: METRON-627: Add HyperLogLogPlus implemen...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/397#discussion_r92835008
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/dsl/functions/DataStructureFunctions.java ---
    @@ -137,6 +138,105 @@ public Object apply(List<Object> args) {
         }
       }
     
    +  @Stellar( namespace="HLLP"
    +          , name="CARDINALITY"
    +          , description="Returns HyperLogLogPlus-estimated cardinality for this set"
    +          , params = { "hyperLogLogPlus - the hllp set" }
    +          , returns = "Long value representing the cardinality for this set"
    +  )
    +  public static class HLLPCardinality extends BaseStellarFunction {
    +
    +    @Override
    +    public Object apply(List<Object> args) {
    +      if (args.size() < 1) {
    +        throw new IllegalArgumentException("Must pass an hllp set to get the cardinality for");
    +      }
    +      return ((HyperLogLogPlus) args.get(0)).cardinality();
    +    }
    +  }
    +
    +  @Stellar( namespace="HLLP"
    +          , name="INIT"
    +          , description="Initializes the set"
    +          , params = {
    +                      "p (required) - the precision value for the normal set"
    --- End diff --
    
     Are their any documents that we can link to describe the tradeoffs for these values?  I'm thinking of something like [here](https://en.wikipedia.org/wiki/File:Bloom_filter_fp_probability.svg) discussing the tradeoff between accuracy and size.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---