You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@metron.apache.org by cestella <gi...@git.apache.org> on 2016/12/05 14:57:43 UTC

[GitHub] incubator-metron pull request #388: METRON-610: OnlineStatisticsProvider ser...

GitHub user cestella opened a pull request:

    https://github.com/apache/incubator-metron/pull/388

    METRON-610: OnlineStatisticsProvider serialization is broken at random in the REPL

    We rely on the t-digest library version 3.1 and elasticsearch brings along 3.0. There is a small API incompatibility between the two versions (namely the static method `TDigest.createAvlTreeDigest()` is available in 3.1 but not 3.0).
    
    If the classpath for the Stellar REPL chooses the 3.0 version of the library, then deserialization is broken.  Strictly speaking this is not a problem of the serialized form being incorrect (we use t-digest's custom serialization), but a problem in the custom kryo serialization code in the class. It relies on the default constructor being called and then the digest being deserialized using the code within the t-digest library. Because the default constructor initializes the digest via a call that does not exist in 3.0, it breaks. The serialization logic is safe to use in both versions, but the object can't be constructed in 3.0.  This fix directly instantiates the AvlTreeDigest, which exists in both versions.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cestella/incubator-metron stats_serialization_bug

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-metron/pull/388.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #388
    
----
commit 8c8175e607fe769c7f84ca390e772b6a56dfaa7e
Author: cstella <ce...@gmail.com>
Date:   2016-12-05T14:46:47Z

    t-digest library between 3.0 and 3.1 have different APIs but both treat the AVL Tree library the same, so adjusting the call should allow us to support either version.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #388: METRON-610: OnlineStatisticsProvider serializat...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/388
  
    It's a good point, @nickwallen.  We could shade and relocate the jar to enforce our version is used rather than a random one being picked up on the classpath.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #388: METRON-610: OnlineStatisticsProvider ser...

Posted by dlyle65535 <gi...@git.apache.org>.
Github user dlyle65535 commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/388#discussion_r91100001
  
    --- Diff: metron-analytics/metron-statistics/pom.xml ---
    @@ -52,4 +52,62 @@
                 <version>${global_hbase_guava_version}</version>
             </dependency>
         </dependencies>
    +    <build>
    +        <plugins>
    +            <plugin>
    +                <groupId>org.apache.maven.plugins</groupId>
    +                <artifactId>maven-shade-plugin</artifactId>
    +                <version>${global_shade_version}</version>
    +                <configuration>
    +                    <createDependencyReducedPom>true</createDependencyReducedPom>
    +                </configuration>
    +                <executions>
    +                    <execution>
    +                        <phase>package</phase>
    +                        <goals>
    +                            <goal>shade</goal>
    +                        </goals>
    +                        <configuration>
    +                            <relocations>
    +                                <relocation>
    +                                    <pattern>com.tdunning</pattern>
    +                                    <shadedPattern>org.apache.metron.tdunning</shadedPattern>
    +                                </relocation>
    +                            </relocations>
    +                            <artifactSet>
    +                                <excludes>
    +                                    <exclude>storm:storm-core:*</exclude>
    +                                    <exclude>storm:storm-lib:*</exclude>
    +                                    <exclude>org.slf4j.impl*</exclude>
    +                                    <exclude>org.slf4j:slf4j-log4j*</exclude>
    +                                </excludes>
    +                            </artifactSet>
    +                            <transformers>
    +                                <transformer
    +                                  implementation="org.apache.maven.plugins.shade.resource.DontIncludeResourceTransformer">
    +                                     <resources>
    +                                        <resource>.yaml</resource>
    +                                        <resource>LICENSE.txt</resource>
    +                                        <resource>ASL2.0</resource>
    +                                        <resource>NOTICE.txt</resource>
    +                                      </resources>
    +                                </transformer>
    +                                <!-- UNCOMMENT THIS IF YOU NEED TO REGENERATE THE BEST GUESS NOTICES FILE WHICH REQUIRES PRUNING EVERY RELEASE -->
    +                                <!--transformer implementation="org.apache.maven.plugins.shade.resource.ApacheNoticeResourceTransformer">
    +                                    <addHeader>false</addHeader>
    +                                    <projectName>${project.name}</projectName>
    +                                </transformer-->
    --- End diff --
    
    Would this be better as a wiki entry for release managers?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #388: METRON-610: OnlineStatisticsProvider serializat...

Posted by nickwallen <gi...@git.apache.org>.
Github user nickwallen commented on the issue:

    https://github.com/apache/incubator-metron/pull/388
  
    Is there not a way to ensure that our code uses a single version of the t-digest library?  I'd be worried that we might run into another incompatibility in the future.  
    
    If not, is there some ingenious way to automate the testing of this scenario?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #388: METRON-610: OnlineStatisticsProvider serializat...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/388
  
    Ok, I tested this in vagrant.  The testing methodology was as follows:
    * Created a simple global profile that adds a `value` field to a stats object and serializes it:
    ```
    {
      "profiles": [
        {
          "profile": "stats",
          "foreach": "'global'",
          "onlyif": "true",
          "init" : {
            "s": "STATS_INIT()"
                   },
          "update": {
            "s": "STATS_ADD(s, value)"
                    },
          "result": "s"
        }
      ]
    }
    ```
    * Created a simple CSV parser called `mad` that takes in comma separated pairs and adds the mean from the last 10 minute window:
    ```
    {
      "parserClassName" : "org.apache.metron.parsers.csv.CSVParser"
     ,"sensorTopic" : "mad"
     ,"parserConfig" : {
        "columns" : {
           "value_str" : 0
          ,"source" : 1
                    }
                       }
     ,"fieldTransformations" : [
        {
        "transformation" : "STELLAR"
       ,"output" : [ "value" , "mean"]
       ,"config" : {
          "value" : "TO_DOUBLE(value_str)",
          "mean" : "STATS_MEAN(STATS_MERGE(PROFILE_GET('stats', 'global', 10, 'MINUTES')))"
                   }
        }
                               ]
    }
    ```
    * Created an enrichment that adds a field `enrichment_mean` via a stellar enrichment:
    ```
    {
      "index": "mad",
      "batchSize": 1,
      "enrichment": {
        "fieldMap": {
          "stellar" : {
            "config" : {
              "enrichment_mean" : "STATS_MEAN(STATS_MERGE(PROFILE_GET('stats', 'global', 10, 'MINUTES')))"
            }
          }
        }
      ,"fieldToTypeMap": { }
      },
      "threatIntel": {
        "fieldMap": { },
        "fieldToTypeMap": { },
        "triageConfig" : {
          "riskLevelRules" : {
          },
          "aggregator" : "MAX"
        }
      }
    }
    ```
    * Generated a stream of data of the form `value,type` (e.g. 2.45,synthetic)
    * Open up the stellar shell after 10 minutes and run `STATS_MEAN(STATS_MERGE(PROFILE_GET('stats', 'global', 10, 'MINUTES')))` and ensure I get something that is not `NaN`
    
    This ensures that statistics objects are able to be serialized and deserialized correctly from the:
    * Enrichment topology
    * Parser topology
    * Stellar REPL


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #388: METRON-610: OnlineStatisticsProvider ser...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-metron/pull/388


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #388: METRON-610: OnlineStatisticsProvider serializat...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/388
  
    I wrote out serialized objects from 3.0 and 3.1 versions of the t-digest library and verified that the objects were able to mutually read from both versions of the library.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #388: METRON-610: OnlineStatisticsProvider ser...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/388#discussion_r91103873
  
    --- Diff: metron-analytics/metron-statistics/pom.xml ---
    @@ -52,4 +52,62 @@
                 <version>${global_hbase_guava_version}</version>
             </dependency>
         </dependencies>
    +    <build>
    +        <plugins>
    +            <plugin>
    +                <groupId>org.apache.maven.plugins</groupId>
    +                <artifactId>maven-shade-plugin</artifactId>
    +                <version>${global_shade_version}</version>
    +                <configuration>
    +                    <createDependencyReducedPom>true</createDependencyReducedPom>
    +                </configuration>
    +                <executions>
    +                    <execution>
    +                        <phase>package</phase>
    +                        <goals>
    +                            <goal>shade</goal>
    +                        </goals>
    +                        <configuration>
    +                            <relocations>
    +                                <relocation>
    +                                    <pattern>com.tdunning</pattern>
    +                                    <shadedPattern>org.apache.metron.tdunning</shadedPattern>
    +                                </relocation>
    +                            </relocations>
    +                            <artifactSet>
    +                                <excludes>
    +                                    <exclude>storm:storm-core:*</exclude>
    +                                    <exclude>storm:storm-lib:*</exclude>
    +                                    <exclude>org.slf4j.impl*</exclude>
    +                                    <exclude>org.slf4j:slf4j-log4j*</exclude>
    +                                </excludes>
    +                            </artifactSet>
    +                            <transformers>
    +                                <transformer
    +                                  implementation="org.apache.maven.plugins.shade.resource.DontIncludeResourceTransformer">
    +                                     <resources>
    +                                        <resource>.yaml</resource>
    +                                        <resource>LICENSE.txt</resource>
    +                                        <resource>ASL2.0</resource>
    +                                        <resource>NOTICE.txt</resource>
    +                                      </resources>
    +                                </transformer>
    +                                <!-- UNCOMMENT THIS IF YOU NEED TO REGENERATE THE BEST GUESS NOTICES FILE WHICH REQUIRES PRUNING EVERY RELEASE -->
    +                                <!--transformer implementation="org.apache.maven.plugins.shade.resource.ApacheNoticeResourceTransformer">
    +                                    <addHeader>false</addHeader>
    +                                    <projectName>${project.name}</projectName>
    +                                </transformer-->
    --- End diff --
    
    Probably.  It's not needed here in any case since this isn't a jar that gets distributed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #388: METRON-610: OnlineStatisticsProvider serializat...

Posted by nickwallen <gi...@git.apache.org>.
Github user nickwallen commented on the issue:

    https://github.com/apache/incubator-metron/pull/388
  
    +1 Looks solid


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---