Posted to dev@metron.apache.org by mmiklavc <gi...@git.apache.org> on 2017/01/10 14:00:04 UTC

[GitHub] incubator-metron pull request #401: METRON-637: Add a STATS_BIN function to ...

Github user mmiklavc commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/401#discussion_r95368025
  
    --- Diff: metron-analytics/metron-statistics/src/main/java/org/apache/metron/statistics/StellarStatisticsFunctions.java ---
    @@ -425,4 +428,57 @@ public Object apply(List<Object> args) {
           return result;
         }
       }
    +
    +  /**
    +   * Calculates the statistical bin that a value falls in.
    +   */
    +  @Stellar(namespace = "STATS", name = "BIN"
    +          , description = "Computes the bin that the value is in based on the statistical distribution."
    +          , params = {
    +          "stats - The Stellar statistics object"
    +          , "value - The value to bin"
    +          , "bounds? - A list of percentile bin bounds (excluding min and max) or a string representing a known and common set of bins.  " +
    +          "For convenience, we have provided QUARTILE, QUINTILE, and DECILE which you can pass in as a string arg." +
    +          " If this argument is omitted, then we assume a Quartile bin split."
    +                    }
    +          ,returns = "Which bin N the value falls in such that bound(N-1) < value <= bound(N). " +
    +          "No min and max bounds are provided, so values smaller than the 0'th bound go in the 0'th bin, " +
    +          "and values greater than the last bound go in the M'th bin."
    +  )
    +  public static class StatsBin extends BaseStellarFunction {
    +    public enum BinSplits {
    +      QUARTILE(ImmutableList.of(25.0, 50.0, 75.0)),
    +      QUINTILE(ImmutableList.of(20.0, 40.0, 60.0, 80.0)),
    +      DECILE(ImmutableList.of(10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0))
    +      ;
    +      public final List<Number> split;
    +      BinSplits(List<Number> split) {
    +        this.split = split;
    +      }
    +
    +      public static List<Number> getSplit(Object o) {
    +        if(o instanceof String) {
    +          return BinSplits.valueOf((String)o).split;
    +        }
    +        else if(o instanceof List) {
    +          return ConversionUtils.convert(o, List.class);
    +        }
    +        throw new IllegalStateException("The split you tried to pass is not a valid split: " + o.toString());
    +      }
    +    }
    +
    +
    +    @Override
    +    public Object apply(List<Object> args) {
    +      StatisticsProvider stats = convert(args.get(0), StatisticsProvider.class);
    +      Double value = convert(args.get(1), Double.class);
    +      final List<Number> bins = args.size() > 2?BinSplits.getSplit(args.get(2)):BinSplits.QUARTILE.split;
    +
    +      if (stats == null || value == null || bins.size() == 0) {
    +        return -1;
    +      }
    +      return MathFunctions.Bin.getBin(value, bins.size(), bin -> stats.getPercentile(bins.get(bin).doubleValue()));
    --- End diff --
    
    Nice suggestion by Matt. I also like the reuse of the math bin code and the ability to plug in a stats function provider.
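
To make the reuse concrete, here is a minimal sketch of the pattern, not the actual Metron code. It assumes only what the call site in the diff shows: a bin routine that takes the value, the number of bounds, and a function from bound index to bound value. The class `BinSketch`, its methods, and the nearest-rank percentile helper are hypothetical stand-ins; in the PR the bounds would come from `stats.getPercentile(...)` instead.

```java
import java.util.List;
import java.util.function.IntToDoubleFunction;

// Hypothetical illustration of a bin routine with a pluggable bound provider.
// None of these names come from Metron.
final class BinSketch {
  private BinSketch() {}

  /**
   * Returns the index of the first bound that is >= value, or numBounds
   * when the value is greater than every bound. Assumes the bounds are
   * non-decreasing in the index.
   */
  static int getBin(double value, int numBounds, IntToDoubleFunction boundAt) {
    for (int i = 0; i < numBounds; i++) {
      if (value <= boundAt.applyAsDouble(i)) {
        return i;
      }
    }
    return numBounds;
  }

  /** Fixed cut points: the bound provider is just a list lookup. */
  static int binByFixedBounds(double value, List<Double> bounds) {
    return getBin(value, bounds.size(), bounds::get);
  }

  /**
   * Percentile-driven cut points: the bound provider asks a distribution
   * for each percentile. A sorted sample stands in for the stats object.
   */
  static int binByPercentiles(double value, double[] sortedSample, List<Double> percentiles) {
    return getBin(value, percentiles.size(),
        i -> nearestRankPercentile(sortedSample, percentiles.get(i)));
  }

  /** Nearest-rank percentile of a non-empty, ascending-sorted sample. */
  static double nearestRankPercentile(double[] sorted, double pct) {
    int rank = (int) Math.ceil(pct / 100.0 * sorted.length);
    return sorted[Math.max(rank, 1) - 1];
  }
}
```

Both callers share the same loop and differ only in where the bounds come from, which is the part of the design I was pointing at.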


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---