Posted to dev@metron.apache.org by cestella <gi...@git.apache.org> on 2016/12/22 16:44:47 UTC

[GitHub] incubator-metron pull request #403: METRON-640: Add a Stellar function to co...

GitHub user cestella opened a pull request:

    https://github.com/apache/incubator-metron/pull/403

    METRON-640: Add a Stellar function to compute shannon entropy for strings

    A common feature used for models (especially DGA models) is the Shannon entropy of strings. We should be able to compute it in Stellar.
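
    For readers unfamiliar with the feature, here is a minimal sketch of character-level Shannon entropy in Java. It is not the code from this pull request: the class name `ShannonEntropySketch`, the method `entropy`, and the choice to return 0.0 for null or empty input are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not the implementation in the PR.
final class ShannonEntropySketch {

  // Returns the Shannon entropy, in bits, of the distribution of chars in s.
  // Counts UTF-16 code units, so supplementary characters count as two symbols.
  static double entropy(String s) {
    if (s == null || s.isEmpty()) {
      return 0.0; // assumption: no symbols means zero entropy
    }
    Map<Character, Integer> counts = new HashMap<>();
    for (char c : s.toCharArray()) {
      counts.merge(c, 1, Integer::sum);
    }
    double n = s.length();
    double h = 0.0;
    for (int count : counts.values()) {
      double p = count / n;                 // relative frequency of one symbol
      h -= p * (Math.log(p) / Math.log(2)); // -p * log_2(p)
    }
    return h;
  }

  public static void main(String[] args) {
    // 10 a's, 5 b's, 5 c's: p = 1/2, 1/4, 1/4, so the result is about 1.5 bits.
    System.out.println(entropy("aaaaaaaaaabbbbbccccc"));
  }
}
```

    A string made of one repeated character scores 0 bits, while a string spread evenly over many characters scores high, which is why the value is a useful signal for machine-generated domain names.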


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cestella/incubator-metron METRON-640

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-metron/pull/403.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #403
    

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron issue #403: METRON-640: Add a Stellar function to compute s...

Posted by mmiklavc <gi...@git.apache.org>.
GitHub user mmiklavc commented on the issue:

    https://github.com/apache/incubator-metron/pull/403
  
    Short and sweet, nice contribution. Just a couple of questions, and then it looks ready to go.



[GitHub] incubator-metron pull request #403: METRON-640: Add a Stellar function to co...

Posted by mmiklavc <gi...@git.apache.org>.
GitHub user mmiklavc commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/403#discussion_r95380770
  
    --- Diff: metron-platform/metron-common/src/test/java/org/apache/metron/common/dsl/functions/StringFunctionsTest.java ---
    @@ -32,124 +33,143 @@
     
     public class StringFunctionsTest {
     
    -    @Test
    -    public void testStringFunctions() throws Exception {
    -        final Map<String, String> variableMap = new HashMap<String, String>() {{
    -            put("foo", "casey");
    -            put("ip", "192.168.0.1");
    -            put("empty", "");
    -            put("spaced", "metron is great");
    -        }};
    -        Assert.assertTrue(runPredicate("true and TO_UPPER(foo) == 'CASEY'", v -> variableMap.get(v)));
    -        Assert.assertTrue(runPredicate("foo in [ TO_LOWER('CASEY'), 'david' ]", v -> variableMap.get(v)));
    -        Assert.assertTrue(runPredicate("TO_UPPER(foo) in [ TO_UPPER('casey'), 'david' ] and IN_SUBNET(ip, '192.168.0.0/24')", v -> variableMap.get(v)));
    -        Assert.assertFalse(runPredicate("TO_LOWER(foo) in [ TO_UPPER('casey'), 'david' ]", v -> variableMap.get(v)));
    +  @Test
    +  public void testStringFunctions() throws Exception {
    +    final Map<String, String> variableMap = new HashMap<String, String>() {{
    +      put("foo", "casey");
    +      put("ip", "192.168.0.1");
    +      put("empty", "");
    +      put("spaced", "metron is great");
    +    }};
    +    Assert.assertTrue(runPredicate("true and TO_UPPER(foo) == 'CASEY'", v -> variableMap.get(v)));
    +    Assert.assertTrue(runPredicate("foo in [ TO_LOWER('CASEY'), 'david' ]", v -> variableMap.get(v)));
    +    Assert.assertTrue(runPredicate("TO_UPPER(foo) in [ TO_UPPER('casey'), 'david' ] and IN_SUBNET(ip, '192.168.0.0/24')", v -> variableMap.get(v)));
    +    Assert.assertFalse(runPredicate("TO_LOWER(foo) in [ TO_UPPER('casey'), 'david' ]", v -> variableMap.get(v)));
    +  }
    +
    +  @Test
    +  public void testStringFunctions_advanced() throws Exception {
    +    final Map<String, Object> variableMap = new HashMap<String, Object>() {{
    +      put("foo", "casey");
    +      put("bar", "bar.casey.grok");
    +      put("ip", "192.168.0.1");
    +      put("empty", "");
    +      put("spaced", "metron is great");
    +      put("myList", ImmutableList.of("casey", "apple", "orange"));
    +    }};
    +    Assert.assertTrue(runPredicate("foo in SPLIT(bar, '.')", v -> variableMap.get(v)));
    +    Assert.assertFalse(runPredicate("foo in SPLIT(ip, '.')", v -> variableMap.get(v)));
    +    Assert.assertTrue(runPredicate("foo in myList", v -> variableMap.get(v)));
    +    Assert.assertFalse(runPredicate("foo not in myList", v -> variableMap.get(v)));
    +  }
    +
    +  @Test
    +  public void testLeftRightFills() throws Exception{
    +    final Map<String, Object> variableMap = new HashMap<String, Object>() {{
    +      put("foo", null);
    +      put("bar", null);
    +      put("notInt","oh my");
    +    }};
    +
    +    //LEFT
    +    Object left = run("FILL_LEFT('123','X', 10)",new HashedMap());
    +    Assert.assertNotNull(left);
    +    Assert.assertEquals(10,((String)left).length());
    +    Assert.assertEquals("XXXXXXX123",(String)left);
    +
    +    //RIGHT
    +    Object right = run("FILL_RIGHT('123','X', 10)", new HashedMap());
    +    Assert.assertNotNull(right);
    +    Assert.assertEquals(10,((String)right).length());
    +    Assert.assertEquals("123XXXXXXX",(String)right);
    +
    +    //INPUT ALREADY LENGTH
    +    Object same = run("FILL_RIGHT('123','X', 3)", new HashedMap());
    +    Assert.assertEquals(3,((String)same).length());
    +    Assert.assertEquals("123",(String)same);
    +
    +    //INPUT BIGGER THAN LENGTH
    +    Object tooBig = run("FILL_RIGHT('1234567890','X', 3)", new HashedMap());
    +    Assert.assertEquals(10,((String)tooBig).length());
    +    Assert.assertEquals("1234567890",(String)tooBig);
    +
    +    //NULL VARIABLES
    +    boolean thrown = false;
    +    try{
    +      run("FILL_RIGHT('123',foo,bar)", variableMap);
    +    }catch(ParseException pe) {
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("are both required"));
         }
    -
    -    @Test
    -    public void testStringFunctions_advanced() throws Exception {
    -        final Map<String, Object> variableMap = new HashMap<String, Object>() {{
    -            put("foo", "casey");
    -            put("bar", "bar.casey.grok");
    -            put("ip", "192.168.0.1");
    -            put("empty", "");
    -            put("spaced", "metron is great");
    -            put("myList", ImmutableList.of("casey", "apple", "orange"));
    -        }};
    -        Assert.assertTrue(runPredicate("foo in SPLIT(bar, '.')", v -> variableMap.get(v)));
    -        Assert.assertFalse(runPredicate("foo in SPLIT(ip, '.')", v -> variableMap.get(v)));
    -        Assert.assertTrue(runPredicate("foo in myList", v -> variableMap.get(v)));
    -        Assert.assertFalse(runPredicate("foo not in myList", v -> variableMap.get(v)));
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    // NULL LENGTH
    +    try{
    +      run("FILL_RIGHT('123','X',bar)", variableMap);
    +    }catch(ParseException pe) {
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("are both required"));
         }
    -
    -    @Test
    -    public void testLeftRightFills() throws Exception{
    -        final Map<String, Object> variableMap = new HashMap<String, Object>() {{
    -            put("foo", null);
    -            put("bar", null);
    -            put("notInt","oh my");
    -        }};
    -
    -        //LEFT
    -        Object left = run("FILL_LEFT('123','X', 10)",new HashedMap());
    -        Assert.assertNotNull(left);
    -        Assert.assertEquals(10,((String)left).length());
    -        Assert.assertEquals("XXXXXXX123",(String)left);
    -
    -        //RIGHT
    -        Object right = run("FILL_RIGHT('123','X', 10)", new HashedMap());
    -        Assert.assertNotNull(right);
    -        Assert.assertEquals(10,((String)right).length());
    -        Assert.assertEquals("123XXXXXXX",(String)right);
    -
    -        //INPUT ALREADY LENGTH
    -        Object same = run("FILL_RIGHT('123','X', 3)", new HashedMap());
    -        Assert.assertEquals(3,((String)same).length());
    -        Assert.assertEquals("123",(String)same);
    -
    -        //INPUT BIGGER THAN LENGTH
    -        Object tooBig = run("FILL_RIGHT('1234567890','X', 3)", new HashedMap());
    -        Assert.assertEquals(10,((String)tooBig).length());
    -        Assert.assertEquals("1234567890",(String)tooBig);
    -
    -        //NULL VARIABLES
    -        boolean thrown = false;
    -        try{
    -            run("FILL_RIGHT('123',foo,bar)", variableMap);
    -        }catch(ParseException pe) {
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("are both required"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        // NULL LENGTH
    -        try{
    -            run("FILL_RIGHT('123','X',bar)", variableMap);
    -        }catch(ParseException pe) {
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("are both required"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        // NULL FILL
    -        try{
    -            run("FILL_RIGHT('123',foo, 7)", variableMap);
    -        }catch(ParseException pe) {
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("are both required"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        // NON INTEGER LENGTH
    -        try {
    -            run("FILL_RIGHT('123','X', 'z' )", new HashedMap());
    -        }catch(ParseException pe){
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("not a valid Integer"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        // EMPTY STRING PAD
    -        try {
    -            Object returnValue = run("FILL_RIGHT('123','', 10 )", new HashedMap());
    -        }catch(ParseException pe) {
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("cannot be an empty"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        //MISSING LENGTH PARAMETER
    -        try {
    -            run("FILL_RIGHT('123',foo)", variableMap);
    -        }catch(ParseException pe){
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("expects three"));
    -        }
    -        Assert.assertTrue(thrown);
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    // NULL FILL
    +    try{
    +      run("FILL_RIGHT('123',foo, 7)", variableMap);
    +    }catch(ParseException pe) {
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("are both required"));
    +    }
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    // NON INTEGER LENGTH
    +    try {
    +      run("FILL_RIGHT('123','X', 'z' )", new HashedMap());
    +    }catch(ParseException pe){
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("not a valid Integer"));
    +    }
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    // EMPTY STRING PAD
    +    try {
    +      Object returnValue = run("FILL_RIGHT('123','', 10 )", new HashedMap());
    +    }catch(ParseException pe) {
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("cannot be an empty"));
    +    }
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    //MISSING LENGTH PARAMETER
    +    try {
    +      run("FILL_RIGHT('123',foo)", variableMap);
    +    }catch(ParseException pe){
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("expects three"));
         }
    +    Assert.assertTrue(thrown);
    +  }
    +
    +  @Test
    +  public void shannonEntropyTest() throws Exception {
    +    //test empty string
    +    Assert.assertEquals(0.0, (Double)run("STRING_ENTROPY('')", new HashMap<>()), 1e-6);
    +    Assert.assertEquals(0.0, (Double)run("STRING_ENTROPY(foo)", ImmutableMap.of("foo", "")), 1e-6);
    +
    +    /*
    +    Now consider the string aaaaaaaaaabbbbbccccc or 10 a's followed by 5 b's and 5 c's.
    +    The probabilities of each character is as follows:
    +    p(a) = 1/2
    +    p(b) = 1/4
    +    p(c) = 1/4
    +    so the shannon entropy should be
    +      -p(a)*log_2(p(a)) - p(b)*log_2(p(b)) - p(c)*log_2(p(c)) =
    +      -0.5*-1 - 0.25*-2 - 0.25*-2 = 1.5
    +     */
    +    Assert.assertEquals(1.5, (Double)run("STRING_ENTROPY(foo)", ImmutableMap.of("foo", "aaaaaaaaaabbbbbccccc")), 1e-1);
    --- End diff --
    
    How'd you come up with the delta value?
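
    To make the question concrete: JUnit 4's three-argument `assertEquals(double, double, double)` passes when the absolute difference between expected and actual is at most the delta. The sketch below mimics that check with a hypothetical helper, `withinDelta`, so it runs without JUnit; the value 1.45 is a made-up wrong result, not output from the PR.

```java
// Hypothetical stand-in for a delta-based comparison; this is not JUnit itself.
final class DeltaSketch {

  // True when actual lies within delta of expected (inclusive bound).
  static boolean withinDelta(double expected, double actual, double delta) {
    return Math.abs(expected - actual) <= delta;
  }

  public static void main(String[] args) {
    double expected = 1.5;
    double wrong = 1.45; // made-up incorrect entropy, off by 0.05

    System.out.println(withinDelta(expected, wrong, 1e-1)); // true: loose delta hides the error
    System.out.println(withinDelta(expected, wrong, 1e-6)); // false: tight delta catches it
    System.out.println(withinDelta(expected, 1.5, 0.0));    // true: zero delta demands exact equality
  }
}
```

    Under these assumptions, a delta of `1e-1` would let an answer that is off by several percent pass, whereas `1e-6` (already used for the empty-string cases in the same test) would not.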



[GitHub] incubator-metron pull request #403: METRON-640: Add a Stellar function to co...

Posted by cestella <gi...@git.apache.org>.
GitHub user cestella commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/403#discussion_r95382684
  
    --- Diff: metron-platform/metron-common/src/test/java/org/apache/metron/common/dsl/functions/StringFunctionsTest.java ---
    +    Assert.assertEquals(1.5, (Double)run("STRING_ENTROPY(foo)", ImmutableMap.of("foo", "aaaaaaaaaabbbbbccccc")), 1e-1);
    --- End diff --
    
    I could also use `0.0` as the epsilon in these cases if you think that's clearer.



[GitHub] incubator-metron pull request #403: METRON-640: Add a Stellar function to co...

Posted by mattf-horton <gi...@git.apache.org>.
GitHub user mattf-horton commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/403#discussion_r95462455
  
    --- Diff: metron-platform/metron-common/src/test/java/org/apache/metron/common/dsl/functions/StringFunctionsTest.java ---
    @@ -32,124 +33,143 @@
     
     public class StringFunctionsTest {
     
    -    @Test
    -    public void testStringFunctions() throws Exception {
    -        final Map<String, String> variableMap = new HashMap<String, String>() {{
    -            put("foo", "casey");
    -            put("ip", "192.168.0.1");
    -            put("empty", "");
    -            put("spaced", "metron is great");
    -        }};
    -        Assert.assertTrue(runPredicate("true and TO_UPPER(foo) == 'CASEY'", v -> variableMap.get(v)));
    -        Assert.assertTrue(runPredicate("foo in [ TO_LOWER('CASEY'), 'david' ]", v -> variableMap.get(v)));
    -        Assert.assertTrue(runPredicate("TO_UPPER(foo) in [ TO_UPPER('casey'), 'david' ] and IN_SUBNET(ip, '192.168.0.0/24')", v -> variableMap.get(v)));
    -        Assert.assertFalse(runPredicate("TO_LOWER(foo) in [ TO_UPPER('casey'), 'david' ]", v -> variableMap.get(v)));
    +  @Test
    +  public void testStringFunctions() throws Exception {
    +    final Map<String, String> variableMap = new HashMap<String, String>() {{
    +      put("foo", "casey");
    +      put("ip", "192.168.0.1");
    +      put("empty", "");
    +      put("spaced", "metron is great");
    +    }};
    +    Assert.assertTrue(runPredicate("true and TO_UPPER(foo) == 'CASEY'", v -> variableMap.get(v)));
    +    Assert.assertTrue(runPredicate("foo in [ TO_LOWER('CASEY'), 'david' ]", v -> variableMap.get(v)));
    +    Assert.assertTrue(runPredicate("TO_UPPER(foo) in [ TO_UPPER('casey'), 'david' ] and IN_SUBNET(ip, '192.168.0.0/24')", v -> variableMap.get(v)));
    +    Assert.assertFalse(runPredicate("TO_LOWER(foo) in [ TO_UPPER('casey'), 'david' ]", v -> variableMap.get(v)));
    +  }
    +
    +  @Test
    +  public void testStringFunctions_advanced() throws Exception {
    +    final Map<String, Object> variableMap = new HashMap<String, Object>() {{
    +      put("foo", "casey");
    +      put("bar", "bar.casey.grok");
    +      put("ip", "192.168.0.1");
    +      put("empty", "");
    +      put("spaced", "metron is great");
    +      put("myList", ImmutableList.of("casey", "apple", "orange"));
    +    }};
    +    Assert.assertTrue(runPredicate("foo in SPLIT(bar, '.')", v -> variableMap.get(v)));
    +    Assert.assertFalse(runPredicate("foo in SPLIT(ip, '.')", v -> variableMap.get(v)));
    +    Assert.assertTrue(runPredicate("foo in myList", v -> variableMap.get(v)));
    +    Assert.assertFalse(runPredicate("foo not in myList", v -> variableMap.get(v)));
    +  }
    +
    +  @Test
    +  public void testLeftRightFills() throws Exception{
    +    final Map<String, Object> variableMap = new HashMap<String, Object>() {{
    +      put("foo", null);
    +      put("bar", null);
    +      put("notInt","oh my");
    +    }};
    +
    +    //LEFT
    +    Object left = run("FILL_LEFT('123','X', 10)",new HashedMap());
    +    Assert.assertNotNull(left);
    +    Assert.assertEquals(10,((String)left).length());
    +    Assert.assertEquals("XXXXXXX123",(String)left);
    +
    +    //RIGHT
    +    Object right = run("FILL_RIGHT('123','X', 10)", new HashedMap());
    +    Assert.assertNotNull(right);
    +    Assert.assertEquals(10,((String)right).length());
    +    Assert.assertEquals("123XXXXXXX",(String)right);
    +
    +    //INPUT ALREADY LENGTH
    +    Object same = run("FILL_RIGHT('123','X', 3)", new HashedMap());
    +    Assert.assertEquals(3,((String)same).length());
    +    Assert.assertEquals("123",(String)same);
    +
    +    //INPUT BIGGER THAN LENGTH
    +    Object tooBig = run("FILL_RIGHT('1234567890','X', 3)", new HashedMap());
    +    Assert.assertEquals(10,((String)tooBig).length());
    +    Assert.assertEquals("1234567890",(String)tooBig);
    +
    +    //NULL VARIABLES
    +    boolean thrown = false;
    +    try{
    +      run("FILL_RIGHT('123',foo,bar)", variableMap);
    +    }catch(ParseException pe) {
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("are both required"));
         }
    -
    -    @Test
    -    public void testStringFunctions_advanced() throws Exception {
    -        final Map<String, Object> variableMap = new HashMap<String, Object>() {{
    -            put("foo", "casey");
    -            put("bar", "bar.casey.grok");
    -            put("ip", "192.168.0.1");
    -            put("empty", "");
    -            put("spaced", "metron is great");
    -            put("myList", ImmutableList.of("casey", "apple", "orange"));
    -        }};
    -        Assert.assertTrue(runPredicate("foo in SPLIT(bar, '.')", v -> variableMap.get(v)));
    -        Assert.assertFalse(runPredicate("foo in SPLIT(ip, '.')", v -> variableMap.get(v)));
    -        Assert.assertTrue(runPredicate("foo in myList", v -> variableMap.get(v)));
    -        Assert.assertFalse(runPredicate("foo not in myList", v -> variableMap.get(v)));
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    // NULL LENGTH
    +    try{
    +      run("FILL_RIGHT('123','X',bar)", variableMap);
    +    }catch(ParseException pe) {
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("are both required"));
         }
    -
    -    @Test
    -    public void testLeftRightFills() throws Exception{
    -        final Map<String, Object> variableMap = new HashMap<String, Object>() {{
    -            put("foo", null);
    -            put("bar", null);
    -            put("notInt","oh my");
    -        }};
    -
    -        //LEFT
    -        Object left = run("FILL_LEFT('123','X', 10)",new HashedMap());
    -        Assert.assertNotNull(left);
    -        Assert.assertEquals(10,((String)left).length());
    -        Assert.assertEquals("XXXXXXX123",(String)left);
    -
    -        //RIGHT
    -        Object right = run("FILL_RIGHT('123','X', 10)", new HashedMap());
    -        Assert.assertNotNull(right);
    -        Assert.assertEquals(10,((String)right).length());
    -        Assert.assertEquals("123XXXXXXX",(String)right);
    -
    -        //INPUT ALREADY LENGTH
    -        Object same = run("FILL_RIGHT('123','X', 3)", new HashedMap());
    -        Assert.assertEquals(3,((String)same).length());
    -        Assert.assertEquals("123",(String)same);
    -
    -        //INPUT BIGGER THAN LENGTH
    -        Object tooBig = run("FILL_RIGHT('1234567890','X', 3)", new HashedMap());
    -        Assert.assertEquals(10,((String)tooBig).length());
    -        Assert.assertEquals("1234567890",(String)tooBig);
    -
    -        //NULL VARIABLES
    -        boolean thrown = false;
    -        try{
    -            run("FILL_RIGHT('123',foo,bar)", variableMap);
    -        }catch(ParseException pe) {
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("are both required"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        // NULL LENGTH
    -        try{
    -            run("FILL_RIGHT('123','X',bar)", variableMap);
    -        }catch(ParseException pe) {
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("are both required"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        // NULL FILL
    -        try{
    -            run("FILL_RIGHT('123',foo, 7)", variableMap);
    -        }catch(ParseException pe) {
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("are both required"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        // NON INTEGER LENGTH
    -        try {
    -            run("FILL_RIGHT('123','X', 'z' )", new HashedMap());
    -        }catch(ParseException pe){
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("not a valid Integer"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        // EMPTY STRING PAD
    -        try {
    -            Object returnValue = run("FILL_RIGHT('123','', 10 )", new HashedMap());
    -        }catch(ParseException pe) {
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("cannot be an empty"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        //MISSING LENGTH PARAMETER
    -        try {
    -            run("FILL_RIGHT('123',foo)", variableMap);
    -        }catch(ParseException pe){
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("expects three"));
    -        }
    -        Assert.assertTrue(thrown);
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    // NULL FILL
    +    try{
    +      run("FILL_RIGHT('123',foo, 7)", variableMap);
    +    }catch(ParseException pe) {
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("are both required"));
    +    }
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    // NON INTEGER LENGTH
    +    try {
    +      run("FILL_RIGHT('123','X', 'z' )", new HashedMap());
    +    }catch(ParseException pe){
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("not a valid Integer"));
    +    }
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    // EMPTY STRING PAD
    +    try {
    +      Object returnValue = run("FILL_RIGHT('123','', 10 )", new HashedMap());
    +    }catch(ParseException pe) {
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("cannot be an empty"));
    +    }
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    //MISSING LENGTH PARAMETER
    +    try {
    +      run("FILL_RIGHT('123',foo)", variableMap);
    +    }catch(ParseException pe){
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("expects three"));
         }
    +    Assert.assertTrue(thrown);
    +  }
    +
    +  @Test
    +  public void shannonEntropyTest() throws Exception {
    +    //test empty string
    +    Assert.assertEquals(0.0, (Double)run("STRING_ENTROPY('')", new HashMap<>()), 0.0);
    +    Assert.assertEquals(0.0, (Double)run("STRING_ENTROPY(foo)", ImmutableMap.of("foo", "")), 0.0);
    +
    +    /*
    +    Now consider the string aaaaaaaaaabbbbbccccc or 10 a's followed by 5 b's and 5 c's.
    +    The probabilities of each character is as follows:
    +    p(a) = 1/2
    +    p(b) = 1/4
    +    p(c) = 1/4
    +    so the shannon entropy should be
    +      -p(a)*log_2(p(a)) - p(b)*log_2(p(b)) - p(c)*log_2(p(c)) =
    +      -0.5*-1 - 0.25*-2 - 0.25*-2 = 1.5
    +     */
    +    Assert.assertEquals(1.5, (Double)run("STRING_ENTROPY(foo)", ImmutableMap.of("foo", "aaaaaaaaaabbbbbccccc")), 0.0);
    --- End diff --
    
    Yes, the question is whether the underlying standard requires log base 2 of (1/2)^n to be exactly integral, or whether that is implementation-dependent.  If the standard does not guarantee it, then implementation independence argues for a ULP-based tolerance.
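
    For illustration only (not part of the PR): a minimal sketch of what a ULP-based comparison could look like. The names `UlpToleranceSketch`, `log2`, and `withinUlps` are hypothetical, and the change-of-base formula is an assumption about how log base 2 might be computed, not necessarily what the patch does. Java documents `Math.log` as accurate to within 1 ulp, so the quotient is not guaranteed to be exactly integral.

```java
final class UlpToleranceSketch {
  // Assumed change-of-base computation of log base 2 (hypothetical helper).
  static double log2(double x) {
    return Math.log(x) / Math.log(2.0);
  }

  // True if actual lies within maxUlps units in the last place of expected.
  static boolean withinUlps(double expected, double actual, int maxUlps) {
    return Math.abs(expected - actual) <= maxUlps * Math.ulp(expected);
  }

  public static void main(String[] args) {
    // Check log2((1/2)^n) against -n for a few small n.
    for (int n = 1; n <= 10; n++) {
      double p = Math.pow(0.5, n);
      double value = log2(p);
      System.out.println("n=" + n + " log2=" + value
          + " withinUlps=" + withinUlps(-n, value, 8));
    }
  }
}
```

    The tolerance scales with the magnitude of the expected value, so the same check works for 1.5 and for much larger or smaller results.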



[GitHub] incubator-metron pull request #403: METRON-640: Add a Stellar function to co...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/403#discussion_r95378128
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/dsl/functions/StringFunctions.java ---
    @@ -284,4 +287,36 @@ private static Object fill(FillDirection direction, Object inputObject, Object f
         }
         return org.apache.commons.lang.StringUtils.rightPad(input,requiredLength,fill);
       }
    +
    +  @Stellar( namespace="STRING"
    +          , name="ENTROPY"
    +          , description = "Computes the base-2 shannon entropy of a string"
    +          , params = { "input - String" }
    +          , returns = "The base-2 shannon entropy of the string.  The unit of this is bits."
    +  )
    +  public static class Entropy extends BaseStellarFunction {
    +    @Override
    +    public Object apply(List<Object> strings) {
    +      /*
    +      Shannon entropy is defined as follows:
    +      \Eta(X) = - \sum(p(x_i)*log_2(p(x_i)), i=0, n-1) where x_i are distinct characters in the string.
    +       */
    +      Map<Character, Integer> frequency = new HashMap<>();
    +      String input = (String)strings.get(0);
    --- End diff --
    
    Good catch; it now throws an `IllegalArgumentException`.
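
    For illustration only: the review comment being answered is not shown in this message, so this sketch assumes it concerned the unchecked `strings.get(0)` cast in the diff above. `ArgCheckSketch` and `requireStringArg` are hypothetical names, and the exact checks and messages in the merged patch may differ.

```java
import java.util.Collections;
import java.util.List;

final class ArgCheckSketch {
  // Validates the argument list before casting, instead of casting blindly.
  static String requireStringArg(List<Object> args) {
    if (args == null || args.isEmpty()) {
      throw new IllegalArgumentException("expected one String argument but got none");
    }
    Object first = args.get(0);
    if (!(first instanceof String)) {
      // Also rejects null, since null is never an instance of String.
      throw new IllegalArgumentException("expected a String argument but got: " + first);
    }
    return (String) first;
  }

  public static void main(String[] args) {
    System.out.println(requireStringArg(Collections.<Object>singletonList("metron")));
    try {
      requireStringArg(Collections.<Object>emptyList());
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```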



[GitHub] incubator-metron pull request #403: METRON-640: Add a Stellar function to co...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/403#discussion_r95381847
  
    --- Diff: metron-platform/metron-common/src/test/java/org/apache/metron/common/dsl/functions/StringFunctionsTest.java ---
    @@ -32,124 +33,143 @@
     
     public class StringFunctionsTest {
     
    -    @Test
    -    public void testStringFunctions() throws Exception {
    -        final Map<String, String> variableMap = new HashMap<String, String>() {{
    -            put("foo", "casey");
    -            put("ip", "192.168.0.1");
    -            put("empty", "");
    -            put("spaced", "metron is great");
    -        }};
    -        Assert.assertTrue(runPredicate("true and TO_UPPER(foo) == 'CASEY'", v -> variableMap.get(v)));
    -        Assert.assertTrue(runPredicate("foo in [ TO_LOWER('CASEY'), 'david' ]", v -> variableMap.get(v)));
    -        Assert.assertTrue(runPredicate("TO_UPPER(foo) in [ TO_UPPER('casey'), 'david' ] and IN_SUBNET(ip, '192.168.0.0/24')", v -> variableMap.get(v)));
    -        Assert.assertFalse(runPredicate("TO_LOWER(foo) in [ TO_UPPER('casey'), 'david' ]", v -> variableMap.get(v)));
    +  @Test
    +  public void testStringFunctions() throws Exception {
    +    final Map<String, String> variableMap = new HashMap<String, String>() {{
    +      put("foo", "casey");
    +      put("ip", "192.168.0.1");
    +      put("empty", "");
    +      put("spaced", "metron is great");
    +    }};
    +    Assert.assertTrue(runPredicate("true and TO_UPPER(foo) == 'CASEY'", v -> variableMap.get(v)));
    +    Assert.assertTrue(runPredicate("foo in [ TO_LOWER('CASEY'), 'david' ]", v -> variableMap.get(v)));
    +    Assert.assertTrue(runPredicate("TO_UPPER(foo) in [ TO_UPPER('casey'), 'david' ] and IN_SUBNET(ip, '192.168.0.0/24')", v -> variableMap.get(v)));
    +    Assert.assertFalse(runPredicate("TO_LOWER(foo) in [ TO_UPPER('casey'), 'david' ]", v -> variableMap.get(v)));
    +  }
    +
    +  @Test
    +  public void testStringFunctions_advanced() throws Exception {
    +    final Map<String, Object> variableMap = new HashMap<String, Object>() {{
    +      put("foo", "casey");
    +      put("bar", "bar.casey.grok");
    +      put("ip", "192.168.0.1");
    +      put("empty", "");
    +      put("spaced", "metron is great");
    +      put("myList", ImmutableList.of("casey", "apple", "orange"));
    +    }};
    +    Assert.assertTrue(runPredicate("foo in SPLIT(bar, '.')", v -> variableMap.get(v)));
    +    Assert.assertFalse(runPredicate("foo in SPLIT(ip, '.')", v -> variableMap.get(v)));
    +    Assert.assertTrue(runPredicate("foo in myList", v -> variableMap.get(v)));
    +    Assert.assertFalse(runPredicate("foo not in myList", v -> variableMap.get(v)));
    +  }
    +
    +  @Test
    +  public void testLeftRightFills() throws Exception{
    +    final Map<String, Object> variableMap = new HashMap<String, Object>() {{
    +      put("foo", null);
    +      put("bar", null);
    +      put("notInt","oh my");
    +    }};
    +
    +    //LEFT
    +    Object left = run("FILL_LEFT('123','X', 10)",new HashedMap());
    +    Assert.assertNotNull(left);
    +    Assert.assertEquals(10,((String)left).length());
    +    Assert.assertEquals("XXXXXXX123",(String)left);
    +
    +    //RIGHT
    +    Object right = run("FILL_RIGHT('123','X', 10)", new HashedMap());
    +    Assert.assertNotNull(right);
    +    Assert.assertEquals(10,((String)right).length());
    +    Assert.assertEquals("123XXXXXXX",(String)right);
    +
    +    //INPUT ALREADY LENGTH
    +    Object same = run("FILL_RIGHT('123','X', 3)", new HashedMap());
    +    Assert.assertEquals(3,((String)same).length());
    +    Assert.assertEquals("123",(String)same);
    +
    +    //INPUT BIGGER THAN LENGTH
    +    Object tooBig = run("FILL_RIGHT('1234567890','X', 3)", new HashedMap());
    +    Assert.assertEquals(10,((String)tooBig).length());
    +    Assert.assertEquals("1234567890",(String)tooBig);
    +
    +    //NULL VARIABLES
    +    boolean thrown = false;
    +    try{
    +      run("FILL_RIGHT('123',foo,bar)", variableMap);
    +    }catch(ParseException pe) {
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("are both required"));
         }
    -
    -    @Test
    -    public void testStringFunctions_advanced() throws Exception {
    -        final Map<String, Object> variableMap = new HashMap<String, Object>() {{
    -            put("foo", "casey");
    -            put("bar", "bar.casey.grok");
    -            put("ip", "192.168.0.1");
    -            put("empty", "");
    -            put("spaced", "metron is great");
    -            put("myList", ImmutableList.of("casey", "apple", "orange"));
    -        }};
    -        Assert.assertTrue(runPredicate("foo in SPLIT(bar, '.')", v -> variableMap.get(v)));
    -        Assert.assertFalse(runPredicate("foo in SPLIT(ip, '.')", v -> variableMap.get(v)));
    -        Assert.assertTrue(runPredicate("foo in myList", v -> variableMap.get(v)));
    -        Assert.assertFalse(runPredicate("foo not in myList", v -> variableMap.get(v)));
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    // NULL LENGTH
    +    try{
    +      run("FILL_RIGHT('123','X',bar)", variableMap);
    +    }catch(ParseException pe) {
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("are both required"));
         }
    -
    -    @Test
    -    public void testLeftRightFills() throws Exception{
    -        final Map<String, Object> variableMap = new HashMap<String, Object>() {{
    -            put("foo", null);
    -            put("bar", null);
    -            put("notInt","oh my");
    -        }};
    -
    -        //LEFT
    -        Object left = run("FILL_LEFT('123','X', 10)",new HashedMap());
    -        Assert.assertNotNull(left);
    -        Assert.assertEquals(10,((String)left).length());
    -        Assert.assertEquals("XXXXXXX123",(String)left);
    -
    -        //RIGHT
    -        Object right = run("FILL_RIGHT('123','X', 10)", new HashedMap());
    -        Assert.assertNotNull(right);
    -        Assert.assertEquals(10,((String)right).length());
    -        Assert.assertEquals("123XXXXXXX",(String)right);
    -
    -        //INPUT ALREADY LENGTH
    -        Object same = run("FILL_RIGHT('123','X', 3)", new HashedMap());
    -        Assert.assertEquals(3,((String)same).length());
    -        Assert.assertEquals("123",(String)same);
    -
    -        //INPUT BIGGER THAN LENGTH
    -        Object tooBig = run("FILL_RIGHT('1234567890','X', 3)", new HashedMap());
    -        Assert.assertEquals(10,((String)tooBig).length());
    -        Assert.assertEquals("1234567890",(String)tooBig);
    -
    -        //NULL VARIABLES
    -        boolean thrown = false;
    -        try{
    -            run("FILL_RIGHT('123',foo,bar)", variableMap);
    -        }catch(ParseException pe) {
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("are both required"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        // NULL LENGTH
    -        try{
    -            run("FILL_RIGHT('123','X',bar)", variableMap);
    -        }catch(ParseException pe) {
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("are both required"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        // NULL FILL
    -        try{
    -            run("FILL_RIGHT('123',foo, 7)", variableMap);
    -        }catch(ParseException pe) {
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("are both required"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        // NON INTEGER LENGTH
    -        try {
    -            run("FILL_RIGHT('123','X', 'z' )", new HashedMap());
    -        }catch(ParseException pe){
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("not a valid Integer"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        // EMPTY STRING PAD
    -        try {
    -            Object returnValue = run("FILL_RIGHT('123','', 10 )", new HashedMap());
    -        }catch(ParseException pe) {
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("cannot be an empty"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        //MISSING LENGTH PARAMETER
    -        try {
    -            run("FILL_RIGHT('123',foo)", variableMap);
    -        }catch(ParseException pe){
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("expects three"));
    -        }
    -        Assert.assertTrue(thrown);
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    // NULL FILL
    +    try{
    +      run("FILL_RIGHT('123',foo, 7)", variableMap);
    +    }catch(ParseException pe) {
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("are both required"));
    +    }
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    // NON INTEGER LENGTH
    +    try {
    +      run("FILL_RIGHT('123','X', 'z' )", new HashedMap());
    +    }catch(ParseException pe){
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("not a valid Integer"));
    +    }
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    // EMPTY STRING PAD
    +    try {
    +      Object returnValue = run("FILL_RIGHT('123','', 10 )", new HashedMap());
    +    }catch(ParseException pe) {
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("cannot be an empty"));
    +    }
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    //MISSING LENGTH PARAMETER
    +    try {
    +      run("FILL_RIGHT('123',foo)", variableMap);
    +    }catch(ParseException pe){
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("expects three"));
         }
    +    Assert.assertTrue(thrown);
    +  }
    +
    +  @Test
    +  public void shannonEntropyTest() throws Exception {
    +    //test empty string
    +    Assert.assertEquals(0.0, (Double)run("STRING_ENTROPY('')", new HashMap<>()), 1e-6);
    +    Assert.assertEquals(0.0, (Double)run("STRING_ENTROPY(foo)", ImmutableMap.of("foo", "")), 1e-6);
    +
    +    /*
    +    Now consider the string aaaaaaaaaabbbbbccccc or 10 a's followed by 5 b's and 5 c's.
    +    The probabilities of each character is as follows:
    +    p(a) = 1/2
    +    p(b) = 1/4
    +    p(c) = 1/4
    +    so the shannon entropy should be
    +      -p(a)*log_2(p(a)) - p(b)*log_2(p(b)) - p(c)*log_2(p(c)) =
    +      -0.5*-1 - 0.25*-2 - 0.25*-2 = 1.5
    +     */
    +    Assert.assertEquals(1.5, (Double)run("STRING_ENTROPY(foo)", ImmutableMap.of("foo", "aaaaaaaaaabbbbbccccc")), 1e-1);
    --- End diff --
    
    Answering both questions about the equality bounds here.  The short answer is that it was just a mistake.  I've changed both of them to `Double.MIN_VALUE`.  In those calculations, the result should be as near to exact as a double-precision number gets.  I would have dropped the bounds, but the `assertEquals` overload without an epsilon is deprecated.  Which would you prefer: a deprecated call, an `assertTrue`, or an `assertEquals` with the minimum possible epsilon?  I chose the last one, but I'm not married to that decision if you think otherwise.
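
    For illustration only: a small sketch of why a delta of `Double.MIN_VALUE` behaves like exact equality for values near 1.5. `MinValueDeltaSketch` and `equalWithin` are hypothetical names; the helper is a stand-in that assumes the usual "absolute difference <= delta" rule rather than calling JUnit itself.

```java
final class MinValueDeltaSketch {
  // Stand-in for a delta-based comparison: equal, or within delta of each other.
  static boolean equalWithin(double expected, double actual, double delta) {
    return Double.compare(expected, actual) == 0
        || Math.abs(expected - actual) <= delta;
  }

  public static void main(String[] args) {
    // The closest double above 1.5; it differs from 1.5 by exactly one ulp.
    double next = Math.nextUp(1.5);

    // Identical values pass with any non-negative delta.
    System.out.println(equalWithin(1.5, 1.5, Double.MIN_VALUE));
    // One ulp apart is far larger than Double.MIN_VALUE, so this fails.
    System.out.println(equalWithin(1.5, next, Double.MIN_VALUE));
    // A one-ulp delta accepts the neighbouring double.
    System.out.println(equalWithin(1.5, next, Math.ulp(1.5)));
  }
}
```

    Under that assumption, the smallest delta tolerates no rounding error at this magnitude, which is the trade-off behind the question above.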



[GitHub] incubator-metron pull request #403: METRON-640: Add a Stellar function to co...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/403#discussion_r95375717
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/dsl/functions/StringFunctions.java ---
    @@ -284,4 +287,36 @@ private static Object fill(FillDirection direction, Object inputObject, Object f
         }
         return org.apache.commons.lang.StringUtils.rightPad(input,requiredLength,fill);
       }
    +
    +  @Stellar( namespace="STRING"
    +          , name="ENTROPY"
    +          , description = "Computes the base-2 shannon entropy of a string"
    --- End diff --
    
    Good catch; at this point this is pretty classical work, so I link to the Wikipedia definition rather than Claude Shannon's original work.
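
    For illustration only: a self-contained sketch of the base-2 Shannon entropy the description refers to, H(X) = -sum(p(x_i) * log_2(p(x_i))) over the distinct characters of the string. `EntropySketch` and `entropy` are hypothetical names, and this is not the code from the PR; in particular, the null check and the early return for the empty string are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

final class EntropySketch {
  // Base-2 Shannon entropy of the characters in input, in bits.
  static double entropy(String input) {
    if (input == null) {
      throw new IllegalArgumentException("input must not be null");
    }
    if (input.isEmpty()) {
      return 0.0; // no characters, so the sum is empty
    }
    // Count how often each distinct character occurs.
    Map<Character, Integer> frequency = new HashMap<>();
    for (char c : input.toCharArray()) {
      frequency.merge(c, 1, Integer::sum);
    }
    double n = input.length();
    double h = 0.0;
    for (int count : frequency.values()) {
      double p = count / n;                    // relative frequency of this character
      h -= p * (Math.log(p) / Math.log(2.0));  // accumulate -p * log_2(p)
    }
    return h;
  }

  public static void main(String[] args) {
    // 10 a's, 5 b's, 5 c's: p = 1/2, 1/4, 1/4, so the entropy is about 1.5 bits.
    System.out.println(entropy("aaaaaaaaaabbbbbccccc"));
  }
}
```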



[GitHub] incubator-metron pull request #403: METRON-640: Add a Stellar function to co...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/403#discussion_r95380349
  
    --- Diff: metron-platform/metron-common/src/test/java/org/apache/metron/common/dsl/functions/StringFunctionsTest.java ---
    @@ -32,124 +33,143 @@
     
     public class StringFunctionsTest {
     
    -    @Test
    -    public void testStringFunctions() throws Exception {
    -        final Map<String, String> variableMap = new HashMap<String, String>() {{
    -            put("foo", "casey");
    -            put("ip", "192.168.0.1");
    -            put("empty", "");
    -            put("spaced", "metron is great");
    -        }};
    -        Assert.assertTrue(runPredicate("true and TO_UPPER(foo) == 'CASEY'", v -> variableMap.get(v)));
    -        Assert.assertTrue(runPredicate("foo in [ TO_LOWER('CASEY'), 'david' ]", v -> variableMap.get(v)));
    -        Assert.assertTrue(runPredicate("TO_UPPER(foo) in [ TO_UPPER('casey'), 'david' ] and IN_SUBNET(ip, '192.168.0.0/24')", v -> variableMap.get(v)));
    -        Assert.assertFalse(runPredicate("TO_LOWER(foo) in [ TO_UPPER('casey'), 'david' ]", v -> variableMap.get(v)));
    +  @Test
    +  public void testStringFunctions() throws Exception {
    +    final Map<String, String> variableMap = new HashMap<String, String>() {{
    +      put("foo", "casey");
    +      put("ip", "192.168.0.1");
    +      put("empty", "");
    +      put("spaced", "metron is great");
    +    }};
    +    Assert.assertTrue(runPredicate("true and TO_UPPER(foo) == 'CASEY'", v -> variableMap.get(v)));
    +    Assert.assertTrue(runPredicate("foo in [ TO_LOWER('CASEY'), 'david' ]", v -> variableMap.get(v)));
    +    Assert.assertTrue(runPredicate("TO_UPPER(foo) in [ TO_UPPER('casey'), 'david' ] and IN_SUBNET(ip, '192.168.0.0/24')", v -> variableMap.get(v)));
    +    Assert.assertFalse(runPredicate("TO_LOWER(foo) in [ TO_UPPER('casey'), 'david' ]", v -> variableMap.get(v)));
    +  }
    +
    +  @Test
    +  public void testStringFunctions_advanced() throws Exception {
    +    final Map<String, Object> variableMap = new HashMap<String, Object>() {{
    +      put("foo", "casey");
    +      put("bar", "bar.casey.grok");
    +      put("ip", "192.168.0.1");
    +      put("empty", "");
    +      put("spaced", "metron is great");
    +      put("myList", ImmutableList.of("casey", "apple", "orange"));
    +    }};
    +    Assert.assertTrue(runPredicate("foo in SPLIT(bar, '.')", v -> variableMap.get(v)));
    +    Assert.assertFalse(runPredicate("foo in SPLIT(ip, '.')", v -> variableMap.get(v)));
    +    Assert.assertTrue(runPredicate("foo in myList", v -> variableMap.get(v)));
    +    Assert.assertFalse(runPredicate("foo not in myList", v -> variableMap.get(v)));
    +  }
    +
    +  @Test
    +  public void testLeftRightFills() throws Exception{
    +    final Map<String, Object> variableMap = new HashMap<String, Object>() {{
    +      put("foo", null);
    +      put("bar", null);
    +      put("notInt","oh my");
    +    }};
    +
    +    //LEFT
    +    Object left = run("FILL_LEFT('123','X', 10)",new HashedMap());
    +    Assert.assertNotNull(left);
    +    Assert.assertEquals(10,((String)left).length());
    +    Assert.assertEquals("XXXXXXX123",(String)left);
    +
    +    //RIGHT
    +    Object right = run("FILL_RIGHT('123','X', 10)", new HashedMap());
    +    Assert.assertNotNull(right);
    +    Assert.assertEquals(10,((String)right).length());
    +    Assert.assertEquals("123XXXXXXX",(String)right);
    +
    +    //INPUT ALREADY LENGTH
    +    Object same = run("FILL_RIGHT('123','X', 3)", new HashedMap());
    +    Assert.assertEquals(3,((String)same).length());
    +    Assert.assertEquals("123",(String)same);
    +
    +    //INPUT BIGGER THAN LENGTH
    +    Object tooBig = run("FILL_RIGHT('1234567890','X', 3)", new HashedMap());
    +    Assert.assertEquals(10,((String)tooBig).length());
    +    Assert.assertEquals("1234567890",(String)tooBig);
    +
    +    //NULL VARIABLES
    +    boolean thrown = false;
    +    try{
    +      run("FILL_RIGHT('123',foo,bar)", variableMap);
    +    }catch(ParseException pe) {
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("are both required"));
         }
    -
    -    @Test
    -    public void testStringFunctions_advanced() throws Exception {
    -        final Map<String, Object> variableMap = new HashMap<String, Object>() {{
    -            put("foo", "casey");
    -            put("bar", "bar.casey.grok");
    -            put("ip", "192.168.0.1");
    -            put("empty", "");
    -            put("spaced", "metron is great");
    -            put("myList", ImmutableList.of("casey", "apple", "orange"));
    -        }};
    -        Assert.assertTrue(runPredicate("foo in SPLIT(bar, '.')", v -> variableMap.get(v)));
    -        Assert.assertFalse(runPredicate("foo in SPLIT(ip, '.')", v -> variableMap.get(v)));
    -        Assert.assertTrue(runPredicate("foo in myList", v -> variableMap.get(v)));
    -        Assert.assertFalse(runPredicate("foo not in myList", v -> variableMap.get(v)));
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    // NULL LENGTH
    +    try{
    +      run("FILL_RIGHT('123','X',bar)", variableMap);
    +    }catch(ParseException pe) {
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("are both required"));
         }
    -
    -    @Test
    -    public void testLeftRightFills() throws Exception{
    -        final Map<String, Object> variableMap = new HashMap<String, Object>() {{
    -            put("foo", null);
    -            put("bar", null);
    -            put("notInt","oh my");
    -        }};
    -
    -        //LEFT
    -        Object left = run("FILL_LEFT('123','X', 10)",new HashedMap());
    -        Assert.assertNotNull(left);
    -        Assert.assertEquals(10,((String)left).length());
    -        Assert.assertEquals("XXXXXXX123",(String)left);
    -
    -        //RIGHT
    -        Object right = run("FILL_RIGHT('123','X', 10)", new HashedMap());
    -        Assert.assertNotNull(right);
    -        Assert.assertEquals(10,((String)right).length());
    -        Assert.assertEquals("123XXXXXXX",(String)right);
    -
    -        //INPUT ALREADY LENGTH
    -        Object same = run("FILL_RIGHT('123','X', 3)", new HashedMap());
    -        Assert.assertEquals(3,((String)same).length());
    -        Assert.assertEquals("123",(String)same);
    -
    -        //INPUT BIGGER THAN LENGTH
    -        Object tooBig = run("FILL_RIGHT('1234567890','X', 3)", new HashedMap());
    -        Assert.assertEquals(10,((String)tooBig).length());
    -        Assert.assertEquals("1234567890",(String)tooBig);
    -
    -        //NULL VARIABLES
    -        boolean thrown = false;
    -        try{
    -            run("FILL_RIGHT('123',foo,bar)", variableMap);
    -        }catch(ParseException pe) {
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("are both required"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        // NULL LENGTH
    -        try{
    -            run("FILL_RIGHT('123','X',bar)", variableMap);
    -        }catch(ParseException pe) {
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("are both required"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        // NULL FILL
    -        try{
    -            run("FILL_RIGHT('123',foo, 7)", variableMap);
    -        }catch(ParseException pe) {
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("are both required"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        // NON INTEGER LENGTH
    -        try {
    -            run("FILL_RIGHT('123','X', 'z' )", new HashedMap());
    -        }catch(ParseException pe){
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("not a valid Integer"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        // EMPTY STRING PAD
    -        try {
    -            Object returnValue = run("FILL_RIGHT('123','', 10 )", new HashedMap());
    -        }catch(ParseException pe) {
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("cannot be an empty"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        //MISSING LENGTH PARAMETER
    -        try {
    -            run("FILL_RIGHT('123',foo)", variableMap);
    -        }catch(ParseException pe){
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("expects three"));
    -        }
    -        Assert.assertTrue(thrown);
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    // NULL FILL
    +    try{
    +      run("FILL_RIGHT('123',foo, 7)", variableMap);
    +    }catch(ParseException pe) {
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("are both required"));
    +    }
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    // NON INTEGER LENGTH
    +    try {
    +      run("FILL_RIGHT('123','X', 'z' )", new HashedMap());
    +    }catch(ParseException pe){
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("not a valid Integer"));
    +    }
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    // EMPTY STRING PAD
    +    try {
    +      Object returnValue = run("FILL_RIGHT('123','', 10 )", new HashedMap());
    +    }catch(ParseException pe) {
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("cannot be an empty"));
    +    }
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    //MISSING LENGTH PARAMETER
    +    try {
    +      run("FILL_RIGHT('123',foo)", variableMap);
    +    }catch(ParseException pe){
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("expects three"));
         }
    +    Assert.assertTrue(thrown);
    +  }
    +
    +  @Test
    +  public void shannonEntropyTest() throws Exception {
    +    //test empty string
    +    Assert.assertEquals(0.0, (Double)run("STRING_ENTROPY('')", new HashMap<>()), 1e-6);
    --- End diff --
    
    The specified equality delta is 1e-6, but is there any possibility that the expected return value for these cases is anything other than 0.0? I understand allowing a delta for imprecise double calculations, but I'm wondering whether we want to allow one for these base cases.



[GitHub] incubator-metron pull request #403: METRON-640: Add a Stellar function to co...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-metron/pull/403



[GitHub] incubator-metron pull request #403: METRON-640: Add a Stellar function to co...

Posted by mattf-horton <gi...@git.apache.org>.
Github user mattf-horton commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/403#discussion_r95426043
  
    --- Diff: metron-platform/metron-common/src/test/java/org/apache/metron/common/dsl/functions/StringFunctionsTest.java ---
    @@ -32,124 +33,143 @@
     
     public class StringFunctionsTest {
     
    -    @Test
    -    public void testStringFunctions() throws Exception {
    -        final Map<String, String> variableMap = new HashMap<String, String>() {{
    -            put("foo", "casey");
    -            put("ip", "192.168.0.1");
    -            put("empty", "");
    -            put("spaced", "metron is great");
    -        }};
    -        Assert.assertTrue(runPredicate("true and TO_UPPER(foo) == 'CASEY'", v -> variableMap.get(v)));
    -        Assert.assertTrue(runPredicate("foo in [ TO_LOWER('CASEY'), 'david' ]", v -> variableMap.get(v)));
    -        Assert.assertTrue(runPredicate("TO_UPPER(foo) in [ TO_UPPER('casey'), 'david' ] and IN_SUBNET(ip, '192.168.0.0/24')", v -> variableMap.get(v)));
    -        Assert.assertFalse(runPredicate("TO_LOWER(foo) in [ TO_UPPER('casey'), 'david' ]", v -> variableMap.get(v)));
    +  @Test
    +  public void testStringFunctions() throws Exception {
    +    final Map<String, String> variableMap = new HashMap<String, String>() {{
    +      put("foo", "casey");
    +      put("ip", "192.168.0.1");
    +      put("empty", "");
    +      put("spaced", "metron is great");
    +    }};
    +    Assert.assertTrue(runPredicate("true and TO_UPPER(foo) == 'CASEY'", v -> variableMap.get(v)));
    +    Assert.assertTrue(runPredicate("foo in [ TO_LOWER('CASEY'), 'david' ]", v -> variableMap.get(v)));
    +    Assert.assertTrue(runPredicate("TO_UPPER(foo) in [ TO_UPPER('casey'), 'david' ] and IN_SUBNET(ip, '192.168.0.0/24')", v -> variableMap.get(v)));
    +    Assert.assertFalse(runPredicate("TO_LOWER(foo) in [ TO_UPPER('casey'), 'david' ]", v -> variableMap.get(v)));
    +  }
    +
    +  @Test
    +  public void testStringFunctions_advanced() throws Exception {
    +    final Map<String, Object> variableMap = new HashMap<String, Object>() {{
    +      put("foo", "casey");
    +      put("bar", "bar.casey.grok");
    +      put("ip", "192.168.0.1");
    +      put("empty", "");
    +      put("spaced", "metron is great");
    +      put("myList", ImmutableList.of("casey", "apple", "orange"));
    +    }};
    +    Assert.assertTrue(runPredicate("foo in SPLIT(bar, '.')", v -> variableMap.get(v)));
    +    Assert.assertFalse(runPredicate("foo in SPLIT(ip, '.')", v -> variableMap.get(v)));
    +    Assert.assertTrue(runPredicate("foo in myList", v -> variableMap.get(v)));
    +    Assert.assertFalse(runPredicate("foo not in myList", v -> variableMap.get(v)));
    +  }
    +
    +  @Test
    +  public void testLeftRightFills() throws Exception{
    +    final Map<String, Object> variableMap = new HashMap<String, Object>() {{
    +      put("foo", null);
    +      put("bar", null);
    +      put("notInt","oh my");
    +    }};
    +
    +    //LEFT
    +    Object left = run("FILL_LEFT('123','X', 10)",new HashedMap());
    +    Assert.assertNotNull(left);
    +    Assert.assertEquals(10,((String)left).length());
    +    Assert.assertEquals("XXXXXXX123",(String)left);
    +
    +    //RIGHT
    +    Object right = run("FILL_RIGHT('123','X', 10)", new HashedMap());
    +    Assert.assertNotNull(right);
    +    Assert.assertEquals(10,((String)right).length());
    +    Assert.assertEquals("123XXXXXXX",(String)right);
    +
    +    //INPUT ALREADY LENGTH
    +    Object same = run("FILL_RIGHT('123','X', 3)", new HashedMap());
    +    Assert.assertEquals(3,((String)same).length());
    +    Assert.assertEquals("123",(String)same);
    +
    +    //INPUT BIGGER THAN LENGTH
    +    Object tooBig = run("FILL_RIGHT('1234567890','X', 3)", new HashedMap());
    +    Assert.assertEquals(10,((String)tooBig).length());
    +    Assert.assertEquals("1234567890",(String)tooBig);
    +
    +    //NULL VARIABLES
    +    boolean thrown = false;
    +    try{
    +      run("FILL_RIGHT('123',foo,bar)", variableMap);
    +    }catch(ParseException pe) {
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("are both required"));
         }
    -
    -    @Test
    -    public void testStringFunctions_advanced() throws Exception {
    -        final Map<String, Object> variableMap = new HashMap<String, Object>() {{
    -            put("foo", "casey");
    -            put("bar", "bar.casey.grok");
    -            put("ip", "192.168.0.1");
    -            put("empty", "");
    -            put("spaced", "metron is great");
    -            put("myList", ImmutableList.of("casey", "apple", "orange"));
    -        }};
    -        Assert.assertTrue(runPredicate("foo in SPLIT(bar, '.')", v -> variableMap.get(v)));
    -        Assert.assertFalse(runPredicate("foo in SPLIT(ip, '.')", v -> variableMap.get(v)));
    -        Assert.assertTrue(runPredicate("foo in myList", v -> variableMap.get(v)));
    -        Assert.assertFalse(runPredicate("foo not in myList", v -> variableMap.get(v)));
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    // NULL LENGTH
    +    try{
    +      run("FILL_RIGHT('123','X',bar)", variableMap);
    +    }catch(ParseException pe) {
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("are both required"));
         }
    -
    -    @Test
    -    public void testLeftRightFills() throws Exception{
    -        final Map<String, Object> variableMap = new HashMap<String, Object>() {{
    -            put("foo", null);
    -            put("bar", null);
    -            put("notInt","oh my");
    -        }};
    -
    -        //LEFT
    -        Object left = run("FILL_LEFT('123','X', 10)",new HashedMap());
    -        Assert.assertNotNull(left);
    -        Assert.assertEquals(10,((String)left).length());
    -        Assert.assertEquals("XXXXXXX123",(String)left);
    -
    -        //RIGHT
    -        Object right = run("FILL_RIGHT('123','X', 10)", new HashedMap());
    -        Assert.assertNotNull(right);
    -        Assert.assertEquals(10,((String)right).length());
    -        Assert.assertEquals("123XXXXXXX",(String)right);
    -
    -        //INPUT ALREADY LENGTH
    -        Object same = run("FILL_RIGHT('123','X', 3)", new HashedMap());
    -        Assert.assertEquals(3,((String)same).length());
    -        Assert.assertEquals("123",(String)same);
    -
    -        //INPUT BIGGER THAN LENGTH
    -        Object tooBig = run("FILL_RIGHT('1234567890','X', 3)", new HashedMap());
    -        Assert.assertEquals(10,((String)tooBig).length());
    -        Assert.assertEquals("1234567890",(String)tooBig);
    -
    -        //NULL VARIABLES
    -        boolean thrown = false;
    -        try{
    -            run("FILL_RIGHT('123',foo,bar)", variableMap);
    -        }catch(ParseException pe) {
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("are both required"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        // NULL LENGTH
    -        try{
    -            run("FILL_RIGHT('123','X',bar)", variableMap);
    -        }catch(ParseException pe) {
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("are both required"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        // NULL FILL
    -        try{
    -            run("FILL_RIGHT('123',foo, 7)", variableMap);
    -        }catch(ParseException pe) {
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("are both required"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        // NON INTEGER LENGTH
    -        try {
    -            run("FILL_RIGHT('123','X', 'z' )", new HashedMap());
    -        }catch(ParseException pe){
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("not a valid Integer"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        // EMPTY STRING PAD
    -        try {
    -            Object returnValue = run("FILL_RIGHT('123','', 10 )", new HashedMap());
    -        }catch(ParseException pe) {
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("cannot be an empty"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        //MISSING LENGTH PARAMETER
    -        try {
    -            run("FILL_RIGHT('123',foo)", variableMap);
    -        }catch(ParseException pe){
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("expects three"));
    -        }
    -        Assert.assertTrue(thrown);
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    // NULL FILL
    +    try{
    +      run("FILL_RIGHT('123',foo, 7)", variableMap);
    +    }catch(ParseException pe) {
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("are both required"));
    +    }
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    // NON INTEGER LENGTH
    +    try {
    +      run("FILL_RIGHT('123','X', 'z' )", new HashedMap());
    +    }catch(ParseException pe){
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("not a valid Integer"));
    +    }
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    // EMPTY STRING PAD
    +    try {
    +      Object returnValue = run("FILL_RIGHT('123','', 10 )", new HashedMap());
    +    }catch(ParseException pe) {
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("cannot be an empty"));
    +    }
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    //MISSING LENGTH PARAMETER
    +    try {
    +      run("FILL_RIGHT('123',foo)", variableMap);
    +    }catch(ParseException pe){
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("expects three"));
         }
    +    Assert.assertTrue(thrown);
    +  }
    +
    +  @Test
    +  public void shannonEntropyTest() throws Exception {
    +    //test empty string
    +    Assert.assertEquals(0.0, (Double)run("STRING_ENTROPY('')", new HashMap<>()), 0.0);
    +    Assert.assertEquals(0.0, (Double)run("STRING_ENTROPY(foo)", ImmutableMap.of("foo", "")), 0.0);
    +
    +    /*
    +    Now consider the string aaaaaaaaaabbbbbccccc or 10 a's followed by 5 b's and 5 c's.
    +    The probabilities of each character is as follows:
    +    p(a) = 1/2
    +    p(b) = 1/4
    +    p(c) = 1/4
    +    so the shannon entropy should be
    +      -p(a)*log_2(p(a)) - p(b)*log_2(p(b)) - p(c)*log_2(p(c)) =
    +      -0.5*-1 - 0.25*-2 - 0.25*-2 = 1.5
    +     */
    +    Assert.assertEquals(1.5, (Double)run("STRING_ENTROPY(foo)", ImmutableMap.of("foo", "aaaaaaaaaabbbbbccccc")), 0.0);
    --- End diff --
    
    An empty string should indeed have entropy exactly equal to 0.0 (line 160).
    
    In line 173, I assume the argument is that a string whose per-letter probabilities are small exact powers of 2 will have its entropy calculated exactly. I suspect that's true, but one would have to know the implementation to be sure.
    
    Otherwise one would have to use a delta of something like 4*Math.ulp(1.5), I think.
    But it must pass as is, so it's good :-)
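
As a rough illustration of the ulp-based alternative, the sketch below pairs a stand-in entropy calculation with a delta of a few ulps. `EntropySketch`, `entropy`, and `ulpDelta` are hypothetical names, and the calculation is an assumption about how such a function might be written, not the implementation in this pull request.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the function under test; not the code from this pull request.
final class EntropySketch {
  // Base-2 Shannon entropy over the characters of the input, in bits.
  static double entropy(String input) {
    Map<Character, Integer> frequency = new HashMap<>();
    for (char c : input.toCharArray()) {
      frequency.merge(c, 1, Integer::sum);
    }
    double result = 0.0;
    for (int count : frequency.values()) {
      double p = (double) count / input.length();
      // log_2(p) is computed from natural logs; this is where rounding can creep in.
      result -= p * (Math.log(p) / Math.log(2));
    }
    return result;
  }

  // Tolerance of a few units in the last place around the expected value.
  static double ulpDelta(double expected) {
    return 4 * Math.ulp(expected);
  }
}
```

A test could then compare against 1.5 with `EntropySketch.ulpDelta(1.5)` as the delta instead of 0.0, which keeps the check tight without depending on the logarithm being exact.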



[GitHub] incubator-metron pull request #403: METRON-640: Add a Stellar function to co...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/403#discussion_r95430321
  
    --- Diff: metron-platform/metron-common/src/test/java/org/apache/metron/common/dsl/functions/StringFunctionsTest.java ---
    @@ -32,124 +33,143 @@
     
     public class StringFunctionsTest {
     
    -    @Test
    -    public void testStringFunctions() throws Exception {
    -        final Map<String, String> variableMap = new HashMap<String, String>() {{
    -            put("foo", "casey");
    -            put("ip", "192.168.0.1");
    -            put("empty", "");
    -            put("spaced", "metron is great");
    -        }};
    -        Assert.assertTrue(runPredicate("true and TO_UPPER(foo) == 'CASEY'", v -> variableMap.get(v)));
    -        Assert.assertTrue(runPredicate("foo in [ TO_LOWER('CASEY'), 'david' ]", v -> variableMap.get(v)));
    -        Assert.assertTrue(runPredicate("TO_UPPER(foo) in [ TO_UPPER('casey'), 'david' ] and IN_SUBNET(ip, '192.168.0.0/24')", v -> variableMap.get(v)));
    -        Assert.assertFalse(runPredicate("TO_LOWER(foo) in [ TO_UPPER('casey'), 'david' ]", v -> variableMap.get(v)));
    +  @Test
    +  public void testStringFunctions() throws Exception {
    +    final Map<String, String> variableMap = new HashMap<String, String>() {{
    +      put("foo", "casey");
    +      put("ip", "192.168.0.1");
    +      put("empty", "");
    +      put("spaced", "metron is great");
    +    }};
    +    Assert.assertTrue(runPredicate("true and TO_UPPER(foo) == 'CASEY'", v -> variableMap.get(v)));
    +    Assert.assertTrue(runPredicate("foo in [ TO_LOWER('CASEY'), 'david' ]", v -> variableMap.get(v)));
    +    Assert.assertTrue(runPredicate("TO_UPPER(foo) in [ TO_UPPER('casey'), 'david' ] and IN_SUBNET(ip, '192.168.0.0/24')", v -> variableMap.get(v)));
    +    Assert.assertFalse(runPredicate("TO_LOWER(foo) in [ TO_UPPER('casey'), 'david' ]", v -> variableMap.get(v)));
    +  }
    +
    +  @Test
    +  public void testStringFunctions_advanced() throws Exception {
    +    final Map<String, Object> variableMap = new HashMap<String, Object>() {{
    +      put("foo", "casey");
    +      put("bar", "bar.casey.grok");
    +      put("ip", "192.168.0.1");
    +      put("empty", "");
    +      put("spaced", "metron is great");
    +      put("myList", ImmutableList.of("casey", "apple", "orange"));
    +    }};
    +    Assert.assertTrue(runPredicate("foo in SPLIT(bar, '.')", v -> variableMap.get(v)));
    +    Assert.assertFalse(runPredicate("foo in SPLIT(ip, '.')", v -> variableMap.get(v)));
    +    Assert.assertTrue(runPredicate("foo in myList", v -> variableMap.get(v)));
    +    Assert.assertFalse(runPredicate("foo not in myList", v -> variableMap.get(v)));
    +  }
    +
    +  @Test
    +  public void testLeftRightFills() throws Exception{
    +    final Map<String, Object> variableMap = new HashMap<String, Object>() {{
    +      put("foo", null);
    +      put("bar", null);
    +      put("notInt","oh my");
    +    }};
    +
    +    //LEFT
    +    Object left = run("FILL_LEFT('123','X', 10)",new HashedMap());
    +    Assert.assertNotNull(left);
    +    Assert.assertEquals(10,((String)left).length());
    +    Assert.assertEquals("XXXXXXX123",(String)left);
    +
    +    //RIGHT
    +    Object right = run("FILL_RIGHT('123','X', 10)", new HashedMap());
    +    Assert.assertNotNull(right);
    +    Assert.assertEquals(10,((String)right).length());
    +    Assert.assertEquals("123XXXXXXX",(String)right);
    +
    +    //INPUT ALREADY LENGTH
    +    Object same = run("FILL_RIGHT('123','X', 3)", new HashedMap());
    +    Assert.assertEquals(3,((String)same).length());
    +    Assert.assertEquals("123",(String)same);
    +
    +    //INPUT BIGGER THAN LENGTH
    +    Object tooBig = run("FILL_RIGHT('1234567890','X', 3)", new HashedMap());
    +    Assert.assertEquals(10,((String)tooBig).length());
    +    Assert.assertEquals("1234567890",(String)tooBig);
    +
    +    //NULL VARIABLES
    +    boolean thrown = false;
    +    try{
    +      run("FILL_RIGHT('123',foo,bar)", variableMap);
    +    }catch(ParseException pe) {
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("are both required"));
         }
    -
    -    @Test
    -    public void testStringFunctions_advanced() throws Exception {
    -        final Map<String, Object> variableMap = new HashMap<String, Object>() {{
    -            put("foo", "casey");
    -            put("bar", "bar.casey.grok");
    -            put("ip", "192.168.0.1");
    -            put("empty", "");
    -            put("spaced", "metron is great");
    -            put("myList", ImmutableList.of("casey", "apple", "orange"));
    -        }};
    -        Assert.assertTrue(runPredicate("foo in SPLIT(bar, '.')", v -> variableMap.get(v)));
    -        Assert.assertFalse(runPredicate("foo in SPLIT(ip, '.')", v -> variableMap.get(v)));
    -        Assert.assertTrue(runPredicate("foo in myList", v -> variableMap.get(v)));
    -        Assert.assertFalse(runPredicate("foo not in myList", v -> variableMap.get(v)));
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    // NULL LENGTH
    +    try{
    +      run("FILL_RIGHT('123','X',bar)", variableMap);
    +    }catch(ParseException pe) {
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("are both required"));
         }
    -
    -    @Test
    -    public void testLeftRightFills() throws Exception{
    -        final Map<String, Object> variableMap = new HashMap<String, Object>() {{
    -            put("foo", null);
    -            put("bar", null);
    -            put("notInt","oh my");
    -        }};
    -
    -        //LEFT
    -        Object left = run("FILL_LEFT('123','X', 10)",new HashedMap());
    -        Assert.assertNotNull(left);
    -        Assert.assertEquals(10,((String)left).length());
    -        Assert.assertEquals("XXXXXXX123",(String)left);
    -
    -        //RIGHT
    -        Object right = run("FILL_RIGHT('123','X', 10)", new HashedMap());
    -        Assert.assertNotNull(right);
    -        Assert.assertEquals(10,((String)right).length());
    -        Assert.assertEquals("123XXXXXXX",(String)right);
    -
    -        //INPUT ALREADY LENGTH
    -        Object same = run("FILL_RIGHT('123','X', 3)", new HashedMap());
    -        Assert.assertEquals(3,((String)same).length());
    -        Assert.assertEquals("123",(String)same);
    -
    -        //INPUT BIGGER THAN LENGTH
    -        Object tooBig = run("FILL_RIGHT('1234567890','X', 3)", new HashedMap());
    -        Assert.assertEquals(10,((String)tooBig).length());
    -        Assert.assertEquals("1234567890",(String)tooBig);
    -
    -        //NULL VARIABLES
    -        boolean thrown = false;
    -        try{
    -            run("FILL_RIGHT('123',foo,bar)", variableMap);
    -        }catch(ParseException pe) {
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("are both required"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        // NULL LENGTH
    -        try{
    -            run("FILL_RIGHT('123','X',bar)", variableMap);
    -        }catch(ParseException pe) {
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("are both required"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        // NULL FILL
    -        try{
    -            run("FILL_RIGHT('123',foo, 7)", variableMap);
    -        }catch(ParseException pe) {
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("are both required"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        // NON INTEGER LENGTH
    -        try {
    -            run("FILL_RIGHT('123','X', 'z' )", new HashedMap());
    -        }catch(ParseException pe){
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("not a valid Integer"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        // EMPTY STRING PAD
    -        try {
    -            Object returnValue = run("FILL_RIGHT('123','', 10 )", new HashedMap());
    -        }catch(ParseException pe) {
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("cannot be an empty"));
    -        }
    -        Assert.assertTrue(thrown);
    -        thrown = false;
    -
    -        //MISSING LENGTH PARAMETER
    -        try {
    -            run("FILL_RIGHT('123',foo)", variableMap);
    -        }catch(ParseException pe){
    -            thrown = true;
    -            Assert.assertTrue(pe.getMessage().contains("expects three"));
    -        }
    -        Assert.assertTrue(thrown);
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    // NULL FILL
    +    try{
    +      run("FILL_RIGHT('123',foo, 7)", variableMap);
    +    }catch(ParseException pe) {
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("are both required"));
    +    }
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    // NON INTEGER LENGTH
    +    try {
    +      run("FILL_RIGHT('123','X', 'z' )", new HashedMap());
    +    }catch(ParseException pe){
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("not a valid Integer"));
    +    }
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    // EMPTY STRING PAD
    +    try {
    +      Object returnValue = run("FILL_RIGHT('123','', 10 )", new HashedMap());
    +    }catch(ParseException pe) {
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("cannot be an empty"));
    +    }
    +    Assert.assertTrue(thrown);
    +    thrown = false;
    +
    +    //MISSING LENGTH PARAMETER
    +    try {
    +      run("FILL_RIGHT('123',foo)", variableMap);
    +    }catch(ParseException pe){
    +      thrown = true;
    +      Assert.assertTrue(pe.getMessage().contains("expects three"));
         }
    +    Assert.assertTrue(thrown);
    +  }
    +
    +  @Test
    +  public void shannonEntropyTest() throws Exception {
    +    //test empty string
    +    Assert.assertEquals(0.0, (Double)run("STRING_ENTROPY('')", new HashMap<>()), 0.0);
    +    Assert.assertEquals(0.0, (Double)run("STRING_ENTROPY(foo)", ImmutableMap.of("foo", "")), 0.0);
    +
    +    /*
    +    Now consider the string aaaaaaaaaabbbbbccccc or 10 a's followed by 5 b's and 5 c's.
    +    The probabilities of each character is as follows:
    +    p(a) = 1/2
    +    p(b) = 1/4
    +    p(c) = 1/4
    +    so the shannon entropy should be
    +      -p(a)*log_2(p(a)) - p(b)*log_2(p(b)) - p(c)*log_2(p(c)) =
    +      -0.5*-1 - 0.25*-2 - 0.25*-2 = 1.5
    +     */
    +    Assert.assertEquals(1.5, (Double)run("STRING_ENTROPY(foo)", ImmutableMap.of("foo", "aaaaaaaaaabbbbbccccc")), 0.0);
    --- End diff --
    
    Yeah, I suspect the reason we can get away with an epsilon of 0.0 here is that the frequencies are of the form `(1/2)^n`. Frankly, I was a bit shocked that it worked.
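
The `(1/2)^n` observation can be checked directly: frequencies such as 10/20 and 5/20 are exactly representable as doubles, while something like 1/3 is not. The helper below is a hypothetical sketch (the names `PowerOfTwoSketch` and `isExactPowerOfTwo` are made up for illustration) and is intended only for probabilities in the range (0, 1].

```java
// Hypothetical helper for illustration only.
final class PowerOfTwoSketch {
  // True when p is exactly 2^k for some integer k, e.g. 0.5 or 0.25.
  // Intended for probabilities in (0, 1].
  static boolean isExactPowerOfTwo(double p) {
    // Rebuild 2^exponent from the value's own exponent and compare it to p.
    return p > 0 && p == Math.scalb(1.0, Math.getExponent(p));
  }
}
```

For the test string, `10.0 / 20` and `5.0 / 20` satisfy this check, whereas a string such as "aab" yields 2/3 and 1/3, which do not, so a non-zero delta would be the safer choice there.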



[GitHub] incubator-metron pull request #403: METRON-640: Add a Stellar function to co...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/403#discussion_r95377207
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/dsl/functions/StringFunctions.java ---
    @@ -284,4 +287,36 @@ private static Object fill(FillDirection direction, Object inputObject, Object f
         }
         return org.apache.commons.lang.StringUtils.rightPad(input,requiredLength,fill);
       }
    +
    +  @Stellar( namespace="STRING"
    +          , name="ENTROPY"
    +          , description = "Computes the base-2 shannon entropy of a string"
    +          , params = { "input - String" }
    +          , returns = "The base-2 shannon entropy of the string.  The unit of this is bits."
    +  )
    +  public static class Entropy extends BaseStellarFunction {
    +    @Override
    +    public Object apply(List<Object> strings) {
    +      /*
    +      Shannon entropy is defined as follows:
    +      \Eta(X) = - \sum(p(x_i)*log_2(p(x_i)), i=0, n-1) where x_i are distinct characters in the string.
    +       */
    +      Map<Character, Integer> frequency = new HashMap<>();
    +      String input = (String)strings.get(0);
    --- End diff --
    
    Super minor, but we should probably catch the scenario where no argument is provided, which would otherwise surface as an NPE, and throw a new IllegalStateException that shows the user the correct usage.
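
One possible shape for that check is sketched below. The class name `EntropyArgs`, the method `requireInput`, and the wording of the message are all hypothetical; treating a null first argument as an error is also an assumption rather than something settled in this review.

```java
import java.util.List;

// Hypothetical validation helper; the name and message are illustrative only.
final class EntropyArgs {
  // Returns the first argument as a String, or fails with a usage message.
  static String requireInput(List<Object> args) {
    if (args == null || args.isEmpty() || args.get(0) == null) {
      throw new IllegalStateException(
          "STRING_ENTROPY expects exactly one argument: input - String");
    }
    return String.valueOf(args.get(0));
  }
}
```

Calling such a helper at the top of `apply` would replace the bare `strings.get(0)` cast and give the user a usage hint instead of a low-level exception.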



[GitHub] incubator-metron pull request #403: METRON-640: Add a Stellar function to co...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/403#discussion_r95373208
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/dsl/functions/StringFunctions.java ---
    @@ -284,4 +287,36 @@ private static Object fill(FillDirection direction, Object inputObject, Object f
         }
         return org.apache.commons.lang.StringUtils.rightPad(input,requiredLength,fill);
       }
    +
    +  @Stellar( namespace="STRING"
    +          , name="ENTROPY"
    +          , description = "Computes the base-2 shannon entropy of a string"
    --- End diff --
    
    Can we link to the paper?

