You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@metron.apache.org by cestella <gi...@git.apache.org> on 2018/05/21 22:10:08 UTC

[GitHub] metron pull request #1021: Stellar should have a _ special variable which re...

GitHub user cestella opened a pull request:

    https://github.com/apache/metron/pull/1021

    Stellar should have a _ special variable which returns the message in map form

    ## Contributor Comments
    In order to support functions which operate on the whole message, we should have a special variable (_, keeping with the vaguely scala theme) which can return the entire underlying message.  This map should be immutable.
    
    NOTE: I still owe proper docs in stellar-common.
    
    Manual testing can be done via the REPL, adding a field transformation to a parser and a stellar enrichment.
    
    
    ## Pull Request Checklist
    
    Thank you for submitting a contribution to Apache Metron.  
    Please refer to our [Development Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235) for the complete guide to follow for contributions.  
    Please refer also to our [Build Verification Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview) for complete smoke testing guides.  
    
    
    In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following:
    
    ### For all changes:
    - [ ] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
    - [ ] Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
    
    
    ### For code changes:
    - [ ] Have you included steps to reproduce the behavior or problem that is being changed or addressed?
    - [ ] Have you included steps or a guide to how the change may be verified and tested manually?
    - [ ] Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:
      ```
      mvn -q clean integration-test install && dev-utilities/build-utils/verify_licenses.sh 
      ```
    
    - [ ] Have you written or updated unit tests and or integration tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
    - [ ] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?
    
    ### For documentation related changes:
    - [ ] Have you ensured that format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not then run the following commands and the verify changes via `site-book/target/site/index.html`:
    
      ```
      cd site-book
      mvn site
      ```
    
    #### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
    It is also recommended that [travis-ci](https://travis-ci.org) is set up for your personal repository such that your branches are built there before submitting a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cestella/incubator-metron stellar_all_fields_constant

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metron/pull/1021.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1021
    
----
commit 2ef7b8ff345fcb4afd2170f66e8598eef466927b
Author: cstella <ce...@...>
Date:   2018-05-21T21:23:44Z

    Added a _ special variable for Stellar.

----


---

[GitHub] metron pull request #1021: METRON-1568: Stellar should have a _ special vari...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1021#discussion_r189905091
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/configuration/enrichment/handler/StellarConfig.java ---
    @@ -142,8 +143,14 @@ else if(kv.getValue() instanceof List) {
       {
     
    --- End diff --
    
    I don't doubt the need or applicability, I guess I'm saying why not just have the _ in the configuration side, and just have the scripts reference the vars by name and not have to MAP_GET()?
    
    That way, the vars are the vars are the vars in Stellar, and the external configuration can have the option for putting all the message in the var namespace.
    



---

[GitHub] metron issue #1021: METRON-1568: Stellar should have a _ special variable wh...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/metron/pull/1021
  
    @ottobackwards As long as the underlying map is updated in `MapVariableResolver` as per the `assign` function in that variable resolver, it should be supported without any change as the approach is to return the ConcatMap, which is backed by the real maps, which are updated.  ConcatMap is immutable, but the underlying maps that backs it aren't and assign would update those maps directly.


---

[GitHub] metron pull request #1021: METRON-1568: Stellar should have a _ special vari...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1021#discussion_r190011259
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/configuration/enrichment/handler/StellarConfig.java ---
    @@ -142,8 +143,14 @@ else if(kv.getValue() instanceof List) {
       {
     
    --- End diff --
    
    So, what you're trying to solve, I think, is scoped variable resolution.  I think this is a bit of a different use-case here and one that doesn't really have an analogue in programming languages at large.  This is, rather, getting a list of the variables and their values within the current scope.  For our purposes, though, we only have one scope, global scope.  I'd consider non-global scope resolution to be out of scope for this.
    
    Ultimately, what I'm arguing for is the map variable resolver to have a special variable that returns the full variable scope. 
    
    Is this becoming clearer or am I spreading more mud than I'm clearing?


---

[GitHub] metron pull request #1021: METRON-1568: Stellar should have a _ special vari...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1021#discussion_r189895905
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/configuration/enrichment/handler/StellarConfig.java ---
    @@ -142,8 +143,14 @@ else if(kv.getValue() instanceof List) {
       {
     
    --- End diff --
    
    If you are using a `_`, then you cannot make a judgement about which fields you're using, so you have to include all fields.  If you aren't, then you can make some decision and only want to send the fields required to the adapter.  That's the basis for the either/or situation.


---

[GitHub] metron pull request #1021: METRON-1568: Stellar should have a _ special vari...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1021#discussion_r190019149
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/configuration/enrichment/handler/StellarConfig.java ---
    @@ -142,8 +143,14 @@ else if(kv.getValue() instanceof List) {
       {
     
    --- End diff --
    
    The PR as it stands does not make any scoping designation beyond global.  Indeed, it is pulling in ALL of the available variables.  My adjustment above adds namespacing (e.g. _ implies the message variables, `global` imples just the global config variables).


---

[GitHub] metron issue #1021: METRON-1568: Stellar should have a _ special variable wh...

Posted by simonellistonball <gi...@git.apache.org>.
Github user simonellistonball commented on the issue:

    https://github.com/apache/metron/pull/1021
  
    This is a great feature from my perspective. I've seen a number of people try and build solutions that require access to the whole message for a bunch of custom functions, models, rules engine integrations etc, and as a simple means of doing this, I love the approach. To address some of the points here, I agree, we need to avoid any chance of copying the whole map given the performance risk with some potentially sizable and numerous objects. Assignment... yep, _ should not be allowed on the left hand side, and it's a map, so assignment to an output field shouldn't be a big issue.


---

[GitHub] metron pull request #1021: METRON-1568: Stellar should have a _ special vari...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1021#discussion_r190004639
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/configuration/enrichment/handler/StellarConfig.java ---
    @@ -142,8 +143,14 @@ else if(kv.getValue() instanceof List) {
       {
     
    --- End diff --
    
    Something like that, in leu of literal scopes ( which would have a hierarchy ) might work out.  
    
    I'm not trying to shoot down your idea, I just want to understand it, and we approach things differently.
    
    I think in the end we want to get to
    interface Scope {
    Scope getParent()
    Resolve()
    Get()
    Put()
    }
    
    before calling a function/lambda
    
    Scope scope = new Scope(stack.peek())
    stack.push(scope)
    call function(scope)
    stack.pop(scope)
    
    



---

[GitHub] metron issue #1021: METRON-1568: Stellar should have a _ special variable wh...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/metron/pull/1021
  
    @ottobackwards I don't see it really affecting assignment.  `_` should behave just like any other variable to my mind and return a map.  For instance, if you say `new_field := _`, then I'd expect `new_field` to be a map with all of the existing fields for a message.  We should prevent `_` from being on the left-hand-side, though, I think.


---

[GitHub] metron pull request #1021: METRON-1568: Stellar should have a _ special vari...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1021#discussion_r189900746
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/configuration/enrichment/handler/StellarConfig.java ---
    @@ -142,8 +143,14 @@ else if(kv.getValue() instanceof List) {
       {
     
    --- End diff --
    
    Well, I added it because I came across it multiple times in data science use-cases.  Specifically, the way that we do model application via MaaS (and other model applications) presumes that the user will filter just the fields into the model, rather than having the model select it.  This puts an undue burden on the user as the model knows which fields it wants and the user may not, so being able to pass the whole message lets the model do the decision.  I think that `_` is a natural way to do that.  To be fair, also, this is a capability inherent to the VariableResolver, not the grammar itself, which I thought was reasonable.  SOme variable resolvers won't support it and some will.
    
    Thoughts?


---

[GitHub] metron pull request #1021: METRON-1568: Stellar should have a _ special vari...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1021#discussion_r189862557
  
    --- Diff: metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/common/utils/ConcatMap.java ---
    @@ -0,0 +1,202 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.metron.stellar.common.utils;
    +
    +
    +import com.esotericsoftware.kryo.Kryo;
    +import com.esotericsoftware.kryo.KryoSerializable;
    +import com.esotericsoftware.kryo.io.Input;
    +import com.esotericsoftware.kryo.io.Output;
    +import com.google.common.base.Joiner;
    +import com.google.common.collect.Iterables;
    +import com.google.common.collect.Sets;
    +
    +import java.io.Serializable;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    --- End diff --
    
    Can throw some java doc on this about what it is supposed to provide?  And the assumptions that it makes ( like the uniqueness of the keys in the different maps and what happens etc )


---

[GitHub] metron pull request #1021: METRON-1568: Stellar should have a _ special vari...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1021#discussion_r189932382
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/configuration/enrichment/handler/StellarConfig.java ---
    @@ -142,8 +143,14 @@ else if(kv.getValue() instanceof List) {
       {
     
    --- End diff --
    
    The variables referenced are spread across multiple maps.  It's unreasonable to union those maps together for every message, rather passing back the ConfigMap, which is a lazy facade on Map that doesn't union the maps.  The variable resolver can choose to support `_` or not, so the support has to happen in the variable resolver.


---

[GitHub] metron pull request #1021: METRON-1568: Stellar should have a _ special vari...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1021#discussion_r189898485
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/configuration/enrichment/handler/StellarConfig.java ---
    @@ -142,8 +143,14 @@ else if(kv.getValue() instanceof List) {
       {
     
    --- End diff --
    
    I guess I'm wondering why introduce _ to the language at all?  Why not make that a mechanic external to the language, to the configurations and the things that setup the variable resolver?
    
    



---

[GitHub] metron issue #1021: METRON-1568: Stellar should have a _ special variable wh...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/metron/pull/1021
  
    Ok, just ran this up in full-dev and it works as advertised.  If this needs to change the fundamental approach, let me know, but as it is submitted it functions as advertised.


---

[GitHub] metron issue #1021: METRON-1568: Stellar should have a _ special variable wh...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/metron/pull/1021
  
    [Multimaps](https://google.github.io/guava/releases/19.0/api/docs/com/google/common/collect/Multimap.html#entries()) don't have unique keys.  Making the unique keys would necessitate copying the maps, which seems excessive to me.  


---

[GitHub] metron pull request #1021: METRON-1568: Stellar should have a _ special vari...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1021#discussion_r190018295
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/configuration/enrichment/handler/StellarConfig.java ---
    @@ -142,8 +143,14 @@ else if(kv.getValue() instanceof List) {
       {
     
    --- End diff --
    
    My impression is that you are creating an implicit scope with the message, maybe that is where we are crossing


---

[GitHub] metron pull request #1021: METRON-1568: Stellar should have a _ special vari...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1021#discussion_r189895940
  
    --- Diff: metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/common/utils/ConcatMap.java ---
    @@ -0,0 +1,202 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.metron.stellar.common.utils;
    +
    +
    +import com.esotericsoftware.kryo.Kryo;
    +import com.esotericsoftware.kryo.KryoSerializable;
    +import com.esotericsoftware.kryo.io.Input;
    +import com.esotericsoftware.kryo.io.Output;
    +import com.google.common.base.Joiner;
    +import com.google.common.collect.Iterables;
    +import com.google.common.collect.Sets;
    +
    +import java.io.Serializable;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    --- End diff --
    
    Yep, absolutely.


---

[GitHub] metron issue #1021: METRON-1568: Stellar should have a _ special variable wh...

Posted by mmiklavc <gi...@git.apache.org>.
Github user mmiklavc commented on the issue:

    https://github.com/apache/metron/pull/1021
  
    +1 via inspection. 


---

[GitHub] metron pull request #1021: METRON-1568: Stellar should have a _ special vari...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1021#discussion_r189861295
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/configuration/enrichment/handler/StellarConfig.java ---
    @@ -142,8 +143,14 @@ else if(kv.getValue() instanceof List) {
       {
     
    --- End diff --
    
    Why would this be an either or situation?
    What if I have a 'match' expression that uses both _ and other vars?


---

[GitHub] metron pull request #1021: METRON-1568: Stellar should have a _ special vari...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/metron/pull/1021


---

[GitHub] metron pull request #1021: METRON-1568: Stellar should have a _ special vari...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1021#discussion_r189907609
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/configuration/enrichment/handler/StellarConfig.java ---
    @@ -142,8 +143,14 @@ else if(kv.getValue() instanceof List) {
       {
     
    --- End diff --
    
    or - do what you are doing, but just support named maps and use some default name for the message itself.
    
    It feels like '_' conflates the 'messaging' with the language.  ( I say feels, because obviously I'm just expressing an opinion ). 
    
    This code makes the message the namespace, but then makes you reference the namespace with get map instead of just by name... or seems to.
    
    Also, I hope some of these MAAS applications make it into Metron /contrib ;)



---

[GitHub] metron pull request #1021: METRON-1568: Stellar should have a _ special vari...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1021#discussion_r189924019
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/configuration/enrichment/handler/StellarConfig.java ---
    @@ -142,8 +143,14 @@ else if(kv.getValue() instanceof List) {
       {
     
    --- End diff --
    
    I'm trying to wrap my head around why you need the special handling in the variable resolver.
    If one of the vars points to a map, that is it.  You don't need to do any special dance.
    
    You can have the variable context 'builder' ( config whatever ) allow naming the message as a var with a certain name and that is it.
    
    I don't understand why all the extra stuff is required.  IE.  why the variable resolver needs to know it is a map.  It is just a var.


---

[GitHub] metron pull request #1021: METRON-1568: Stellar should have a _ special vari...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1021#discussion_r189997710
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/configuration/enrichment/handler/StellarConfig.java ---
    @@ -142,8 +143,14 @@ else if(kv.getValue() instanceof List) {
       {
     
    --- End diff --
    
    I thought about about the variable context builder idea over lunch.  Is what you are proposing to change this so that `MapVariableResolver`, instead of taking a list of maps, we take a Map of maps.  e.g. `MapVariableResolver(Map<String, Map<String, Object>> variables)` and the resolution would check that map first?  So, for instance:
    ```
    MapVariableResolver resolver = new MapVariableResolver(ImmutableMap.of("_", message, "global", globalConfig));
    ```


---

[GitHub] metron issue #1021: METRON-1568: Stellar should have a _ special variable wh...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on the issue:

    https://github.com/apache/metron/pull/1021
  
    @cestella re:assignment.  If all assignment is in the language, and _ is either or, then how can we assign to a message field when _ is active?


---

[GitHub] metron pull request #1021: METRON-1568: Stellar should have a _ special vari...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1021#discussion_r189916513
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/configuration/enrichment/handler/StellarConfig.java ---
    @@ -142,8 +143,14 @@ else if(kv.getValue() instanceof List) {
       {
     
    --- End diff --
    
    > It feels like '_' conflates the 'messaging' with the language.
    
    I hear you there, that's why support for this variable is VariableResolver specific.  You very well could have a variable resolver that does NOT support it (it's just that all of ours happen to).  I'd argue that it's not even really part of the language as it's a feature of the variable resolver rather than the parsing infrastructure.
    
    One reason why this was done as a variable is that the split/join topology requires knowledge of the fields used by inspecting the variables used in stellar (this way we send only the required fields to the individual stellar adapter workers).  I had contemplated adjusting the interface or passing along the VariableResolver in the spark context, but that didn't feel right either and it was also more complex and it mandated that VariableResolvers support `_`, which not all can do.
    
    > Also, I hope some of these MAAS applications make it into Metron /contrib ;)
    
    You will get your wish as this is the preliminary PR for one of them going into Metron.  It's actually not a MaaS model, but a semantic hash function backed by Word2Vec that fits into the `HASH()` infrastructure you and JJ created.
    
    >I'm saying why not just have the _ in the configuration side, and just have the scripts reference the vars by name and not have to MAP_GET()?
    
    So, the model scripts wouldn't reference `MAP_GET`.  I was going to wait and put out a discuss thread, but perhaps an example of what I'm contributing next will illuminate the need.  The model in question's job is to take the whole message and generate a hash from it such that messages that are similar have the same hash.  This has a similar usecase to the forensic clustering use-case that I wrote up, but it's customized to your data and does not presume the user is constructing a string.  
    
    The model itself knows about the schema becuase it's specific to your data.  For instance, if you build the model on netflow data, it'll know about netflow fields:
    * source computer/port
    * destination computer/port
    * packet count
    * byte count
    * duration
    
    Now, I need a way to pass the whole message into the `HASH()` function.  One way of doing it would be:
    `HASH( { 'ip_src_addr' : ip_src_addr, 'ip_dst_addr' : ip_dst_addr, 'ip_src_port' : ip_src_port, 'ip_dst_port' : ip_dst_port, 'packet_count' : packet_count, 'duration' : duration, 'byte_count' : byte_count}, 'SEMHASH', { 'model' : OBJECT_GET('/path/to/model.ser') })`
    
    Rather than doing that, I'd rather let the model select the relevant fields like so:
    `HASH( _ , 'SEMHASH', { 'model' : OBJECT_GET('/path/to/model.ser') })`
    
    Similar situations exist with MaaS models as well, where the model knows which fields it cares about and the translation as the number of fields being input can become onerous to the user.
    
    What do you think?  Do you like another option that would solve the issue?
    
    PS. You'll get a full PR with a worked use-case on the Los Alamos National Labs data for the semantic hashing function I teased by end of week.


---