You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@metron.apache.org by simonellistonball <gi...@git.apache.org> on 2017/12/07 19:49:34 UTC

[GitHub] metron pull request #861: Implemented SELECT transformer to project fields f...

GitHub user simonellistonball opened a pull request:

    https://github.com/apache/metron/pull/861

    Implemented SELECT transformer to project fields from parser

    ## Contributor Comments
    This is a simple PR to add FieldTransformation capabilities to the parsers allowing basic projection. There are definitely better ways to implement this as reflected in the comments, but those would need a wide ranging refactor of FieldTransformations, so I've decided to shelve that for now. 
    
    ## Pull Request Checklist
    
    Thank you for submitting a contribution to Apache Metron.  
    Please refer to our [Development Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235) for the complete guide to follow for contributions.  
    Please refer also to our [Build Verification Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview) for complete smoke testing guides.  
    
    
    In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following:
    
    ### For all changes:
    - [x] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel). 
    - [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    - [x] Has your PR been rebased against the latest commit within the target branch (typically master)?
    
    
    ### For code changes:
    - [x] Have you included steps to reproduce the behavior or problem that is being changed or addressed?
    - [x] Have you included steps or a guide to how the change may be verified and tested manually?
    - [x] Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:
      ```
      mvn -q clean integration-test install && build_utils/verify_licenses.sh 
      ```
    
    - [x] Have you written or updated unit tests and or integration tests to verify your changes?
    - [x] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/simonellistonball/metron METRON-1341

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metron/pull/861.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #861
    
----
commit 7674c7283b5cefd2a40bb1329fc227e0b3a2227d
Author: Simon Elliston Ball <si...@simonellistonball.com>
Date:   2017-12-05T01:07:46Z

    Implemented SELECT transformer to project fields from parser

----


---

[GitHub] metron pull request #861: METRON-1341 Implemented SELECT transformer to proj...

Posted by simonellistonball <gi...@git.apache.org>.
Github user simonellistonball commented on a diff in the pull request:

    https://github.com/apache/metron/pull/861#discussion_r155645303
  
    --- Diff: metron-platform/metron-parsers/README.md ---
    @@ -216,6 +216,23 @@ whenever `field2` exists and whose corresponding equal to 'foo':
     }
     ```
     
    +* `SELECT`: This transformation filters the fields in the message to include only the configured output fields, and drops any not explicitly included. 
    +
    +For example: 
    +```
    +{
    +...
    +    "fieldTransformations" : [
    +          {
    +            "output" : ["field1", "field2" ] 
    +          , "transformation" : "SELECT"
    +          }
    +                      ]
    +}
    +```
    +
    +when applied to a message containing keys field1, field2 and field3, will only output the first two.
    +
    --- End diff --
    
    And order doesn't really matter, you might have this first, or last, or sandwiched between two stellar blocks. That said, I would expect the most common use case to be have this one at the end of a set.


---

[GitHub] metron pull request #861: METRON-1341 Implemented SELECT transformer to proj...

Posted by simonellistonball <gi...@git.apache.org>.
Github user simonellistonball commented on a diff in the pull request:

    https://github.com/apache/metron/pull/861#discussion_r155698109
  
    --- Diff: metron-platform/metron-parsers/README.md ---
    @@ -216,6 +216,23 @@ whenever `field2` exists and whose corresponding equal to 'foo':
     }
     ```
     
    +* `SELECT`: This transformation filters the fields in the message to include only the configured output fields, and drops any not explicitly included. 
    +
    +For example: 
    +```
    +{
    +...
    +    "fieldTransformations" : [
    +          {
    +            "output" : ["field1", "field2" ] 
    +          , "transformation" : "SELECT"
    +          }
    +                      ]
    +}
    +```
    +
    +when applied to a message containing keys field1, field2 and field3, will only output the first two.
    +
    --- End diff --
    
    agreed, and added.


---

[GitHub] metron pull request #861: METRON-1341 Implemented SELECT transformer to proj...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on a diff in the pull request:

    https://github.com/apache/metron/pull/861#discussion_r155664254
  
    --- Diff: metron-platform/metron-parsers/README.md ---
    @@ -216,6 +216,23 @@ whenever `field2` exists and whose corresponding equal to 'foo':
     }
     ```
     
    +* `SELECT`: This transformation filters the fields in the message to include only the configured output fields, and drops any not explicitly included. 
    +
    +For example: 
    +```
    +{
    +...
    +    "fieldTransformations" : [
    +          {
    +            "output" : ["field1", "field2" ] 
    +          , "transformation" : "SELECT"
    +          }
    +                      ]
    +}
    +```
    +
    +when applied to a message containing keys field1, field2 and field3, will only output the first two.
    +
    --- End diff --
    
    It should be documented then in the readme


---

[GitHub] metron pull request #861: METRON-1341 Implemented SELECT transformer to proj...

Posted by simonellistonball <gi...@git.apache.org>.
Github user simonellistonball commented on a diff in the pull request:

    https://github.com/apache/metron/pull/861#discussion_r155641855
  
    --- Diff: metron-platform/metron-common/src/test/java/org/apache/metron/common/field/transformation/SelectTransformationTest.java ---
    @@ -0,0 +1,75 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.metron.common.field.transformation;
    +
    +import java.util.HashMap;
    +
    +import org.apache.hadoop.hbase.util.Bytes;
    +import org.apache.metron.common.configuration.FieldTransformer;
    +import org.apache.metron.common.configuration.SensorParserConfig;
    +import org.apache.metron.stellar.dsl.Context;
    +import org.json.simple.JSONObject;
    +import org.junit.Assert;
    +import org.junit.Test;
    +
    +import com.google.common.collect.Iterables;
    +
    --- End diff --
    
    that feels a lot more like an integration test, or a broader test than belongs in this one piece. Otherwise every single test of transformations is going to end up with combinatorial complexity.


---

[GitHub] metron issue #861: METRON-1341 Implemented SELECT transformer to project fie...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on the issue:

    https://github.com/apache/metron/pull/861
  
    Sure, that makes sense.  I am just trying to wrap my head around why I would use this based on the data I would send through.  I don't have to understand it for it to be useful ;)
    
    Do you have a use case or scenario that would help me?



---

[GitHub] metron issue #861: METRON-1341 Implemented SELECT transformer to project fie...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/metron/pull/861
  
    Yeah, this is good.  +1 by inspection.


---

[GitHub] metron issue #861: METRON-1341 Implemented SELECT transformer to project fie...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on the issue:

    https://github.com/apache/metron/pull/861
  
    Thank you for the contribution @simonellistonball, please be sure to assign and close METRON-1341 in jira.  Cheers! #ynwa


---

[GitHub] metron issue #861: METRON-1341 Implemented SELECT transformer to project fie...

Posted by simonellistonball <gi...@git.apache.org>.
Github user simonellistonball commented on the issue:

    https://github.com/apache/metron/pull/861
  
    A typical case might be something like the CEF parser. You could potentially kick out a lot of fields you really don't care about, which at scale can produce huge amounts of ES and HDFS storage (in addition to the original_string representation. The goal for this is to focus on just outputting fields which match active use cases in the rest of the flow to control data storage costs and data clarity.
    
    This also allows you to map an explicit data model and provide some governability to the data model.


---

[GitHub] metron pull request #861: METRON-1341 Implemented SELECT transformer to proj...

Posted by simonellistonball <gi...@git.apache.org>.
Github user simonellistonball commented on a diff in the pull request:

    https://github.com/apache/metron/pull/861#discussion_r155645129
  
    --- Diff: metron-platform/metron-parsers/README.md ---
    @@ -216,6 +216,23 @@ whenever `field2` exists and whose corresponding equal to 'foo':
     }
     ```
     
    +* `SELECT`: This transformation filters the fields in the message to include only the configured output fields, and drops any not explicitly included. 
    +
    +For example: 
    +```
    +{
    +...
    +    "fieldTransformations" : [
    +          {
    +            "output" : ["field1", "field2" ] 
    +          , "transformation" : "SELECT"
    +          }
    +                      ]
    +}
    +```
    +
    +when applied to a message containing keys field1, field2 and field3, will only output the first two.
    +
    --- End diff --
    
    This is the opposite of remove.


---

[GitHub] metron pull request #861: METRON-1341 Implemented SELECT transformer to proj...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on a diff in the pull request:

    https://github.com/apache/metron/pull/861#discussion_r155630936
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/field/transformation/SelectTransformation.java ---
    @@ -0,0 +1,52 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.metron.common.field.transformation;
    +
    +import java.util.HashMap;
    +import java.util.LinkedHashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Map.Entry;
    +
    +import org.apache.metron.stellar.dsl.Context;
    +
    --- End diff --
    
    Missing Javadoc.  The purpose and an example output of this transformation would be nice


---

[GitHub] metron pull request #861: METRON-1341 Implemented SELECT transformer to proj...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on a diff in the pull request:

    https://github.com/apache/metron/pull/861#discussion_r155645013
  
    --- Diff: metron-platform/metron-common/src/test/java/org/apache/metron/common/field/transformation/SelectTransformationTest.java ---
    @@ -0,0 +1,75 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.metron.common.field.transformation;
    +
    +import java.util.HashMap;
    +
    +import org.apache.hadoop.hbase.util.Bytes;
    +import org.apache.metron.common.configuration.FieldTransformer;
    +import org.apache.metron.common.configuration.SensorParserConfig;
    +import org.apache.metron.stellar.dsl.Context;
    +import org.json.simple.JSONObject;
    +import org.junit.Assert;
    +import org.junit.Test;
    +
    +import com.google.common.collect.Iterables;
    +
    --- End diff --
    
    or a FieldTransformer ( level up ) test more likely.



---

[GitHub] metron issue #861: METRON-1341 Implemented SELECT transformer to project fie...

Posted by simonellistonball <gi...@git.apache.org>.
Github user simonellistonball commented on the issue:

    https://github.com/apache/metron/pull/861
  
    It suddenly occurs to me that we should probably whitelist the original_string and timestamp fields, so that these are always kept by this transformation. Does that make sense?


---

[GitHub] metron issue #861: METRON-1341 Implemented SELECT transformer to project fie...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on the issue:

    https://github.com/apache/metron/pull/861
  
    OK,
    Re-ran my scenario after the latest changes, and verified in kibana that only the msg field was indexed ( except for metron system or added fields ).
    
    +1 


---

[GitHub] metron issue #861: METRON-1341 Implemented SELECT transformer to project fie...

Posted by simonellistonball <gi...@git.apache.org>.
Github user simonellistonball commented on the issue:

    https://github.com/apache/metron/pull/861
  
    If they want to select most, and remove the ones they don't want, then I would recommend using the remove transformation, or a set null in stellar. Perhaps regex support might be a nice follow on, but it breaks the mental model of people who are used to any other language that handles projection, such as SQL. The goal for this was to allow user to explicitly select only a defined set of fields. To be honest, people have lived with the idea of explicitly choosing fields for decades in SQL and quite liked it, so I suspect adding something that is pattern based might make it less usable than more.


---

[GitHub] metron pull request #861: METRON-1341 Implemented SELECT transformer to proj...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/metron/pull/861


---

[GitHub] metron pull request #861: METRON-1341 Implemented SELECT transformer to proj...

Posted by simonellistonball <gi...@git.apache.org>.
Github user simonellistonball commented on a diff in the pull request:

    https://github.com/apache/metron/pull/861#discussion_r155649024
  
    --- Diff: metron-platform/metron-parsers/README.md ---
    @@ -216,6 +216,23 @@ whenever `field2` exists and whose corresponding equal to 'foo':
     }
     ```
     
    +* `SELECT`: This transformation filters the fields in the message to include only the configured output fields, and drops any not explicitly included. 
    +
    +For example: 
    +```
    +{
    +...
    +    "fieldTransformations" : [
    +          {
    +            "output" : ["field1", "field2" ] 
    +          , "transformation" : "SELECT"
    +          }
    +                      ]
    +}
    +```
    +
    +when applied to a message containing keys field1, field2 and field3, will only output the first two.
    +
    --- End diff --
    
    Yes, that would break. I'd say that was flat out user error, and not something to guard against, or indeed something it's possible to guard against. I guess the key thing here is that it requires maintaining order in field transformations.


---

[GitHub] metron pull request #861: METRON-1341 Implemented SELECT transformer to proj...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on a diff in the pull request:

    https://github.com/apache/metron/pull/861#discussion_r155631654
  
    --- Diff: metron-platform/metron-common/src/test/java/org/apache/metron/common/field/transformation/SelectTransformationTest.java ---
    @@ -0,0 +1,75 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.metron.common.field.transformation;
    +
    +import java.util.HashMap;
    +
    +import org.apache.hadoop.hbase.util.Bytes;
    +import org.apache.metron.common.configuration.FieldTransformer;
    +import org.apache.metron.common.configuration.SensorParserConfig;
    +import org.apache.metron.stellar.dsl.Context;
    +import org.json.simple.JSONObject;
    +import org.junit.Assert;
    +import org.junit.Test;
    +
    +import com.google.common.collect.Iterables;
    +
    --- End diff --
    
    Do we at a test where we have multiple transformers?
    STELLAR , SELECT etc?


---

[GitHub] metron issue #861: METRON-1341 Implemented SELECT transformer to project fie...

Posted by simonellistonball <gi...@git.apache.org>.
Github user simonellistonball commented on the issue:

    https://github.com/apache/metron/pull/861
  
    Right, that should cover it.


---

[GitHub] metron issue #861: METRON-1341 Implemented SELECT transformer to project fie...

Posted by simonellistonball <gi...@git.apache.org>.
Github user simonellistonball commented on the issue:

    https://github.com/apache/metron/pull/861
  
    Good catch, my test included the source.type in the select list, and generalised badly in the instructions. We should include the source.type in the protected system fields, let me update the test and implementation.


---

[GitHub] metron issue #861: METRON-1341 Implemented SELECT transformer to project fie...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on the issue:

    https://github.com/apache/metron/pull/861
  
    I trying to run this up in full dev.  I have two issues:
    
    1.  expressJS is failing download, obviously outside this PR
    2. The usability of this transform.  If a user just wants a couple of fields, then no problem, but If they want anything more than that, then selecting all the fields that they want seems like a bit of a chore.  So if they want to select *most* of the fields, they have to list them all out.   It seems tough.  I wonder if we don't need regex support or something.  
    
    Thoughts?


---

[GitHub] metron pull request #861: METRON-1341 Implemented SELECT transformer to proj...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on a diff in the pull request:

    https://github.com/apache/metron/pull/861#discussion_r155648096
  
    --- Diff: metron-platform/metron-common/src/test/java/org/apache/metron/common/field/transformation/SelectTransformationTest.java ---
    @@ -0,0 +1,75 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.metron.common.field.transformation;
    +
    +import java.util.HashMap;
    +
    +import org.apache.hadoop.hbase.util.Bytes;
    +import org.apache.metron.common.configuration.FieldTransformer;
    +import org.apache.metron.common.configuration.SensorParserConfig;
    +import org.apache.metron.stellar.dsl.Context;
    +import org.json.simple.JSONObject;
    +import org.junit.Assert;
    +import org.junit.Test;
    +
    +import com.google.common.collect.Iterables;
    +
    --- End diff --
    
    That test would belong in this ticket



---

[GitHub] metron pull request #861: METRON-1341 Implemented SELECT transformer to proj...

Posted by simonellistonball <gi...@git.apache.org>.
Github user simonellistonball commented on a diff in the pull request:

    https://github.com/apache/metron/pull/861#discussion_r155646268
  
    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/field/transformation/SelectTransformation.java ---
    @@ -0,0 +1,52 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.metron.common.field.transformation;
    +
    +import java.util.HashMap;
    +import java.util.LinkedHashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Map.Entry;
    +
    +import org.apache.metron.stellar.dsl.Context;
    +
    --- End diff --
    
    I've added to relevant readme (in parsers docs) which would seem a more likely place than javadoc (if we want to add javadoc, which we don't publish anyway, we should also look at adding this to everything in this area, which feels out of scope). Not even the stellar version gets javadoc!


---

[GitHub] metron pull request #861: METRON-1341 Implemented SELECT transformer to proj...

Posted by simonellistonball <gi...@git.apache.org>.
Github user simonellistonball commented on a diff in the pull request:

    https://github.com/apache/metron/pull/861#discussion_r155659293
  
    --- Diff: metron-platform/metron-common/src/test/java/org/apache/metron/common/field/transformation/SelectTransformationTest.java ---
    @@ -0,0 +1,75 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.metron.common.field.transformation;
    +
    +import java.util.HashMap;
    +
    +import org.apache.hadoop.hbase.util.Bytes;
    +import org.apache.metron.common.configuration.FieldTransformer;
    +import org.apache.metron.common.configuration.SensorParserConfig;
    +import org.apache.metron.stellar.dsl.Context;
    +import org.json.simple.JSONObject;
    +import org.junit.Assert;
    +import org.junit.Test;
    +
    +import com.google.common.collect.Iterables;
    +
    --- End diff --
    
    It's a pretty broad reaching concern when you account for the impact of remove, stellar, and the other field transforms. I'm happy to look at it, but I expect it will just need to be reviewed again as part of a follow on.


---

[GitHub] metron issue #861: METRON-1341 Implemented SELECT transformer to project fie...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on the issue:

    https://github.com/apache/metron/pull/861
  
    I ran the following test:
    Modified the default snort parser configuration such that it was :
    
    ```json
    {
      "parserClassName":"org.apache.metron.parsers.snort.BasicSnortParser",
      "sensorTopic":"snort",
      "parserConfig": {},
      "fieldTransformations" : [
        {
          "output" : ["msg" ],
          "transformation" : "SELECT"
        }
      ]
    }
    
    ```
    
    And the default snort enrichment configuration such that it was :
    
    ```json
    
    {
      "enrichment" : {
      },
      "threatIntel" : {
        }
      }
    }
    
    ```
    
    I got the following:
    
    ```
    2.168.138.158,49189,62.75.195.236,80,00:00:00:00:00:00,00:00:00:00:00:00,0x3C,***A****,0x9DFB1927,0xF1BD72CC,,0xFAF0,128,0,2360,40,40960,,,,","enrichmentsplitterbolt.splitter.end.ts":"1512763453749","enrichmentsplitterbolt.splitter.begin.ts":"1512763453749","guid":"08a84757-bf05-431b-9d81-5fa95fb99938","timestamp":1512763452000}
    	at org.apache.metron.enrichment.bolt.EnrichmentJoinBolt.getStreamIds(EnrichmentJoinBolt.java:53) ~[stormjar.jar:?]
    	at org.apache.metron.enrichment.bolt.EnrichmentJoinBolt.getStreamIds(EnrichmentJoinBolt.java:33) ~[stormjar.jar:?]
    	at org.apache.metron.enrichment.bolt.JoinBolt.execute(JoinBolt.java:138) [stormjar.jar:?]
    	at org.apache.storm.daemon.executor$fn__6573$tuple_action_fn__6575.invoke(executor.clj:734) [storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at org.apache.storm.daemon.executor$mk_task_receiver$fn__6494.invoke(executor.clj:466) [storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at org.apache.storm.disruptor$clojure_handler$reify__6007.onEvent(disruptor.clj:40) [storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:451) [storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at org.apache.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:430) [storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at org.apache.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:73) [storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at org.apache.storm.daemon.executor$fn__6573$fn__6586$fn__6639.invoke(executor.clj:853) [storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at org.apache.storm.util$async_loop$fn__554.invoke(util.clj:484) [storm-core-1.0.1.2.5.3.0-37.jar:1.0.1.2.5.3.0-37]
    	at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
    	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_77]
    2017-12-08 20:04:17.171 o.a.m.e.b.EnrichmentSplitterBolt [ERROR] Trying to retrieve a field map with sensor type of null
    2017-12-08 20:04:17.171 o.a.m.e.b.EnrichmentSplitterBolt [ERROR] Trying to retrieve a field map with sensor type of null
    2017-12-08 20:04:17.171 o.a.m.e.b.EnrichmentSplitterBolt [ERROR] Trying to retrieve a field map with sensor type of null
    2017-12-08 20:04:17.171 o.a.m.e.b.EnrichmentSplitterBolt [ERROR] Trying to retrieve a field map with sensor type of null
    2017-12-08 20:04:17.171 o.a.m.e.b.EnrichmentSplitterBolt [ERROR] Trying to retrieve a field map with sensor type of null
    2017-12-08 20:04:17.171 o.a.m.e.b.EnrichmentSplitterBolt [ERROR] Trying to retrieve a field map with sensor type of null
    2017-12-08 20:04:17.171 o.a.m.e.b.EnrichmentSplitterBolt [ERROR] Trying to retrieve a field map with sensor type of null
    2017-12-08 20:04:17.171 o.a.m.e.b.EnrichmentSplitterBolt [ERROR] Trying to retrieve a field map with sensor type of null
    2017-12-08 20:04:17.171 o.a.m.e.b.EnrichmentSplitterBolt [ERROR] Trying to retrieve a field map with sensor type of null
    ```
    
    So it looks like there are more fields to protect.



---

[GitHub] metron pull request #861: METRON-1341 Implemented SELECT transformer to proj...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on a diff in the pull request:

    https://github.com/apache/metron/pull/861#discussion_r155646423
  
    --- Diff: metron-platform/metron-parsers/README.md ---
    @@ -216,6 +216,23 @@ whenever `field2` exists and whose corresponding equal to 'foo':
     }
     ```
     
    +* `SELECT`: This transformation filters the fields in the message to include only the configured output fields, and drops any not explicitly included. 
    +
    +For example: 
    +```
    +{
    +...
    +    "fieldTransformations" : [
    +          {
    +            "output" : ["field1", "field2" ] 
    +          , "transformation" : "SELECT"
    +          }
    +                      ]
    +}
    +```
    +
    +when applied to a message containing keys field1, field2 and field3, will only output the first two.
    +
    --- End diff --
    
    If a user has a stellar transform (existing ) that takes field X as input, but doesn't want X as output, and puts the SELECT in the wrong order, could they break their existing transform because X now missing?


---

[GitHub] metron pull request #861: METRON-1341 Implemented SELECT transformer to proj...

Posted by simonellistonball <gi...@git.apache.org>.
Github user simonellistonball commented on a diff in the pull request:

    https://github.com/apache/metron/pull/861#discussion_r155645745
  
    --- Diff: metron-platform/metron-common/src/test/java/org/apache/metron/common/field/transformation/SelectTransformationTest.java ---
    @@ -0,0 +1,75 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.metron.common.field.transformation;
    +
    +import java.util.HashMap;
    +
    +import org.apache.hadoop.hbase.util.Bytes;
    +import org.apache.metron.common.configuration.FieldTransformer;
    +import org.apache.metron.common.configuration.SensorParserConfig;
    +import org.apache.metron.stellar.dsl.Context;
    +import org.json.simple.JSONObject;
    +import org.junit.Assert;
    +import org.junit.Test;
    +
    +import com.google.common.collect.Iterables;
    +
    --- End diff --
    
    yeah, makes sense, I suspect on a different ticket.


---

[GitHub] metron issue #861: METRON-1341 Implemented SELECT transformer to project fie...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on the issue:

    https://github.com/apache/metron/pull/861
  
    That is an excellent catch @simonellistonball, a test to go with it too ;)


---

[GitHub] metron issue #861: METRON-1341 Implemented SELECT transformer to project fie...

Posted by simonellistonball <gi...@git.apache.org>.
Github user simonellistonball commented on the issue:

    https://github.com/apache/metron/pull/861
  
    Actually, to follow up on that... I have a proxy feed, and some proxy use cases (enrichment, profile, etc). I want to keep my data clean and be explicit about which fields I pass on, so I have a 'library' field set that means "proxy like stuff", these are the fields I push into the output here. 


---

[GitHub] metron issue #861: METRON-1341 Implemented SELECT transformer to project fie...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on the issue:

    https://github.com/apache/metron/pull/861
  
    Thanks Simon. Before a code review, a couple of questions:
    
    I see a check next to steps to verify manually, but I don't see the steps.
    
    This is also missing documentation.


---

[GitHub] metron pull request #861: METRON-1341 Implemented SELECT transformer to proj...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on a diff in the pull request:

    https://github.com/apache/metron/pull/861#discussion_r155644387
  
    --- Diff: metron-platform/metron-parsers/README.md ---
    @@ -216,6 +216,23 @@ whenever `field2` exists and whose corresponding equal to 'foo':
     }
     ```
     
    +* `SELECT`: This transformation filters the fields in the message to include only the configured output fields, and drops any not explicitly included. 
    +
    +For example: 
    +```
    +{
    +...
    +    "fieldTransformations" : [
    +          {
    +            "output" : ["field1", "field2" ] 
    +          , "transformation" : "SELECT"
    +          }
    +                      ]
    +}
    +```
    +
    +when applied to a message containing keys field1, field2 and field3, will only output the first two.
    +
    --- End diff --
    
    How does this relate to REMOVE?  If I have more than one transformation should this be last or does it not matter?


---

[GitHub] metron issue #861: METRON-1341 Implemented SELECT transformer to project fie...

Posted by simonellistonball <gi...@git.apache.org>.
Github user simonellistonball commented on the issue:

    https://github.com/apache/metron/pull/861
  
    Somehow I knew you were going to say something about docs.... will add.


---

[GitHub] metron pull request #861: METRON-1341 Implemented SELECT transformer to proj...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on a diff in the pull request:

    https://github.com/apache/metron/pull/861#discussion_r155664814
  
    --- Diff: metron-platform/metron-parsers/README.md ---
    @@ -216,6 +216,23 @@ whenever `field2` exists and whose corresponding equal to 'foo':
     }
     ```
     
    +* `SELECT`: This transformation filters the fields in the message to include only the configured output fields, and drops any not explicitly included. 
    +
    +For example: 
    +```
    +{
    +...
    +    "fieldTransformations" : [
    +          {
    +            "output" : ["field1", "field2" ] 
    +          , "transformation" : "SELECT"
    +          }
    +                      ]
    +}
    +```
    +
    +when applied to a message containing keys field1, field2 and field3, will only output the first two.
    +
    --- End diff --
    
    We should document that the system fields are protected as well


---