Posted to dev@metron.apache.org by justinleet <gi...@git.apache.org> on 2017/04/03 13:26:52 UTC

[GitHub] incubator-metron pull request #505: METRON-817: Customise output file path p...

GitHub user justinleet opened a pull request:

    https://github.com/apache/incubator-metron/pull/505

    METRON-817: Customise output file path patterns for HDFS indexing

    ## Contributor Comments
    Primarily this affects HdfsWriter: the output path changes from a fixed path (`/apps/metron/.../<sensor>`) to one that can be defined via a Stellar function.  Specifically, the base path is still defined the same way (the `/apps/metron/.../` portion), but the `<sensor>` portion is no longer fixed and can now be produced by a Stellar function.  By default, the original behavior of `<sensor>` is used.  The function is defined in the `<sensor>.json` file, as indicated in the new README.md for metron-writer.
    
    ### Notes
    - This requires tracking things a bit more carefully (and if you're reviewing, please validate that it happens correctly).  When the outputFile is closed, we remove the sourceHandler from HdfsWriter's map.
      - I'm slightly concerned about the correctness of the implementation, but it seems necessary to ensure that we don't leave a bunch of SourceHandlers lying around as data changes (and we don't want an enormous number of output files being written to).
      - If there's a cleaner way to manage this, I'd love to hear it and can refactor pretty easily. It throws off the rotation count (because we kill the SourceHandler from the map itself), but I doubt we care about that since it really only shows up in the output filename anyway.
    - This also adds an argument for max open files.  This is a flux level config. I defaulted this to 500.  500 was chosen because it was an arbitrary round number that wasn't enormous.
      - If someone has a default with any real reasoning behind it, I'll go ahead and change it.
    - In HdfsWriter, we iterate through the messages, apply the Stellar function and then call the relevant handler. The entire group of messages is treated as a single pass/fail (which is the same as the old behavior), rather than individually. The try/catch could potentially be moved into the for loop, but I don't think there's an explicit link between the messages and the tuples that we can exploit to fail per message.  I don't think it needs to be addressed here, but I'm curious if there are thoughts on this.
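    
    A minimal sketch of the bookkeeping described in the notes above, assuming a registry keyed by output path that drops a handler when it closes and refuses new paths once the max-open-files limit is reached. `PathHandler`, `HandlerRegistry` and the exception thrown at the limit are hypothetical stand-ins for illustration, not the classes or behavior in this PR.
    
    ```java
    import java.util.HashMap;
    import java.util.Map;

    /** Hypothetical stand-in for a per-path writer; not Metron's SourceHandler. */
    class PathHandler {
      private final String path;
      private final Runnable onClose;
      private boolean closed = false;

      PathHandler(String path, Runnable onClose) {
        this.path = path;
        this.onClose = onClose;
      }

      String getPath() {
        return path;
      }

      /** Closes the handler once and notifies the registry so its map entry is dropped. */
      void close() {
        if (!closed) {
          closed = true;
          onClose.run();
        }
      }
    }

    /** Keeps at most maxOpenFiles handlers, keyed by output path. */
    class HandlerRegistry {
      private final int maxOpenFiles;
      private final Map<String, PathHandler> handlers = new HashMap<>();

      HandlerRegistry(int maxOpenFiles) {
        this.maxOpenFiles = maxOpenFiles;
      }

      /** Returns the handler for a path, creating it if the limit allows. */
      synchronized PathHandler get(String path) {
        PathHandler existing = handlers.get(path);
        if (existing != null) {
          return existing;
        }
        if (handlers.size() >= maxOpenFiles) {
          // Assumption: fail loudly at the limit rather than evicting an open handler.
          throw new IllegalStateException("Too many open output files (limit " + maxOpenFiles + ")");
        }
        PathHandler created = new PathHandler(path, () -> remove(path));
        handlers.put(path, created);
        return created;
      }

      private synchronized void remove(String path) {
        handlers.remove(path);
      }

      synchronized int openCount() {
        return handlers.size();
      }
    }
    ```
    
    With a limit of 2, a third distinct path is rejected until one of the open handlers is closed, which is the trade-off of capping open files instead of evicting.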
    
    ### Testing
    Unit tests are added to pretty much cover HdfsWriter, and this can be spun up in a dev environment.
    
    To test in dev
    
    - Spin up a dev environment
    - Validate that the output matches the old format in HDFS (Nothing has an output function defined)
      ```
      [hdfs@node1 vagrant]$ hdfs dfs -ls /apps/metron/indexing/indexed/
      Found 3 items
      drwxrwxr-x   - storm hadoop          0 2017-04-03 13:11 /apps/metron/indexing/indexed/bro
      drwxrwxr-x   - storm hadoop          0 2017-04-03 13:11 /apps/metron/indexing/indexed/error
      drwxrwxr-x   - storm hadoop          0 2017-04-03 13:11 /apps/metron/indexing/indexed/snort
      ```
    - Edit the indexing config for Bro to include an outputPathFunction in the hdfs section, e.g. in `/usr/metron/0.3.1/config/zookeeper/indexing/bro.json`
      ```
      {
        "hdfs" : {
          "index": "bro",
          "batchSize": 5,
          "enabled" : true,
          "outputPathFunction": "FORMAT('ipsrc-%s', ip_src_addr)"
        },
        "elasticsearch" : {
          "index": "bro",
          "batchSize": 5,
          "enabled" : true
        },
        "solr" : {
          "index": "bro",
          "batchSize": 5,
          "enabled" : true
        }
      }
      ```
    - Push the configs to ZooKeeper: `/usr/metron/0.3.1/bin/zk_load_configs.sh -z node1:2181 -m PUSH -i /usr/metron/0.3.1/config/zookeeper/`
    - Let some more data run through and check the output folders, e.g.
      ```
      [hdfs@node1 vagrant]$ hdfs dfs -ls /apps/metron/indexing/indexed/
      Found 5 items
      drwxrwxr-x   - storm hadoop          0 2017-04-03 13:11 /apps/metron/indexing/indexed/bro
      drwxrwxr-x   - storm hadoop          0 2017-04-03 13:11 /apps/metron/indexing/indexed/error
      drwxrwxr-x   - storm hadoop          0 2017-04-03 13:14 /apps/metron/indexing/indexed/ipsrc-192.168.138.158
      drwxrwxr-x   - storm hadoop          0 2017-04-03 13:14 /apps/metron/indexing/indexed/ipsrc-192.168.66.1
      drwxrwxr-x   - storm hadoop          0 2017-04-03 13:11 /apps/metron/indexing/indexed/snort
      [hdfs@node1 vagrant]$ hdfs dfs -ls /apps/metron/indexing/indexed/ipsrc-192.168.138.158
      Found 1 items
      -rw-r--r--   1 storm hadoop     223182 2017-04-03 13:14 /apps/metron/indexing/indexed/ipsrc-192.168.138.158/enrichment-null-0-0-1491225291377.json
      ```
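    
    To make the directory names in the listing concrete, here is a rough sketch of how the final directory could be assembled: the fixed base path plus a per-message extension. Java's `String.format` stands in for Stellar's `FORMAT`, and `OutputPathSketch` and its methods are hypothetical names, not code from this PR.
    
    ```java
    import java.util.Map;

    /** Hypothetical illustration only; HdfsWriter evaluates a real Stellar expression instead. */
    class OutputPathSketch {
      /** Stand-in for FORMAT('ipsrc-%s', ip_src_addr) applied to one message. */
      static String ipSrcExtension(Map<String, Object> message) {
        return String.format("ipsrc-%s", message.get("ip_src_addr"));
      }

      /** Uses the sensor name when no outputPathFunction is configured. */
      static String extensionFor(String sensor, boolean hasFunction, Map<String, Object> message) {
        return hasFunction ? ipSrcExtension(message) : sensor;
      }

      /** Appends the extension to the fixed base path. */
      static String outputDir(String basePath, String extension) {
        return basePath.endsWith("/") ? basePath + extension : basePath + "/" + extension;
      }
    }
    ```
    
    For a Bro message whose `ip_src_addr` is `192.168.66.1`, the sketch yields `/apps/metron/indexing/indexed/ipsrc-192.168.66.1`; with no function configured it falls back to `/apps/metron/indexing/indexed/bro`.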
    
    ## Pull Request Checklist
    
    Thank you for submitting a contribution to Apache Metron (Incubating).  
    Please refer to our [Development Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235) for the complete guide to follow for contributions.  
    Please refer also to our [Build Verification Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview) for complete smoke testing guides.  
    
    
    In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following:
    
    ### For all changes:
    - [x] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel). 
    - [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    - [x] Has your PR been rebased against the latest commit within the target branch (typically master)?
    
    
    ### For code changes:
    - [x] Have you included steps to reproduce the behavior or problem that is being changed or addressed?
    - [x] Have you included steps or a guide to how the change may be verified and tested manually?
    - [x] Have you ensured that the full suite of tests and checks has been executed in the root incubator-metron folder via:
      ```
      mvn -q clean integration-test install && build_utils/verify_licenses.sh 
      ```
    
    - [x] Have you written or updated unit tests and/or integration tests to verify your changes?
    - ~If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?~
    - [x] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?
    
    ### For documentation related changes:
    - [x] Have you ensured that the format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not, run the following commands and then verify the changes via `site-book/target/site/index.html`:
    
      ```
      cd site-book
      bin/generate-md.sh
      mvn site:site
      ```
    
    #### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
    It is also recommended that [travis-ci](https://travis-ci.org) is set up for your personal repository such that your branches are built there before submitting a pull request.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/justinleet/incubator-metron hdfs_path

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-metron/pull/505.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #505
    
----
commit e84c1393c293a809372386fdcf374f0c3dc50d9c
Author: justinjleet <ju...@gmail.com>
Date:   2017-03-31T03:34:21Z

    Allowing for message guided output and adding doc

commit 9762cb6c473c6c11da70963f8de1e24f6f7b0502
Author: justinjleet <ju...@gmail.com>
Date:   2017-04-03T11:41:25Z

    renamed parameters in override method for clarity, added test around SourceFileNameFormat to ensure additions work

commit 6693740a85a5cd9d90ac4a413e2037410f927432
Author: justinjleet <ju...@gmail.com>
Date:   2017-04-03T11:47:57Z

    Removing extraneous json change, and not tripping rat

commit 72348501c4f7876aaaa7930f600bf68bc98a0d61
Author: justinjleet <ju...@gmail.com>
Date:   2017-04-03T12:12:10Z

    Adjusting output

commit 8b19e67c10821b12c8e09439fc7c89528372df77
Author: justinjleet <ju...@gmail.com>
Date:   2017-04-03T12:19:13Z

    README adjustment

commit 10b9c52ad373e395f612a8253385d6e9783cb09e
Author: justinjleet <ju...@gmail.com>
Date:   2017-04-03T12:22:22Z

    Renaming SourceFileNameFormat, cleaning up a couple minor annoyances in SourceAwareMoveAction

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-metron pull request #505: METRON-817: Customise output file path p...

Posted by justinleet <gi...@git.apache.org>.
Github user justinleet commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/505#discussion_r109447598
  
    --- Diff: metron-platform/metron-writer/src/main/java/org/apache/metron/writer/hdfs/HdfsWriter.java ---
    @@ -74,17 +91,43 @@ public BulkWriterResponse write(String sourceType
                        ) throws Exception
       {
         BulkWriterResponse response = new BulkWriterResponse();
    -    SourceHandler handler = getSourceHandler(configurations.getIndex(sourceType));
    +    // Currently treating all the messages in a group for pass/failure.
         try {
    -      handler.handle(messages);
    -    } catch(Exception e) {
    +      // Messages can all result in different HDFS paths, because of Stellar Expressions, so we'll need to iterate through
    +      for(JSONObject message : messages) {
    +        Map<String, Object> val = configurations.getSensorConfig(sourceType);
    +        String path = getHdfsPathExtension(
    +                sourceType,
    +                (String)configurations.getSensorConfig(sourceType).getOrDefault(IndexingConfigurations.OUTPUT_PATH_FUNCTION_CONF, ""),
    +                message
    +        );
    +        SourceHandler handler = getSourceHandler(sourceType, path);
    +        handler.handle(message);
    +      }
    +    } catch (Exception e) {
           response.addAllErrors(e, tuples);
         }
     
         response.addAllSuccesses(tuples);
         return response;
       }
     
    +  public String getHdfsPathExtension(String sourceType, String stellarFunction, JSONObject message) {
    +    // If no function is provided, just use the sourceType directly
    +    if(stellarFunction == null || stellarFunction.trim().isEmpty()) {
    +      return sourceType;
    +    }
    +
    +    StellarCompiler.Expression expression = sourceTypeExpressionMap.computeIfAbsent(stellarFunction, s -> stellarProcessor.compile(stellarFunction));
    +    VariableResolver resolver = new MapVariableResolver(message);
    --- End diff --
    
    @cestella Made that change.  I did make the check `if(objResult != null && !(objResult instanceof String))`, to avoid falling into the IAE when objResult is null.
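    
    A small sketch of that check in isolation, assuming the evaluated result arrives as a plain `Object`. `PathResultCheck` and `toPathSegment` are hypothetical names for illustration; only the condition and the error message come from the thread.
    
    ```java
    /** Hypothetical helper showing the null-tolerant type check on an evaluated result. */
    class PathResultCheck {
      /**
       * Converts an evaluated expression result into a path segment.
       * A null result maps to the empty string; any other non-String is rejected.
       */
      static String toPathSegment(String stellarFunction, Object objResult) {
        if (objResult != null && !(objResult instanceof String)) {
          throw new IllegalArgumentException("Stellar Function <" + stellarFunction
              + "> did not return a String value. Returned: " + objResult);
        }
        return objResult == null ? "" : (String) objResult;
      }
    }
    ```
    
    Without the `objResult != null` guard, a null result would fail the `instanceof` test and raise the exception, so the empty-string fallback on the last line would never be reached.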


[GitHub] incubator-metron pull request #505: METRON-817: Customise output file path p...

Posted by justinleet <gi...@git.apache.org>.
Github user justinleet commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/505#discussion_r109438625
  
    @cestella I'm mostly concerned about the performance cost of compiling the function for every single message that comes through indexing.
    
    If we keep the current approach, I would be interested in whether there's a way to make things a little cleaner.
    
    In retrospect, I think this should be an LRU cache, so that we don't keep around a given parse forever. Any thoughts on that, assuming performance would be enough of a concern to not just use your proposal?  
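    
    For reference, a minimal sketch of the LRU idea, assuming a small access-ordered `LinkedHashMap` in front of whatever does the compilation. `LruCache` and its `loader` parameter are hypothetical; this is not how `StellarProcessor` caches internally.
    
    ```java
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.function.Function;

    /** Hypothetical bounded cache that evicts the least recently used entry. */
    class LruCache<K, V> {
      private final Map<K, V> entries;

      LruCache(final int capacity) {
        // accessOrder=true moves an entry to the end of the iteration order on each get.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
          @Override
          protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity;
          }
        };
      }

      /** Returns the cached value, computing and storing it on a miss. */
      synchronized V get(K key, Function<K, V> loader) {
        V value = entries.get(key);
        if (value == null) {
          value = loader.apply(key);
          entries.put(key, value);
        }
        return value;
      }

      synchronized int size() {
        return entries.size();
      }
    }
    ```
    
    Keyed by the expression string, a cache like this would compile each distinct expression at most once while it stays among the most recently used entries.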


[GitHub] incubator-metron pull request #505: METRON-817: Customise output file path p...

Posted by justinleet <gi...@git.apache.org>.
Github user justinleet commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/505#discussion_r109432502
  
    Unfortunately, I don't think we can, unless we want to do more work to actually look up the function and validate. On top of it, things like MAP_GET essentially return Object anyway, so we'd still want to check if it's a String afterwards.


[GitHub] incubator-metron pull request #505: METRON-817: Customise output file path p...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/505#discussion_r109428017
  
    We should be able to find out the return type from the function metadata/annotation without doing all this work, shouldn't we?


[GitHub] incubator-metron pull request #505: METRON-817: Customise output file path p...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/505#discussion_r109441372
  
    Yeah, it's a good concern.  We *do* actually have a cache in the `StellarProcessor` so that compilations happen once and are cached afterwards.  As long as `StellarProcessor` is a transient member variable, I think you're good to do what I suggested.
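    
    A rough sketch of the transient-member pattern, assuming the writer is Java-serialized when the topology is shipped and therefore has to rebuild the processor lazily on the other side. `ExpensiveProcessor` and `WriterSketch` are hypothetical stand-ins, not `StellarProcessor` or `HdfsWriter`.
    
    ```java
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;
    import java.util.HashMap;
    import java.util.Map;

    /** Hypothetical non-serializable processor with its own compile cache. */
    class ExpensiveProcessor {
      private final Map<String, String> compiled = new HashMap<>();
      int compileCount = 0;

      String evaluate(String expression) {
        return compiled.computeIfAbsent(expression, e -> {
          compileCount++;
          return "compiled(" + e + ")";
        });
      }
    }

    /** Hypothetical writer that holds the processor as a transient, lazily created field. */
    class WriterSketch implements Serializable {
      private static final long serialVersionUID = 1L;
      // transient: skipped during serialization, so it is null after deserialization.
      private transient ExpensiveProcessor processor;

      private ExpensiveProcessor processor() {
        if (processor == null) {
          processor = new ExpensiveProcessor();
        }
        return processor;
      }

      String pathFor(String expression) {
        return processor().evaluate(expression);
      }

      int compileCount() {
        return processor().compileCount;
      }

      /** Serializes and deserializes a writer, as shipping it to a worker would. */
      static WriterSketch roundTrip(WriterSketch writer) {
        try {
          ByteArrayOutputStream bytes = new ByteArrayOutputStream();
          try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(writer);
          }
          try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (WriterSketch) in.readObject();
          }
        } catch (IOException | ClassNotFoundException e) {
          throw new IllegalStateException(e);
        }
      }
    }
    ```
    
    Repeated calls on one instance reuse the same processor and its cache; a deserialized copy starts with a fresh processor instead of failing on a non-serializable field.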


[GitHub] incubator-metron pull request #505: METRON-817: Customise output file path p...

Posted by ottobackwards <gi...@git.apache.org>.
Github user ottobackwards commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/505#discussion_r109428569
  
    This makes me think of the UI case.  Users configure the index configuration there, but we have no way of validating it before they save and deploy.


[GitHub] incubator-metron pull request #505: METRON-817: Customise output file path p...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/505#discussion_r109433891
  
    So, is there a reason why this isn't just:
    ```
    //processor is a StellarProcessor();
    VariableResolver resolver = new MapVariableResolver(message);
    Object objResult = processor.parse(stellarFunction, resolver, StellarFunctions.FUNCTION_RESOLVER(), Context.EMPTY_CONTEXT());
    if(!(objResult instanceof String)) {
      throw new IllegalArgumentException("Stellar Function <" + stellarFunction + "> did not return a String value. Returned: " + objResult);
    }
    return objResult == null?"":(String)objResult;
    ```


[GitHub] incubator-metron pull request #505: METRON-817: Customise output file path p...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/505#discussion_r109467330
  
    After looking at this a bit further, while reusing the StellarProcessor *is* the right answer, it is apparent that we don't practice that everywhere...in fact, we practice it almost literally nowhere.  I have created a follow-on PR ( #508 ) to address that problem, which is a substantial performance issue, in fact.


[GitHub] incubator-metron issue #505: METRON-817: Customise output file path patterns...

Posted by cestella <gi...@git.apache.org>.
Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/505
  
    +1 by inspection


[GitHub] incubator-metron pull request #505: METRON-817: Customise output file path p...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-metron/pull/505

