Posted to issues@nifi.apache.org by GitBox <gi...@apache.org> on 2022/03/30 20:51:06 UTC

[GitHub] [nifi] exceptionfactory opened a new pull request #5918: NIFI-9850 Add support for multiple expressions to GrokReader

exceptionfactory opened a new pull request #5918:
URL: https://github.com/apache/nifi/pull/5918


   #### Description of PR
   
   NIFI-9850 Adds support for multiple expressions to `GrokReader`.
   
   The implementation updates the existing `Grok Expression` property with a new display name of `Grok Expressions` and enhances the property descriptor to support reading multiple expressions from a resource reference. The resource reference enables reading Grok Expressions from the property value itself as well as an external file path or URL. This approach maintains compatibility with existing property values that define a single Grok Expression.
   
   The updated `Grok Expressions` property descriptor enables `GrokReader` to load multiple expressions, separated by newlines. The `GrokRecordReader` iterates over the configured expressions and returns on the first match found.
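
A minimal sketch of the first-match behavior described above, not the actual `GrokRecordReader` code. It uses `java.util.regex` named groups as a stand-in for compiled Grok expressions, and the class name `FirstMatchSketch` and both helper methods are hypothetical. Skipping blank lines and trimming each line are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: plain regular expressions stand in for Grok expressions.
final class FirstMatchSketch {

    // Split a newline-separated property value into one compiled expression per non-blank line.
    static List<Pattern> compileExpressions(String propertyValue) {
        List<Pattern> patterns = new ArrayList<>();
        for (String line : propertyValue.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                patterns.add(Pattern.compile(trimmed));
            }
        }
        return patterns;
    }

    // Try each expression in configured order and return on the first one that matches.
    static Optional<Matcher> firstMatch(List<Pattern> patterns, String logLine) {
        for (Pattern pattern : patterns) {
            Matcher matcher = pattern.matcher(logLine);
            if (matcher.matches()) {
                return Optional.of(matcher);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Pattern> patterns = compileExpressions(
                "(?<level>[A-Z]+) (?<message>.*)\n"
                        + "(?<timestamp>\\d{4}-\\d{2}-\\d{2}) (?<message>.*)");
        // Different log lines can match different expressions.
        firstMatch(patterns, "ERROR disk full")
                .ifPresent(m -> System.out.println("level=" + m.group("level")));
        firstMatch(patterns, "2022-03-30 started")
                .ifPresent(m -> System.out.println("timestamp=" + m.group("timestamp")));
    }
}
```

Because evaluation stops at the first match, the order of the configured expressions matters when more than one of them could match the same line.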
   
   Changes include unit test optimization as well as a new unit test that configures multiple patterns to parse different log lines.
   
   In order to streamline the review of the contribution we ask you
   to ensure the following steps have been taken:
   
   ### For all changes:
   - [X] Is there a JIRA ticket associated with this PR? Is it referenced 
        in the commit message?
   
   - [X] Does your PR title start with **NIFI-XXXX** where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
   
   - [X] Has your PR been rebased against the latest commit within the target branch (typically `main`)?
   
   - [X] Is your initial contribution a single, squashed commit? _Additional commits in response to PR reviewer feedback should be made on this branch and pushed to allow change tracking. Do not `squash` or use `--force` when pushing to allow for clean monitoring of changes._
   
   ### For code changes:
   - [ ] Have you ensured that the full suite of tests is executed via `mvn -Pcontrib-check clean install` at the root `nifi` folder?
   - [X] Have you written or updated unit tests to verify your changes?
   - [X] Have you verified that the full build is successful on JDK 8?
   - [ ] Have you verified that the full build is successful on JDK 11?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE` file, including the main `LICENSE` file under `nifi-assembly`?
   - [ ] If applicable, have you updated the `NOTICE` file, including the main `NOTICE` file found under `nifi-assembly`?
   - [ ] If adding new Properties, have you added `.displayName` in addition to .name (programmatic access) for each of the new properties?
   
   ### For documentation related changes:
   - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
   
   ### Note:
   Please ensure that once the PR is submitted, you check GitHub Actions CI for build issues and submit an update to your PR as soon as possible.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nifi] ottobackwards commented on pull request #5918: NIFI-9850 Add support for multiple expressions to GrokReader

Posted by GitBox <gi...@apache.org>.
ottobackwards commented on pull request #5918:
URL: https://github.com/apache/nifi/pull/5918#issuecomment-1086687253


   @exceptionfactory if you could spare some time to look at https://issues.apache.org/jira/browse/NIFI-9863 and maybe suggest any ideas or corrections, that would be super cool





[GitHub] [nifi] exceptionfactory commented on pull request #5918: NIFI-9850 Add support for multiple expressions to GrokReader

Posted by GitBox <gi...@apache.org>.
exceptionfactory commented on pull request #5918:
URL: https://github.com/apache/nifi/pull/5918#issuecomment-1085929525


   Thanks for the reply! The current documentation for the `STRING_FIELDS_FROM_GROK_EXPRESSION` strategy notes that the fields will be derived from the Grok Expression, so I will update it to mention that this includes fields from all configured expressions.
   
   As for the matching pattern, logging the match could be verbose given the potential volume of information processed. Although this might make sense at the trace level, it could still result in hundreds or thousands of log messages per FlowFile. It seems better to avoid logging for now, and consider a separate task to optionally include the matching pattern index on the record.
   
   Given the different schema strategies and the ability to configure different behavior when there are no matching expressions, it does not seem like it should be an error if the patterns do not produce something matching the schema.





[GitHub] [nifi] exceptionfactory commented on pull request #5918: NIFI-9850 Add support for multiple expressions to GrokReader

Posted by GitBox <gi...@apache.org>.
exceptionfactory commented on pull request #5918:
URL: https://github.com/apache/nifi/pull/5918#issuecomment-1084765013


   Thanks for the feedback @ottobackwards!
   
   In response to your questions:
   
   1. Yes, the implicit schema handling remains the same. The `createRecordSchema` method evaluates fields from all configured Grok Expressions.
   2. Implementing a Controller Service for sharing Grok Expression resources could be useful, particularly when paired with custom Grok Patterns. There is an open issue to make the custom patterns more configurable as well. This particular set of changes is more of an incremental improvement, so a well-designed Controller Service seems like a potential follow-on approach.
   3. The challenge of tracking the matching pattern is that different records could match different patterns. Perhaps this could be achieved through an additional record field, but it may not be necessary in all cases. Perhaps this could also be an additional feature?
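
As a rough illustration of the first point, the sketch below collects field names from every configured expression into one ordered set. It is not the `createRecordSchema` implementation: the class `SchemaFieldsSketch` and its method are hypothetical, and the extraction only recognizes the basic `%{SYNTAX:SEMANTIC}` form, ignoring type suffixes and nested custom patterns.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: derives field names by scanning the expression text.
final class SchemaFieldsSketch {

    // Matches the basic %{SYNTAX:SEMANTIC} form and captures the SEMANTIC (field) name.
    private static final Pattern NAMED_FIELD = Pattern.compile("%\\{\\w+:(\\w+)\\}");

    // Union of field names across all expressions, in first-seen order without duplicates.
    static Set<String> fieldNames(List<String> expressions) {
        Set<String> names = new LinkedHashSet<>();
        for (String expression : expressions) {
            Matcher matcher = NAMED_FIELD.matcher(expression);
            while (matcher.find()) {
                names.add(matcher.group(1));
            }
        }
        return names;
    }
}
```

With a union like this, a record produced by one expression may simply lack the fields that only another expression defines, which is consistent with the question of whether a mismatch against the schema should count as an error.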
   
   In the process of evaluating this change, there are some opportunities for additional improvements, although it seems better to focus those in separate tasks. What do you think?





[GitHub] [nifi] ottobackwards commented on pull request #5918: NIFI-9850 Add support for multiple expressions to GrokReader

Posted by GitBox <gi...@apache.org>.
ottobackwards commented on pull request #5918:
URL: https://github.com/apache/nifi/pull/5918#issuecomment-1085761980


   Also, should it be an error if any of the grok patterns do not produce something matching the schema during validation?





[GitHub] [nifi] ottobackwards commented on pull request #5918: NIFI-9850 Add support for multiple expressions to GrokReader

Posted by GitBox <gi...@apache.org>.
ottobackwards commented on pull request #5918:
URL: https://github.com/apache/nifi/pull/5918#issuecomment-1085761148


   for:
   
   1.  I think that should be mentioned in the property documentation explicitly
   2.  I think that it is fine to do this as a follow on
   3.  Maybe we can log the matching pattern or something to start?  At some debug level





[GitHub] [nifi] ottobackwards merged pull request #5918: NIFI-9850 Add support for multiple expressions to GrokReader

Posted by GitBox <gi...@apache.org>.
ottobackwards merged pull request #5918:
URL: https://github.com/apache/nifi/pull/5918


   





[GitHub] [nifi] exceptionfactory commented on pull request #5918: NIFI-9850 Add support for multiple expressions to GrokReader

Posted by GitBox <gi...@apache.org>.
exceptionfactory commented on pull request #5918:
URL: https://github.com/apache/nifi/pull/5918#issuecomment-1085985509


   I updated the schema strategy @ottobackwards, please let me know if you have any additional feedback.

