Posted to issues@nifi.apache.org by selim-namsi <gi...@git.apache.org> on 2016/10/05 17:18:36 UTC

[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

GitHub user selim-namsi opened a pull request:

    https://github.com/apache/nifi/pull/1108

    NIFI-2565: add Grok parser

    Thank you for submitting a contribution to Apache NiFi.
    
    In order to streamline the review of the contribution we ask you
    to ensure the following steps have been taken:
    
    ### For all changes:
    - [ ] Is there a JIRA ticket associated with this PR? Is it referenced 
         in the commit message?
    
    - [ ] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    
    - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
    
    - [ ] Is your initial contribution a single, squashed commit?
    
    ### For code changes:
    - [ ] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
    - [ ] Have you written or updated unit tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? 
    - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
    - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
    - [ ] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?
    
    ### For documentation related changes:
    - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
    
    ### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/selim-namsi/nifi nifi-2565

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/1108.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1108
    
----
commit 447c65ec272fd72b8b55a015f36449e387400fe6
Author: Selim Namsi <se...@gmail.com>
Date:   2016-10-05T16:54:37Z

    nifi-2565: add Grok parser

commit 1dffac47057c85c7aba2e0b2a8543eafc88e96be
Author: Selim Namsi <se...@gmail.com>
Date:   2016-10-05T17:17:49Z

    nifi-2656: Update LICENSE after adding Grok Parser

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi issue #1108: NIFI-2565: add Grok parser

Posted by trixpan <gi...@git.apache.org>.
Github user trixpan commented on the issue:

    https://github.com/apache/nifi/pull/1108
  
    You can certainly include a file with the default patterns, but you should not hardcode them. Hardcoding prevents users from optimising for speed by removing unused patterns from the pattern files. (I realise they could remove entries from the default packaged patterns, but that would mean changing packaged files after an install, which should always be avoided.)
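
To illustrate the point above, here is a minimal sketch, using only the JDK, of reading pattern definitions from a user-supplied file instead of compiling them into the processor. It assumes the Logstash-style layout of one `NAME regex` pair per line with `#` comments. The class and method names (`PatternFileLoader`, `load`) are hypothetical and are not part of this PR or of the Grok library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: not part of the PR or of the Grok library.
final class PatternFileLoader {

    /** Reads "NAME regex" lines, skipping blank lines and '#' comments. */
    static Map<String, String> load(Reader source) {
        Map<String, String> definitions = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                    continue;
                }
                // The name ends at the first run of whitespace; the rest is the regex.
                String[] parts = trimmed.split("\\s+", 2);
                if (parts.length == 2) {
                    definitions.put(parts[0], parts[1]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return definitions;
    }

    public static void main(String[] args) {
        String sample = "# user-supplied patterns\nINT [+-]?[0-9]+\nWORD \\b\\w+\\b\n";
        System.out.println(load(new StringReader(sample)));
    }
}
```

Because the definitions come from a file the user points at, trimming unused patterns means editing their own copy rather than a packaged one.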



[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r83229592
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/test/resources/TestGrokParser/patterns ---
    @@ -0,0 +1,108 @@
    +# Forked from https://github.com/elasticsearch/logstash/tree/v1.4.0/patterns
    --- End diff --
    
    We have to ensure that we have proper licensing for these test files.



[GitHub] nifi issue #1108: NIFI-2565: add Grok parser

Posted by selim-namsi <gi...@git.apache.org>.
Github user selim-namsi commented on the issue:

    https://github.com/apache/nifi/pull/1108
  
    @markap14 Thanks for all these suggestions, I'll update the code ASAP and push the changes!



[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by selim-namsi <gi...@git.apache.org>.
GitHub user selim-namsi reopened a pull request:

    https://github.com/apache/nifi/pull/1108

    NIFI-2565: add Grok parser

    Thank you for submitting a contribution to Apache NiFi.
    
    In order to streamline the review of the contribution we ask you
    to ensure the following steps have been taken:
    
    ### For all changes:
    - [x] Is there a JIRA ticket associated with this PR? Is it referenced 
         in the commit message?
    
    - [x] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    
    - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
    
    - [ ] Is your initial contribution a single, squashed commit?
    
    ### For code changes:
    - [x] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
    - [x] Have you written or updated unit tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? 
    - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
    - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
    - [ ] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?
    
    ### For documentation related changes:
    - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
    
    ### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/selim-namsi/nifi nifi-2565

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/1108.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1108
    
----
commit 447c65ec272fd72b8b55a015f36449e387400fe6
Author: Selim Namsi <se...@gmail.com>
Date:   2016-10-05T16:54:37Z

    nifi-2565: add Grok parser

commit 1dffac47057c85c7aba2e0b2a8543eafc88e96be
Author: Selim Namsi <se...@gmail.com>
Date:   2016-10-05T17:17:49Z

    nifi-2656: Update LICENSE after adding Grok Parser

commit c764f83506165c99e77c707e358cb327e9af16b8
Author: Scott Aslan <sc...@gmail.com>
Date:   2016-10-05T15:46:48Z

    [NIFI-1904] If open, close hamburger menu on window resize. This closes #1106

commit e46fea920af11d2527f4eb4da3a191e309d12618
Author: Scott Aslan <sc...@gmail.com>
Date:   2016-10-04T18:13:17Z

    [NIFI-1794] Update .dialog-content to wrap text. This closes #1094

commit da33e2859ce45321d28901e5820c38a37dcfc709
Author: mans2singh <ma...@yahoo.com>
Date:   2016-07-24T23:18:55Z

    NIFI-2398 - GetIgnite processor
    
    This closes #721.

commit 6ad633d17422f2110645553cb03ae3b364926eee
Author: Scott Aslan <sc...@gmail.com>
Date:   2016-10-05T20:00:34Z

    [NIFI-2838] update width of rule name and save message. This closes #1089

commit 6f1af31ff28f60d0eddbee5dafe909bc66cc9c71
Author: Joe N <jo...@gmail.com>
Date:   2016-10-03T22:16:14Z

    NIFI-2852 base64 expression language functions
    
    Signed-off-by: jpercivall <jo...@yahoo.com>

commit 3c8545a90266b8f82a6c541c9a68daad107b0f23
Author: Pierre Villard <pi...@gmail.com>
Date:   2016-09-29T20:47:04Z

    NIFI-1912 - PutEmail fixed format when attachment
    
    Correction as suggested by users in JIRA.
    + adding a unit test to check attachments.
    
    This closes: #1081
    
    Signed-off-by: Andre F de Miranda <tr...@users.noreply.github.com>

commit 3c673972e035f1168e509128150444da78af5292
Author: Matt Gilman <ma...@gmail.com>
Date:   2016-10-06T13:29:59Z

    NIFI-2816:
    - Fixing compilation error resulting from the initial NIFI-2816 commit.

commit 8bd85e20853c44bbb33bcad2795bfb4ac8819e1a
Author: Scott Aslan <sc...@gmail.com>
Date:   2016-10-05T20:17:32Z

    [NIFI-1792] Clear the selected rule id when deleting the last rule in the list. Add scrollable styles when appropriate. Close popups when appropriate. This PR also adjusts the position of the table cell nfel and long text editors. This closes #1099.

commit 92cca96d49042f9898f93b3a2d2210b924708e52
Author: Mark Payne <ma...@hotmail.com>
Date:   2016-09-08T23:37:35Z

    NIFI-2865: Refactored PublishKafka and PublishKafka_0_10 to allow batching of FlowFiles within a single publish and to let messages timeout if not acknowledged
    
    This closes #1097.
    
    Signed-off-by: Bryan Bende <bb...@apache.org>

commit 53f7a2166360de4f73a2fefbb0e6b6349ba92455
Author: Andrew Lim <an...@gmail.com>
Date:   2016-10-05T15:27:52Z

    NIFI-2691 Replaced references to kerberos/spegno principle with principal in nifi.properties and admin guide
    
    This closes: #1105
    
    Signed-off-by: Andre F de Miranda <tr...@users.noreply.github.com>

commit a4ed622152187155463af2b748c9bf492621bbc7
Author: Bryan Bende <bb...@apache.org>
Date:   2016-10-06T19:19:00Z

    Revert "NIFI-2865: Refactored PublishKafka and PublishKafka_0_10 to allow batching of FlowFiles within a single publish and to let messages timeout if not acknowledged"
    
    This reverts commit 92cca96d49042f9898f93b3a2d2210b924708e52.

commit b9cb6b1b475eb4688b7cd32f6d343c5dffb20567
Author: Mark Payne <ma...@hotmail.com>
Date:   2016-09-08T23:37:35Z

    NIFI-2865: Refactored PublishKafka and PublishKafka_0_10 to allow batching of FlowFiles within a single publish and to let messages timeout if not acknowledged
    
    Signed-off-by: Bryan Bende <bb...@apache.org>

commit 9304df4de060335526d29a77aa093db4004c8b2e
Author: Mark Payne <ma...@hotmail.com>
Date:   2016-10-06T19:39:24Z

    NIFI-2865: Fixed bug in StreamDemarcator that is exposed when the final bit of data in a stream is smaller than the previous and the previous demarcation ended on a buffer length boundary
    
    This closes #1110.
    
    Signed-off-by: Bryan Bende <bb...@apache.org>

commit bb6c5d9d4ea02433ee8cb63cd142327c02f6e9da
Author: Matt Gilman <ma...@gmail.com>
Date:   2016-10-04T18:52:18Z

    NIFI-2777:
    NIFI-2856:
    - Only performing response merging when the node is the cluster cooridinator even if there is a single response.
    - Fixing PropertyDescriptor merging to ensure the 'choosen' descriptor is included in map of all responses.
    
    This closes #1095.

commit 09568d092b5329e4732f2e05c10fad181c344b8d
Author: Mark Payne <ma...@hotmail.com>
Date:   2016-10-05T20:19:55Z

    NIFI-2836:
    - Ensure that we wait until a request is completed before unlocking the lock for request replication
    - Ensure that failures do not trigger request completion logic unless the failure is the last node to report its status
    - This closes #1109

commit 540ef63efa9d6b2af84f57fe4eae2f08e6dd1693
Author: Koji Kawamura <ij...@apache.org>
Date:   2016-10-04T10:39:36Z

    NIFI-2855: Site-to-Site with port forwarding.
    
    - Added following properties:
      - nifi.web.http.port.forwarding
      - nifi.web.https.port.forwarding
    
    This closes #1100.
    
    Signed-off-by: Koji Kawamura <ij...@apache.org>

commit 69a90bc305626bfcf306c3364d5888e2e3b3abac
Author: Selim Namsi <se...@gmail.com>
Date:   2016-10-05T16:54:37Z

    nifi-2565: add Grok parser

commit 96d5c7f825e959fba72d5aa70483c191af161edc
Author: Selim Namsi <se...@gmail.com>
Date:   2016-10-05T17:17:49Z

    nifi-2656: Update LICENSE after adding Grok Parser

commit 4158cf04873f91f55a9bb7722c0e8d44ccb0bdc8
Author: Selim Namsi <se...@gmail.com>
Date:   2016-10-08T17:25:12Z

    nifi-2565: remove grok patterns file + refactoring

commit 5e4be7fe43c777ef102a3e3fcebd3d7dd7d3a7d3
Author: Selim Namsi <se...@gmail.com>
Date:   2016-10-08T17:36:46Z

    merge

commit 544bfa99c8bd1fb5f9061c89355670e031256a5c
Author: Selim Namsi <se...@gmail.com>
Date:   2016-10-08T17:42:07Z

    nifi-2565: Update LICENSE File

commit f81cd852fd01a51b4b93a2b150669204e03ebd67
Author: Selim Namsi <se...@gmail.com>
Date:   2016-10-08T19:40:18Z

    nifi-2565: exclude test files for Licence checking

commit 201e7ecdc7ec92edd1df40981b89dfad3cc994e2
Author: Selim Namsi <se...@gmail.com>
Date:   2016-10-08T19:41:17Z

    nifi-2565: add apache licence header

commit 5cb2a314c8976951d00bf459ad9f152797b08964
Author: Selim Namsi <se...@gmail.com>
Date:   2016-10-08T19:41:17Z

    nifi-2565: add apache licence header

----



[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r83226308
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GrokParser.java ---
    @@ -0,0 +1,243 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.processors.standard;
    +
    +import com.fasterxml.jackson.databind.ObjectMapper;
    +import oi.thekraken.grok.api.Grok;
    +import oi.thekraken.grok.api.Match;
    +import oi.thekraken.grok.api.exception.GrokException;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.behavior.ReadsAttributes;
    +import org.apache.nifi.annotation.behavior.WritesAttribute;
    +import org.apache.nifi.annotation.behavior.WritesAttributes;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.flowfile.attributes.CoreAttributes;
    +import org.apache.nifi.flowfile.FlowFile;
    +
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.DataUnit;
    +import org.apache.nifi.processor.ProcessorInitializationContext;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.io.InputStreamCallback;
    +import org.apache.nifi.processor.io.StreamCallback;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.stream.io.BufferedOutputStream;
    +import org.apache.nifi.stream.io.StreamUtils;
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.io.OutputStream;
    +import java.nio.charset.Charset;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.HashMap;
    +import java.util.Set;
    +import java.util.HashSet;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +
    +
    +@Tags({"Grok Processor"})
    +@CapabilityDescription("Use Grok expression ,a la logstash, to parse data.")
    +@SeeAlso({})
    +@ReadsAttributes({@ReadsAttribute(attribute="", description="")})
    +@WritesAttributes({@WritesAttribute(attribute="", description="")})
    +public class GrokParser extends AbstractProcessor {
    --- End diff --
    
    The naming convention that we try to stick with for Processors is <Verb><Noun>. While this may be counter-intuitive for a Java Developer, it results in making the flow much more readable for users. So we should consider ParseLog or GrokLog.



[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by trixpan <gi...@git.apache.org>.
Github user trixpan commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r87909770
  
    --- Diff: nifi-commons/nifi-processor-utilities/src/main/java/org/apache/nifi/processor/util/StandardValidators.java ---
    @@ -26,6 +26,8 @@
     import java.util.concurrent.TimeUnit;
     import java.util.regex.Pattern;
     
    +import oi.thekraken.grok.api.Grok;
    --- End diff --
    
    This validation routine should not be added to the standard validators, in order to avoid importing Grok into StandardValidators.



[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r83227140
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GrokParser.java ---
    @@ -0,0 +1,243 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.processors.standard;
    +
    +import com.fasterxml.jackson.databind.ObjectMapper;
    +import oi.thekraken.grok.api.Grok;
    +import oi.thekraken.grok.api.Match;
    +import oi.thekraken.grok.api.exception.GrokException;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.behavior.ReadsAttributes;
    +import org.apache.nifi.annotation.behavior.WritesAttribute;
    +import org.apache.nifi.annotation.behavior.WritesAttributes;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.flowfile.attributes.CoreAttributes;
    +import org.apache.nifi.flowfile.FlowFile;
    +
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.DataUnit;
    +import org.apache.nifi.processor.ProcessorInitializationContext;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.io.InputStreamCallback;
    +import org.apache.nifi.processor.io.StreamCallback;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.stream.io.BufferedOutputStream;
    +import org.apache.nifi.stream.io.StreamUtils;
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.io.OutputStream;
    +import java.nio.charset.Charset;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.HashMap;
    +import java.util.Set;
    +import java.util.HashSet;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +
    +
    +@Tags({"Grok Processor"})
    +@CapabilityDescription("Use Grok expression ,a la logstash, to parse data.")
    +@SeeAlso({})
    +@ReadsAttributes({@ReadsAttribute(attribute="", description="")})
    +@WritesAttributes({@WritesAttribute(attribute="", description="")})
    +public class GrokParser extends AbstractProcessor {
    +
    +
    +    public static final String DESTINATION_ATTRIBUTE = "flowfile-attribute";
    +    public static final String DESTINATION_CONTENT = "flowfile-content";
    +    private static final String APPLICATION_JSON = "application/json";
    +
    +    public static final PropertyDescriptor GROK_EXPRESSION = new PropertyDescriptor
    +            .Builder().name("Grok Expression")
    +            .description("Grok expression")
    +            .required(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    public static final PropertyDescriptor GROK_PATTERN_FILE = new PropertyDescriptor
    +            .Builder().name("Grok Pattern file")
    +            .description("Grok Pattern file definition")
    +            .required(false)
    --- End diff --
    
    If this is not required, how will the processor work if not set?



[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by selim-namsi <gi...@git.apache.org>.
Github user selim-namsi commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r83720911
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GrokParser.java ---
    @@ -0,0 +1,243 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.processors.standard;
    +
    +import com.fasterxml.jackson.databind.ObjectMapper;
    +import oi.thekraken.grok.api.Grok;
    +import oi.thekraken.grok.api.Match;
    +import oi.thekraken.grok.api.exception.GrokException;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.behavior.ReadsAttributes;
    +import org.apache.nifi.annotation.behavior.WritesAttribute;
    +import org.apache.nifi.annotation.behavior.WritesAttributes;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.flowfile.attributes.CoreAttributes;
    +import org.apache.nifi.flowfile.FlowFile;
    +
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.DataUnit;
    +import org.apache.nifi.processor.ProcessorInitializationContext;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.io.InputStreamCallback;
    +import org.apache.nifi.processor.io.StreamCallback;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.stream.io.BufferedOutputStream;
    +import org.apache.nifi.stream.io.StreamUtils;
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.io.OutputStream;
    +import java.nio.charset.Charset;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.HashMap;
    +import java.util.Set;
    +import java.util.HashSet;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +
    +
    +@Tags({"Grok Processor"})
    +@CapabilityDescription("Use Grok expression ,a la logstash, to parse data.")
    +@SeeAlso({})
    +@ReadsAttributes({@ReadsAttribute(attribute="", description="")})
    +@WritesAttributes({@WritesAttribute(attribute="", description="")})
    +public class GrokParser extends AbstractProcessor {
    +
    +
    +    public static final String DESTINATION_ATTRIBUTE = "flowfile-attribute";
    +    public static final String DESTINATION_CONTENT = "flowfile-content";
    +    private static final String APPLICATION_JSON = "application/json";
    +
    +    public static final PropertyDescriptor GROK_EXPRESSION = new PropertyDescriptor
    +            .Builder().name("Grok Expression")
    +            .description("Grok expression")
    +            .required(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    public static final PropertyDescriptor GROK_PATTERN_FILE = new PropertyDescriptor
    +            .Builder().name("Grok Pattern file")
    +            .description("Grok Pattern file definition")
    +            .required(false)
    --- End diff --
    
    @markap14 In the first version of the code, I loaded a few useful pattern files by default, so the user's custom pattern file was not required. After removing that part I forgot to update the required attribute; I'll fix it.
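
One way to see why the property cannot stay optional once the built-in defaults are gone: every `%{NAME}` reference in the expression needs a definition from somewhere. The following sketch (JDK only; `PatternReferenceCheck` and `unresolved` are hypothetical names, and the reference syntax is simplified to `%{NAME}` or `%{NAME:field}`) lists the references that an empty or incomplete set of definitions cannot resolve.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a simplified stand-in, not the Grok library's own check.
final class PatternReferenceCheck {

    // Matches %{NAME} or %{NAME:field}; group 1 is the pattern name.
    private static final Pattern REFERENCE = Pattern.compile("%\\{(\\w+)(?::\\w+)?\\}");

    /** Returns the pattern names used in the expression that have no definition. */
    static Set<String> unresolved(String expression, Map<String, String> definitions) {
        Set<String> missing = new LinkedHashSet<>();
        Matcher m = REFERENCE.matcher(expression);
        while (m.find()) {
            String name = m.group(1);
            if (!definitions.containsKey(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // With no pattern file loaded, nothing in the expression can be resolved.
        Map<String, String> noDefinitions = new HashMap<>();
        System.out.println(unresolved("%{IP:client} %{WORD:method}", noDefinitions));
    }
}
```

A check along these lines could run when the processor is configured, so that a missing pattern file is reported before any FlowFile is processed.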



[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by trixpan <gi...@git.apache.org>.
Github user trixpan commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r84477121
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GrokParser.java ---
    @@ -0,0 +1,243 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.processors.standard;
    +
    +import com.fasterxml.jackson.databind.ObjectMapper;
    +import oi.thekraken.grok.api.Grok;
    +import oi.thekraken.grok.api.Match;
    +import oi.thekraken.grok.api.exception.GrokException;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.behavior.ReadsAttributes;
    +import org.apache.nifi.annotation.behavior.WritesAttribute;
    +import org.apache.nifi.annotation.behavior.WritesAttributes;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.flowfile.attributes.CoreAttributes;
    +import org.apache.nifi.flowfile.FlowFile;
    +
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.DataUnit;
    +import org.apache.nifi.processor.ProcessorInitializationContext;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.io.InputStreamCallback;
    +import org.apache.nifi.processor.io.StreamCallback;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.stream.io.BufferedOutputStream;
    +import org.apache.nifi.stream.io.StreamUtils;
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.io.OutputStream;
    +import java.nio.charset.Charset;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.HashMap;
    +import java.util.Set;
    +import java.util.HashSet;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +
    +
    +@Tags({"Grok Processor"})
    +@CapabilityDescription("Use Grok expression ,a la logstash, to parse data.")
    --- End diff --
    
    @selim-namsi 
    
    Perhaps you can use the description used as part of my WIP.
    
    > "Evaluates one or more Grok Expressions against the content of a FlowFile, adding the results as attributes or replacing the content of the FlowFile with a JSON notation of the matched content"
    
    ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi issue #1108: NIFI-2565: add Grok parser

Posted by selim-namsi <gi...@git.apache.org>.
Github user selim-namsi commented on the issue:

    https://github.com/apache/nifi/pull/1108
  
    @trixpan @markap14 I pushed the new changes.
    Could you please check them?
    
    Thanks!



[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by trixpan <gi...@git.apache.org>.
Github user trixpan commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r96236367
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ExtractGrok.java ---
    @@ -0,0 +1,298 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.processors.standard;
    +
    +import com.fasterxml.jackson.databind.ObjectMapper;
    +import io.thekraken.grok.api.Grok;
    +import io.thekraken.grok.api.Match;
    +import io.thekraken.grok.api.exception.GrokException;
    +import org.apache.nifi.annotation.behavior.WritesAttribute;
    +import org.apache.nifi.annotation.behavior.WritesAttributes;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.annotation.lifecycle.OnStopped;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.components.ValidationContext;
    +import org.apache.nifi.components.ValidationResult;
    +import org.apache.nifi.components.Validator;
    +import org.apache.nifi.flowfile.attributes.CoreAttributes;
    +import org.apache.nifi.flowfile.FlowFile;
    +
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.DataUnit;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.io.InputStreamCallback;
    +import org.apache.nifi.processor.io.StreamCallback;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.stream.io.BufferedOutputStream;
    +import org.apache.nifi.stream.io.StreamUtils;
    +import org.apache.nifi.util.StopWatch;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.io.OutputStream;
    +import java.nio.charset.Charset;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.HashMap;
    +import java.util.Set;
    +import java.util.HashSet;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +import java.util.concurrent.BlockingQueue;
    +import java.util.concurrent.LinkedBlockingQueue;
    +import java.util.concurrent.TimeUnit;
    +
    +
    +@Tags({"Grok Processor", "grok", "log", "text", "parse", "delimit", "extract"})
    --- End diff --
    
    "Grok Processor" looks a bit out of place but should not prevent merge. 



[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r83229335
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/test/java/org/apache/nifi/processors/standard/TestGrokParser.java ---
    @@ -0,0 +1,104 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.processors.standard;
    +
    +
    +import org.apache.nifi.util.MockFlowFile;
    +import org.apache.nifi.util.TestRunner;
    +import org.apache.nifi.util.TestRunners;
    +import org.junit.Before;
    +import org.junit.Test;
    +
    +import java.io.IOException;
    +import java.nio.file.Path;
    +import java.nio.file.Paths;
    +
    +/**
    + * Created by snamsi on 05/10/16.
    + */
    +public class TestGrokParser {
    +
    +    private TestRunner testRunner;
    +    final static Path GROK_LOG_INPUT = Paths.get("src/test/resources/TestGrokParser/apache.log");
    +    final static Path GROK_TEXT_INPUT = Paths.get("src/test/resources/TestGrokParser/simple_text.log");
    +
    +
    +    @Before
    +    public void init() {
    +        testRunner = TestRunners.newTestRunner(GrokParser.class);
    +    }
    +
    +    @Test
    +    public void testGrokParserWithMatchedContent() throws IOException {
    +
    +
    +        testRunner.setProperty(GrokParser.GROK_EXPRESSION, "%{COMMONAPACHELOG}");
    +        testRunner.setProperty(GrokParser.GROK_PATTERN_FILE, "src/test/resources/TestGrokParser/patterns");
    +        testRunner.enqueue(GROK_LOG_INPUT);
    +        testRunner.run();
    +        testRunner.assertAllFlowFilesTransferred(GrokParser.REL_MATCH);
    +        final MockFlowFile matched = testRunner.getFlowFilesForRelationship(GrokParser.REL_MATCH).get(0);
    +
    +        matched.assertAttributeEquals("verb","GET");
    +        matched.assertAttributeEquals("response","401");
    +        matched.assertAttributeEquals("bytes","12846");
    +        matched.assertAttributeEquals("clientip","64.242.88.10");
    +        matched.assertAttributeEquals("auth","-");
    +        matched.assertAttributeEquals("timestamp","07/Mar/2004:16:05:49 -0800");
    +        matched.assertAttributeEquals("request","/twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.ConfigurationVariables");
    +        matched.assertAttributeEquals("httpversion","1.1");
    +
    +    }
    +
    +    @Test
    +    public void testGrokParserWithUnMatchedContent() throws IOException {
    +
    +
    +        testRunner.setProperty(GrokParser.GROK_EXPRESSION, "%{ADDRESS}");
    +        testRunner.setProperty(GrokParser.GROK_PATTERN_FILE, "src/test/resources/TestGrokParser/patterns");
    +        testRunner.enqueue(GROK_TEXT_INPUT);
    +        testRunner.run();
    +        testRunner.assertAllFlowFilesTransferred(GrokParser.REL_NO_MATCH);
    +        final MockFlowFile notMatched = testRunner.getFlowFilesForRelationship(GrokParser.REL_NO_MATCH).get(0);
    +        notMatched.assertContentEquals(GROK_TEXT_INPUT);
    +
    +    }
    +
    +    @Test(expected = java.lang.AssertionError.class)
    --- End diff --
    
    Rather than expecting an AssertionError, we should avoid calling testRunner.run() and instead just use testRunner.assertNotValid().



[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r83229023
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GrokParser.java ---
    @@ -0,0 +1,243 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.processors.standard;
    +
    +import com.fasterxml.jackson.databind.ObjectMapper;
    +import oi.thekraken.grok.api.Grok;
    +import oi.thekraken.grok.api.Match;
    +import oi.thekraken.grok.api.exception.GrokException;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.behavior.ReadsAttributes;
    +import org.apache.nifi.annotation.behavior.WritesAttribute;
    +import org.apache.nifi.annotation.behavior.WritesAttributes;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.flowfile.attributes.CoreAttributes;
    +import org.apache.nifi.flowfile.FlowFile;
    +
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.DataUnit;
    +import org.apache.nifi.processor.ProcessorInitializationContext;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.io.InputStreamCallback;
    +import org.apache.nifi.processor.io.StreamCallback;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.stream.io.BufferedOutputStream;
    +import org.apache.nifi.stream.io.StreamUtils;
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.io.OutputStream;
    +import java.nio.charset.Charset;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.HashMap;
    +import java.util.Set;
    +import java.util.HashSet;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +
    +
    +@Tags({"Grok Processor"})
    +@CapabilityDescription("Use Grok expression ,a la logstash, to parse data.")
    +@SeeAlso({})
    +@ReadsAttributes({@ReadsAttribute(attribute="", description="")})
    +@WritesAttributes({@WritesAttribute(attribute="", description="")})
    +public class GrokParser extends AbstractProcessor {
    +
    +
    +    public static final String DESTINATION_ATTRIBUTE = "flowfile-attribute";
    +    public static final String DESTINATION_CONTENT = "flowfile-content";
    +    private static final String APPLICATION_JSON = "application/json";
    +
    +    public static final PropertyDescriptor GROK_EXPRESSION = new PropertyDescriptor
    +            .Builder().name("Grok Expression")
    +            .description("Grok expression")
    +            .required(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    public static final PropertyDescriptor GROK_PATTERN_FILE = new PropertyDescriptor
    +            .Builder().name("Grok Pattern file")
    +            .description("Grok Pattern file definition")
    +            .required(false)
    +            .addValidator(StandardValidators.FILE_EXISTS_VALIDATOR)
    +            .build();
    +
    +    public static final PropertyDescriptor DESTINATION = new PropertyDescriptor.Builder()
    +            .name("Destination")
    +            .description("Control if Grok output value is written as a new flowfile attribute  " +
    +                    "or written in the flowfile content. Writing to flowfile content will overwrite any " +
    +                    "existing flowfile content.")
    +            .required(true)
    +            .allowableValues(DESTINATION_ATTRIBUTE, DESTINATION_CONTENT)
    +            .defaultValue(DESTINATION_ATTRIBUTE)
    +            .build();
    +
    +    public static final PropertyDescriptor CHARACTER_SET = new PropertyDescriptor
    +            .Builder().name("Character Set")
    +            .description("The Character Set in which the file is encoded")
    +            .required(true)
    +            .addValidator(StandardValidators.CHARACTER_SET_VALIDATOR)
    +            .defaultValue("UTF-8")
    +            .build();
    +
    +    public static final PropertyDescriptor MAX_BUFFER_SIZE = new PropertyDescriptor
    +            .Builder().name("Maximum Buffer Size")
    +            .description("Specifies the maximum amount of data to buffer (per file) in order to apply the Grok expressions. Files larger than the specified maximum will not be fully evaluated.")
    +            .required(true)
    +            .addValidator(StandardValidators.DATA_SIZE_VALIDATOR)
    +            .addValidator(StandardValidators.createDataSizeBoundsValidator(0, Integer.MAX_VALUE))
    +            .defaultValue("1 MB")
    +            .build();
    +
    +    public static final Relationship REL_MATCH = new Relationship.Builder()
    +            .name("matched")
    +            .description("FlowFiles are routed to this relationship when the Grok Expression is successfully evaluated and the FlowFile is modified as a result")
    +            .build();
    +
    +    public static final Relationship REL_NO_MATCH = new Relationship.Builder()
    +            .name("unmatched")
    +            .description("FlowFiles are routed to this relationship when no provided Grok Expression matches the content of the FlowFile")
    +            .build();
    +
    +    private List<PropertyDescriptor> descriptors;
    +
    +    private Set<Relationship> relationships;
    +
    +    private static final ObjectMapper objectMapper = new ObjectMapper();
    +
    +    private Grok grok;
    +    private byte[] buffer;
    +
    +
    +    @Override
    +    protected void init(final ProcessorInitializationContext context) {
    +        final List<PropertyDescriptor> descriptors = new ArrayList<PropertyDescriptor>();
    +        descriptors.add(GROK_EXPRESSION);
    +        descriptors.add(GROK_PATTERN_FILE);
    +        descriptors.add(DESTINATION);
    +        descriptors.add(CHARACTER_SET);
    +        descriptors.add(MAX_BUFFER_SIZE);
    +        this.descriptors = Collections.unmodifiableList(descriptors);
    +
    +        final Set<Relationship> relationships = new HashSet<Relationship>();
    +        relationships.add(REL_MATCH);
    +        relationships.add(REL_NO_MATCH);
    +        this.relationships = Collections.unmodifiableSet(relationships);
    +    }
    +
    +    @Override
    +    public Set<Relationship> getRelationships() {
    +        return this.relationships;
    +    }
    +
    +    @Override
    +    public final List<PropertyDescriptor> getSupportedPropertyDescriptors() {
    +        return descriptors;
    +    }
    +
    +    @OnScheduled
    +    public void onScheduled(final ProcessContext context) {
    +
    +        final int maxBufferSize = context.getProperty(MAX_BUFFER_SIZE).asDataSize(DataUnit.B).intValue();
    +        buffer = new byte[maxBufferSize];
    +
    +        try{
    +            grok = Grok.create(context.getProperty(GROK_PATTERN_FILE).getValue());
    +            grok.compile(context.getProperty(GROK_EXPRESSION).getValue());
    +        }catch (GrokException e){
    +            getLogger().error("Failed to initialize ExtractGrok due to: ", e);
    +        }
    +
    +
    +    }
    +
    +    @Override
    +    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
    +        FlowFile flowFile = session.get();
    +        if ( flowFile == null ) {
    +            return;
    +        }
    +
    +        final Charset charset = Charset.forName(context.getProperty(CHARACTER_SET).getValue());
    +        final Map<String, String> grokResults = new HashMap<>();
    +        final byte[] byteBuffer = buffer;
    +        session.read(flowFile, new InputStreamCallback() {
    +            @Override
    +            public void process(InputStream in) throws IOException {
    +                StreamUtils.fillBuffer(in, byteBuffer, false);
    +            }
    +        });
    +        final long len = Math.min(byteBuffer.length, flowFile.getSize());
    +        final String contentString = new String(byteBuffer, 0, (int) len, charset);
    +
    +
    +        final Match gm = grok.match(contentString);
    +        gm.captures();
    +        for(Map.Entry<String,Object> entry: gm.toMap().entrySet()){
    +            if(null != entry.getValue() ) {
    +                grokResults.put(entry.getKey(), entry.getValue().toString());
    +            }
    +        }
    +
    +        if (grokResults.isEmpty()) {
    +            session.transfer(flowFile, REL_NO_MATCH);
    +            getLogger().info("Did not match any Grok Expressions for FlowFile {}", new Object[]{flowFile});
    +            return ;
    +        }
    +        switch (context.getProperty(DESTINATION).getValue()){
    +            case DESTINATION_ATTRIBUTE:
    +
    +
    +                flowFile = session.putAllAttributes(flowFile, grokResults);
    +
    +                session.getProvenanceReporter().modifyAttributes(flowFile);
    +                session.transfer(flowFile, REL_MATCH);
    +                getLogger().info("Matched {} Grok Expressions and added attributes to FlowFile {}", new Object[]{grokResults.size(), flowFile});
    +
    +                break;
    +            case DESTINATION_CONTENT:
    +
    +                FlowFile conFlowfile = session.write(flowFile, new StreamCallback() {
    +                    @Override
    +                    public void process(InputStream in, OutputStream out) throws IOException {
    +                        try (OutputStream outputStream = new BufferedOutputStream(out)) {
    +                            outputStream.write(objectMapper.writeValueAsBytes(grokResults));
    +                        }
    +                    }
    +                });
    +                conFlowfile = session.putAttribute(conFlowfile, CoreAttributes.MIME_TYPE.key(), APPLICATION_JSON);
    +                session.getProvenanceReporter().modifyContent(conFlowfile,"Replaced content with parsed Grok fields and values");
    --- End diff --
    
    I would recommend again that we include the amount of time that it took to perform the update.
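
A minimal sketch of what measuring that time could look like, in plain Java with no NiFi dependency. The class `ElapsedTimeSketch` and its `time` helper are hypothetical names used only for illustration; the assumption is that the measured milliseconds would then be passed to the provenance reporter's `modifyContent` overload that accepts a processing duration.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class ElapsedTimeSketch {

    /** Holds a result together with the wall-clock time it took to produce it. */
    public static final class Timed<T> {
        public final T value;
        public final long millis;

        Timed(final T value, final long millis) {
            this.value = value;
            this.millis = millis;
        }
    }

    /** Runs the given work and records how long it took, in milliseconds. */
    public static <T> Timed<T> time(final Supplier<T> work) {
        final long start = System.nanoTime();
        final T value = work.get();
        final long millis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return new Timed<>(value, millis);
    }

    public static void main(final String[] args) {
        // Stand-in for the Grok match plus content rewrite done in onTrigger.
        final Timed<String> result = time(() -> "{\"verb\":\"GET\"}");

        // Assumption: in the processor the duration would be reported along the lines of
        // session.getProvenanceReporter().modifyContent(flowFile, "Replaced content ...", result.millis);
        System.out.println("Update took " + result.millis + " ms");
    }
}
```

NiFi also ships a `StopWatch` utility (`org.apache.nifi.util.StopWatch`) that could play the same role as the `System.nanoTime()` bookkeeping here.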



[GitHub] nifi issue #1108: NIFI-2565: add Grok parser

Posted by selim-namsi <gi...@git.apache.org>.
Github user selim-namsi commented on the issue:

    https://github.com/apache/nifi/pull/1108
  
    @trixpan Could you please tell me how to rebase a PR? I rebased my [branch](https://github.com/selim-namsi/nifi/commits/nifi-2565) but I couldn't find how to rebase the PR.
    Sorry for the inconvenience, this is my first PR.
    
    Cheers



[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by trixpan <gi...@git.apache.org>.
Github user trixpan commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r84481939
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GrokParser.java ---
    @@ -0,0 +1,243 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.processors.standard;
    +
    +import com.fasterxml.jackson.databind.ObjectMapper;
    +import oi.thekraken.grok.api.Grok;
    +import oi.thekraken.grok.api.Match;
    +import oi.thekraken.grok.api.exception.GrokException;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.behavior.ReadsAttributes;
    +import org.apache.nifi.annotation.behavior.WritesAttribute;
    +import org.apache.nifi.annotation.behavior.WritesAttributes;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.flowfile.attributes.CoreAttributes;
    +import org.apache.nifi.flowfile.FlowFile;
    +
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.DataUnit;
    +import org.apache.nifi.processor.ProcessorInitializationContext;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.io.InputStreamCallback;
    +import org.apache.nifi.processor.io.StreamCallback;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.stream.io.BufferedOutputStream;
    +import org.apache.nifi.stream.io.StreamUtils;
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.io.OutputStream;
    +import java.nio.charset.Charset;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.HashMap;
    +import java.util.Set;
    +import java.util.HashSet;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +
    +
    +@Tags({"Grok Processor"})
    +@CapabilityDescription("Use Grok expression ,a la logstash, to parse data.")
    +@SeeAlso({})
    +@ReadsAttributes({@ReadsAttribute(attribute="", description="")})
    +@WritesAttributes({@WritesAttribute(attribute="", description="")})
    +public class GrokParser extends AbstractProcessor {
    --- End diff --
    
    To this point, similar nomenclature has been used in other places:
    
    https://github.com/DhruvKumar/nifi-grok-processor-bundle/tree/master/nifi-grok-processors/src/main/java/dhruv/nifi/processors



[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r83227323
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GrokParser.java ---
    @@ -0,0 +1,243 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.processors.standard;
    +
    +import com.fasterxml.jackson.databind.ObjectMapper;
    +import oi.thekraken.grok.api.Grok;
    +import oi.thekraken.grok.api.Match;
    +import oi.thekraken.grok.api.exception.GrokException;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.behavior.ReadsAttributes;
    +import org.apache.nifi.annotation.behavior.WritesAttribute;
    +import org.apache.nifi.annotation.behavior.WritesAttributes;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.flowfile.attributes.CoreAttributes;
    +import org.apache.nifi.flowfile.FlowFile;
    +
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.DataUnit;
    +import org.apache.nifi.processor.ProcessorInitializationContext;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.io.InputStreamCallback;
    +import org.apache.nifi.processor.io.StreamCallback;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.stream.io.BufferedOutputStream;
    +import org.apache.nifi.stream.io.StreamUtils;
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.io.OutputStream;
    +import java.nio.charset.Charset;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.HashMap;
    +import java.util.Set;
    +import java.util.HashSet;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +
    +
    +@Tags({"Grok Processor"})
    +@CapabilityDescription("Use Grok expression ,a la logstash, to parse data.")
    +@SeeAlso({})
    +@ReadsAttributes({@ReadsAttribute(attribute="", description="")})
    +@WritesAttributes({@WritesAttribute(attribute="", description="")})
    +public class GrokParser extends AbstractProcessor {
    +
    +
    +    public static final String DESTINATION_ATTRIBUTE = "flowfile-attribute";
    +    public static final String DESTINATION_CONTENT = "flowfile-content";
    +    private static final String APPLICATION_JSON = "application/json";
    +
    +    public static final PropertyDescriptor GROK_EXPRESSION = new PropertyDescriptor
    +            .Builder().name("Grok Expression")
    +            .description("Grok expression")
    +            .required(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    public static final PropertyDescriptor GROK_PATTERN_FILE = new PropertyDescriptor
    +            .Builder().name("Grok Pattern file")
    +            .description("Grok Pattern file definition")
    +            .required(false)
    +            .addValidator(StandardValidators.FILE_EXISTS_VALIDATOR)
    +            .build();
    +
    +    public static final PropertyDescriptor DESTINATION = new PropertyDescriptor.Builder()
    +            .name("Destination")
    +            .description("Control if Grok output value is written as a new flowfile attribute  " +
    --- End diff --
    
    If "flowfile-attribute" is used, which attribute will the info be written to?
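
Judging from the `onTrigger` code quoted elsewhere in this thread, there does not seem to be a single target attribute: each non-null Grok capture appears to become its own attribute, named after the capture. The sketch below reproduces that mapping in plain Java so it runs without NiFi. The class name `GrokAttributeSketch`, the method `toAttributes`, and the sample captures are all assumptions made for illustration, not part of the PR.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class GrokAttributeSketch {

    // Mirrors the loop in onTrigger: null captures are skipped and every
    // remaining value is stringified under its capture name.
    static Map<String, String> toAttributes(Map<String, Object> captures) {
        final Map<String, String> attributes = new HashMap<>();
        for (Map.Entry<String, Object> entry : captures.entrySet()) {
            if (entry.getValue() != null) {
                attributes.put(entry.getKey(), entry.getValue().toString());
            }
        }
        return attributes;
    }

    public static void main(String[] args) {
        // Hypothetical captures, standing in for what Match.toMap() returns.
        final Map<String, Object> captures = new LinkedHashMap<>();
        captures.put("clientip", "64.242.88.10");
        captures.put("response", 401);
        captures.put("auth", null); // dropped: null values are not written
        System.out.println(toAttributes(captures));
    }
}
```

If that reading is right, capture names can collide with existing attribute names, so documenting the behaviour in `@WritesAttributes` would probably help users.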


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi issue #1108: NIFI-2565: add Grok parser

Posted by trixpan <gi...@git.apache.org>.
Github user trixpan commented on the issue:

    https://github.com/apache/nifi/pull/1108
  
    Please rebase this PR so it only includes your changes.
    
    Cheers



[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by joewitt <gi...@git.apache.org>.
Github user joewitt commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r84483537
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GrokParser.java ---
    @@ -0,0 +1,243 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.processors.standard;
    +
    +import com.fasterxml.jackson.databind.ObjectMapper;
    +import oi.thekraken.grok.api.Grok;
    +import oi.thekraken.grok.api.Match;
    +import oi.thekraken.grok.api.exception.GrokException;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.behavior.ReadsAttributes;
    +import org.apache.nifi.annotation.behavior.WritesAttribute;
    +import org.apache.nifi.annotation.behavior.WritesAttributes;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.flowfile.attributes.CoreAttributes;
    +import org.apache.nifi.flowfile.FlowFile;
    +
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.DataUnit;
    +import org.apache.nifi.processor.ProcessorInitializationContext;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.io.InputStreamCallback;
    +import org.apache.nifi.processor.io.StreamCallback;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.stream.io.BufferedOutputStream;
    +import org.apache.nifi.stream.io.StreamUtils;
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.io.OutputStream;
    +import java.nio.charset.Charset;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.HashMap;
    +import java.util.Set;
    +import java.util.HashSet;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +
    +
    +@Tags({"Grok Processor"})
    +@CapabilityDescription("Use Grok expression ,a la logstash, to parse data.")
    +@SeeAlso({})
    +@ReadsAttributes({@ReadsAttribute(attribute="", description="")})
    +@WritesAttributes({@WritesAttribute(attribute="", description="")})
    +public class GrokParser extends AbstractProcessor {
    --- End diff --
    
    So the naming pattern is 'Verb Subject'.  From a user's point of view (not the developer's), the point of this processor appears to be to evaluate Grok expressions against flow file content, and either replace that content with the result or update a flow file attribute with that result.  If that is the case, we could go with 'EvaluateGrok' or 'GrokEvaluateText'; 'ExtractGrok' is also fair game, I think.



[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/nifi/pull/1108



[GitHub] nifi issue #1108: NIFI-2565: add Grok parser

Posted by selim-namsi <gi...@git.apache.org>.
Github user selim-namsi commented on the issue:

    https://github.com/apache/nifi/pull/1108
  
    @trixpan I figured out how to rebase the pull request :)
    
    Cheers



[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r83227422
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GrokParser.java ---
    @@ -0,0 +1,243 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.processors.standard;
    +
    +import com.fasterxml.jackson.databind.ObjectMapper;
    +import oi.thekraken.grok.api.Grok;
    +import oi.thekraken.grok.api.Match;
    +import oi.thekraken.grok.api.exception.GrokException;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.behavior.ReadsAttributes;
    +import org.apache.nifi.annotation.behavior.WritesAttribute;
    +import org.apache.nifi.annotation.behavior.WritesAttributes;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.flowfile.attributes.CoreAttributes;
    +import org.apache.nifi.flowfile.FlowFile;
    +
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.DataUnit;
    +import org.apache.nifi.processor.ProcessorInitializationContext;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.io.InputStreamCallback;
    +import org.apache.nifi.processor.io.StreamCallback;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.stream.io.BufferedOutputStream;
    +import org.apache.nifi.stream.io.StreamUtils;
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.io.OutputStream;
    +import java.nio.charset.Charset;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.HashMap;
    +import java.util.Set;
    +import java.util.HashSet;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +
    +
    +@Tags({"Grok Processor"})
    +@CapabilityDescription("Use Grok expression ,a la logstash, to parse data.")
    +@SeeAlso({})
    +@ReadsAttributes({@ReadsAttribute(attribute="", description="")})
    +@WritesAttributes({@WritesAttribute(attribute="", description="")})
    +public class GrokParser extends AbstractProcessor {
    +
    +
    +    public static final String DESTINATION_ATTRIBUTE = "flowfile-attribute";
    --- End diff --
    
    Should probably use "FlowFile Attribute" and "FlowFile Content" for these, so that it is more intuitive for users.



[GitHub] nifi issue #1108: NIFI-2565: add Grok parser

Posted by selim-namsi <gi...@git.apache.org>.
Github user selim-namsi commented on the issue:

    https://github.com/apache/nifi/pull/1108
  
    @trixpan @joewitt Sorry for the long delay. I applied the changes you suggested: adding the custom validator in the processor class, using the new version of grok, and removing the license section from the assembly license.



[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r83228785
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GrokParser.java ---
    @@ -0,0 +1,243 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.processors.standard;
    +
    +import com.fasterxml.jackson.databind.ObjectMapper;
    +import oi.thekraken.grok.api.Grok;
    +import oi.thekraken.grok.api.Match;
    +import oi.thekraken.grok.api.exception.GrokException;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.behavior.ReadsAttributes;
    +import org.apache.nifi.annotation.behavior.WritesAttribute;
    +import org.apache.nifi.annotation.behavior.WritesAttributes;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.flowfile.attributes.CoreAttributes;
    +import org.apache.nifi.flowfile.FlowFile;
    +
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.DataUnit;
    +import org.apache.nifi.processor.ProcessorInitializationContext;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.io.InputStreamCallback;
    +import org.apache.nifi.processor.io.StreamCallback;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.stream.io.BufferedOutputStream;
    +import org.apache.nifi.stream.io.StreamUtils;
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.io.OutputStream;
    +import java.nio.charset.Charset;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.HashMap;
    +import java.util.Set;
    +import java.util.HashSet;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +
    +
    +@Tags({"Grok Processor"})
    +@CapabilityDescription("Use Grok expression ,a la logstash, to parse data.")
    +@SeeAlso({})
    +@ReadsAttributes({@ReadsAttribute(attribute="", description="")})
    +@WritesAttributes({@WritesAttribute(attribute="", description="")})
    +public class GrokParser extends AbstractProcessor {
    +
    +
    +    public static final String DESTINATION_ATTRIBUTE = "flowfile-attribute";
    +    public static final String DESTINATION_CONTENT = "flowfile-content";
    +    private static final String APPLICATION_JSON = "application/json";
    +
    +    public static final PropertyDescriptor GROK_EXPRESSION = new PropertyDescriptor
    +            .Builder().name("Grok Expression")
    +            .description("Grok expression")
    +            .required(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    public static final PropertyDescriptor GROK_PATTERN_FILE = new PropertyDescriptor
    +            .Builder().name("Grok Pattern file")
    +            .description("Grok Pattern file definition")
    +            .required(false)
    +            .addValidator(StandardValidators.FILE_EXISTS_VALIDATOR)
    +            .build();
    +
    +    public static final PropertyDescriptor DESTINATION = new PropertyDescriptor.Builder()
    +            .name("Destination")
    +            .description("Control if Grok output value is written as a new flowfile attribute  " +
    +                    "or written in the flowfile content. Writing to flowfile content will overwrite any " +
    +                    "existing flowfile content.")
    +            .required(true)
    +            .allowableValues(DESTINATION_ATTRIBUTE, DESTINATION_CONTENT)
    +            .defaultValue(DESTINATION_ATTRIBUTE)
    +            .build();
    +
    +    public static final PropertyDescriptor CHARACTER_SET = new PropertyDescriptor
    +            .Builder().name("Character Set")
    +            .description("The Character Set in which the file is encoded")
    +            .required(true)
    +            .addValidator(StandardValidators.CHARACTER_SET_VALIDATOR)
    +            .defaultValue("UTF-8")
    +            .build();
    +
    +    public static final PropertyDescriptor MAX_BUFFER_SIZE = new PropertyDescriptor
    +            .Builder().name("Maximum Buffer Size")
    +            .description("Specifies the maximum amount of data to buffer (per file) in order to apply the Grok expressions. Files larger than the specified maximum will not be fully evaluated.")
    +            .required(true)
    +            .addValidator(StandardValidators.DATA_SIZE_VALIDATOR)
    +            .addValidator(StandardValidators.createDataSizeBoundsValidator(0, Integer.MAX_VALUE))
    +            .defaultValue("1 MB")
    +            .build();
    +
    +    public static final Relationship REL_MATCH = new Relationship.Builder()
    +            .name("matched")
    +            .description("FlowFiles are routed to this relationship when the Grok Expression is successfully evaluated and the FlowFile is modified as a result")
    +            .build();
    +
    +    public static final Relationship REL_NO_MATCH = new Relationship.Builder()
    +            .name("unmatched")
    +            .description("FlowFiles are routed to this relationship when no provided Grok Expression matches the content of the FlowFile")
    +            .build();
    +
    +    private List<PropertyDescriptor> descriptors;
    +
    +    private Set<Relationship> relationships;
    +
    +    private static final ObjectMapper objectMapper = new ObjectMapper();
    +
    +    private Grok grok;
    +    private byte[] buffer;
    +
    +
    +    @Override
    +    protected void init(final ProcessorInitializationContext context) {
    +        final List<PropertyDescriptor> descriptors = new ArrayList<PropertyDescriptor>();
    +        descriptors.add(GROK_EXPRESSION);
    +        descriptors.add(GROK_PATTERN_FILE);
    +        descriptors.add(DESTINATION);
    +        descriptors.add(CHARACTER_SET);
    +        descriptors.add(MAX_BUFFER_SIZE);
    +        this.descriptors = Collections.unmodifiableList(descriptors);
    +
    +        final Set<Relationship> relationships = new HashSet<Relationship>();
    +        relationships.add(REL_MATCH);
    +        relationships.add(REL_NO_MATCH);
    +        this.relationships = Collections.unmodifiableSet(relationships);
    +    }
    +
    +    @Override
    +    public Set<Relationship> getRelationships() {
    +        return this.relationships;
    +    }
    +
    +    @Override
    +    public final List<PropertyDescriptor> getSupportedPropertyDescriptors() {
    +        return descriptors;
    +    }
    +
    +    @OnScheduled
    +    public void onScheduled(final ProcessContext context) {
    +
    +        final int maxBufferSize = context.getProperty(MAX_BUFFER_SIZE).asDataSize(DataUnit.B).intValue();
    +        buffer = new byte[maxBufferSize];
    +
    +        try{
    +            grok = Grok.create(context.getProperty(GROK_PATTERN_FILE).getValue());
    +            grok.compile(context.getProperty(GROK_EXPRESSION).getValue());
    +        }catch (GrokException e){
    +            getLogger().error("Failed to initialize ExtractGrok due to: ", e);
    +        }
    +
    +
    +    }
    +
    +    @Override
    +    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
    +        FlowFile flowFile = session.get();
    +        if ( flowFile == null ) {
    +            return;
    +        }
    +
    +        final Charset charset = Charset.forName(context.getProperty(CHARACTER_SET).getValue());
    +        final Map<String, String> grokResults = new HashMap<>();
    +        final byte[] byteBuffer = buffer;
    +        session.read(flowFile, new InputStreamCallback() {
    +            @Override
    +            public void process(InputStream in) throws IOException {
    +                StreamUtils.fillBuffer(in, byteBuffer, false);
    +            }
    +        });
    +        final long len = Math.min(byteBuffer.length, flowFile.getSize());
    +        final String contentString = new String(byteBuffer, 0, (int) len, charset);
    +
    +
    +        final Match gm = grok.match(contentString);
    +        gm.captures();
    +        for(Map.Entry<String,Object> entry: gm.toMap().entrySet()){
    +            if(null != entry.getValue() ) {
    +                grokResults.put(entry.getKey(), entry.getValue().toString());
    +            }
    +        }
    +
    +        if (grokResults.isEmpty()) {
    +            session.transfer(flowFile, REL_NO_MATCH);
    +            getLogger().info("Did not match any Grok Expressions for FlowFile {}", new Object[]{flowFile});
    +            return ;
    +        }
    +        switch (context.getProperty(DESTINATION).getValue()){
    +            case DESTINATION_ATTRIBUTE:
    +
    +
    +                flowFile = session.putAllAttributes(flowFile, grokResults);
    +
    +                session.getProvenanceReporter().modifyAttributes(flowFile);
    --- End diff --
    
    It's probably worth reporting this along with the amount of time that it took to update the attributes. This can be very nice information to have if troubleshooting a flow.
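
A minimal sketch of the timing idea, kept in plain Java so it runs without NiFi. It measures how long the attribute update takes and builds a details string that could accompany the provenance event. The class `TimedAttributeUpdateSketch`, the helpers `timeMillis` and `details`, and the wording of the message are assumptions for illustration. Whether the provenance reporter accepts a duration for attribute modifications should be checked against the `ProvenanceReporter` API before relying on it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

public class TimedAttributeUpdateSketch {

    // Runs the update and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable update) {
        final long start = System.nanoTime();
        update.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // Hypothetical details string to attach to a provenance event.
    static String details(int attributeCount, long millis) {
        return "Added " + attributeCount + " Grok attributes in " + millis + " ms";
    }

    public static void main(String[] args) {
        // Stand-in for session.putAllAttributes(flowFile, grokResults).
        final Map<String, String> attributes = new HashMap<>();
        final long millis = timeMillis(() -> attributes.put("clientip", "64.242.88.10"));
        System.out.println(details(attributes.size(), millis));
    }
}
```

In the processor itself, the timer would presumably start before the Grok match rather than just before the attribute write, so that the reported duration covers the expensive part of the work.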



[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by trixpan <gi...@git.apache.org>.
Github user trixpan commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r84476594
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GrokParser.java ---
    @@ -0,0 +1,243 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.processors.standard;
    +
    +import com.fasterxml.jackson.databind.ObjectMapper;
    +import oi.thekraken.grok.api.Grok;
    +import oi.thekraken.grok.api.Match;
    +import oi.thekraken.grok.api.exception.GrokException;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.behavior.ReadsAttributes;
    +import org.apache.nifi.annotation.behavior.WritesAttribute;
    +import org.apache.nifi.annotation.behavior.WritesAttributes;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.flowfile.attributes.CoreAttributes;
    +import org.apache.nifi.flowfile.FlowFile;
    +
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.DataUnit;
    +import org.apache.nifi.processor.ProcessorInitializationContext;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.io.InputStreamCallback;
    +import org.apache.nifi.processor.io.StreamCallback;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.stream.io.BufferedOutputStream;
    +import org.apache.nifi.stream.io.StreamUtils;
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.io.OutputStream;
    +import java.nio.charset.Charset;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.HashMap;
    +import java.util.Set;
    +import java.util.HashSet;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +
    +
    +@Tags({"Grok Processor"})
    +@CapabilityDescription("Use Grok expression ,a la logstash, to parse data.")
    +@SeeAlso({})
    +@ReadsAttributes({@ReadsAttribute(attribute="", description="")})
    +@WritesAttributes({@WritesAttribute(attribute="", description="")})
    +public class GrokParser extends AbstractProcessor {
    +
    +
    +    public static final String DESTINATION_ATTRIBUTE = "flowfile-attribute";
    --- End diff --
    
    @markap14 - "flowfile-attribute" and "flowfile-content" are an established pattern:
    
    https://github.com/apache/nifi/search?utf8=%E2%9C%93&q=flowfile-attribute
    
    I am happy to address the terminology on other processors, but I suggest we use the same naming here unless we plan to change it across the other processors.
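
One way to reconcile the two positions is to keep the established stored value and attach a friendlier label to it. As far as I recall, NiFi's `AllowableValue` supports a value with a separate display name, but that should be verified. The `Option` class below is a hypothetical stand-in so the sketch runs without NiFi, and `DestinationOptionSketch` and `fromValue` are likewise invented for illustration.

```java
public class DestinationOptionSketch {

    // Hypothetical stand-in for an allowable value: a stable stored value
    // plus a label shown to users.
    static final class Option {
        final String value;
        final String displayName;

        Option(String value, String displayName) {
            this.value = value;
            this.displayName = displayName;
        }
    }

    static final Option ATTRIBUTE = new Option("flowfile-attribute", "FlowFile Attribute");
    static final Option CONTENT = new Option("flowfile-content", "FlowFile Content");

    // Resolves a stored value back to its option, as a switch on the
    // property value would do in onTrigger.
    static Option fromValue(String value) {
        for (Option option : new Option[] {ATTRIBUTE, CONTENT}) {
            if (option.value.equals(value)) {
                return option;
            }
        }
        throw new IllegalArgumentException("Unknown destination: " + value);
    }

    public static void main(String[] args) {
        final Option chosen = fromValue("flowfile-attribute");
        System.out.println(chosen.displayName + " (stored as " + chosen.value + ")");
    }
}
```

With this split, existing flows keep the same stored values as other processors while the UI shows the more readable labels.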



[GitHub] nifi issue #1108: NIFI-2565: add Grok parser

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on the issue:

    https://github.com/apache/nifi/pull/1108
  
    @selim-namsi Thanks for contributing this! I have actually been very interested in using NiFi to do some log parsing but hadn't really dug in very much to understand the best way to go about it. This looks like it could be very powerful!
    
    Before we get this merged into the codebase, though, it looks like there is some work that needs to be done to the PR. The concern stems, I think, from you not yet being overly familiar with the API, as there are empty @ReadsAttributes, @WritesAttributes annotations, etc. But the great news is that the NiFi community tends to be very inclusive and will help to get everything in great shape!
    
    One thing that I did notice is that you updated the Licensing information, which is one of the most commonly overlooked issues. So very glad that's there. I'll leave some inline feedback on things that I notice, but very much looking forward to this getting in!



[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r83229754
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/test/resources/TestGrokParser/apache.log ---
    @@ -0,0 +1 @@
    +64.242.88.10 - - [07/Mar/2004:16:05:49 -0800] "GET /twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12846
    --- End diff --
    
    We have to ensure that we have proper licensing for these test files. This one may be one that you created yourself? If not, we need to ensure that its license is properly accounted for - or just mock out a new one.



[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by selim-namsi <gi...@git.apache.org>.
Github user selim-namsi closed the pull request at:

    https://github.com/apache/nifi/pull/1108



[GitHub] nifi issue #1108: NIFI-2565: add Grok parser

Posted by trixpan <gi...@git.apache.org>.
Github user trixpan commented on the issue:

    https://github.com/apache/nifi/pull/1108
  
    By the way, most of these modifications are already available here: 
    
    https://github.com/trixpan/nifi/commits/NIFI-2565



[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by trixpan <gi...@git.apache.org>.
Github user trixpan commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r84476015
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GrokParser.java ---
    @@ -0,0 +1,243 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.processors.standard;
    +
    +import com.fasterxml.jackson.databind.ObjectMapper;
    +import oi.thekraken.grok.api.Grok;
    +import oi.thekraken.grok.api.Match;
    +import oi.thekraken.grok.api.exception.GrokException;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.behavior.ReadsAttributes;
    +import org.apache.nifi.annotation.behavior.WritesAttribute;
    +import org.apache.nifi.annotation.behavior.WritesAttributes;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.flowfile.attributes.CoreAttributes;
    +import org.apache.nifi.flowfile.FlowFile;
    +
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.DataUnit;
    +import org.apache.nifi.processor.ProcessorInitializationContext;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.io.InputStreamCallback;
    +import org.apache.nifi.processor.io.StreamCallback;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.stream.io.BufferedOutputStream;
    +import org.apache.nifi.stream.io.StreamUtils;
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.io.OutputStream;
    +import java.nio.charset.Charset;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.HashMap;
    +import java.util.Set;
    +import java.util.HashSet;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +
    +
    +@Tags({"Grok Processor"})
    +@CapabilityDescription("Use Grok expression ,a la logstash, to parse data.")
    +@SeeAlso({})
    +@ReadsAttributes({@ReadsAttribute(attribute="", description="")})
    +@WritesAttributes({@WritesAttribute(attribute="", description="")})
    +public class GrokParser extends AbstractProcessor {
    --- End diff --
    
    @markap14 - this is not a parser but an extractor (Grok is a hyper regex), so I suggest naming it ExtractGrok (after ExtractText).


---

[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r83229088
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/test/java/org/apache/nifi/processors/standard/TestGrokParser.java ---
    @@ -0,0 +1,104 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.processors.standard;
    +
    +
    +import org.apache.nifi.util.MockFlowFile;
    +import org.apache.nifi.util.TestRunner;
    +import org.apache.nifi.util.TestRunners;
    +import org.junit.Before;
    +import org.junit.Test;
    +
    +import java.io.IOException;
    +import java.nio.file.Path;
    +import java.nio.file.Paths;
    +
    +/**
    + * Created by snamsi on 05/10/16.
    --- End diff --
    
    We should not have usernames here, as Git will provide this information for us.


---

[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r83226835
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GrokParser.java ---
    @@ -0,0 +1,243 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.processors.standard;
    +
    +import com.fasterxml.jackson.databind.ObjectMapper;
    +import oi.thekraken.grok.api.Grok;
    +import oi.thekraken.grok.api.Match;
    +import oi.thekraken.grok.api.exception.GrokException;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.behavior.ReadsAttributes;
    +import org.apache.nifi.annotation.behavior.WritesAttribute;
    +import org.apache.nifi.annotation.behavior.WritesAttributes;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.flowfile.attributes.CoreAttributes;
    +import org.apache.nifi.flowfile.FlowFile;
    +
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.DataUnit;
    +import org.apache.nifi.processor.ProcessorInitializationContext;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.io.InputStreamCallback;
    +import org.apache.nifi.processor.io.StreamCallback;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.stream.io.BufferedOutputStream;
    +import org.apache.nifi.stream.io.StreamUtils;
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.io.OutputStream;
    +import java.nio.charset.Charset;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.HashMap;
    +import java.util.Set;
    +import java.util.HashSet;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +
    +
    +@Tags({"Grok Processor"})
    --- End diff --
    
    We should consider several more tags: grok, log, text, parse, delimit, extract


---

[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r83226928
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GrokParser.java ---
    @@ -0,0 +1,243 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.processors.standard;
    +
    +import com.fasterxml.jackson.databind.ObjectMapper;
    +import oi.thekraken.grok.api.Grok;
    +import oi.thekraken.grok.api.Match;
    +import oi.thekraken.grok.api.exception.GrokException;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.behavior.ReadsAttributes;
    +import org.apache.nifi.annotation.behavior.WritesAttribute;
    +import org.apache.nifi.annotation.behavior.WritesAttributes;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.flowfile.attributes.CoreAttributes;
    +import org.apache.nifi.flowfile.FlowFile;
    +
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.DataUnit;
    +import org.apache.nifi.processor.ProcessorInitializationContext;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.io.InputStreamCallback;
    +import org.apache.nifi.processor.io.StreamCallback;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.stream.io.BufferedOutputStream;
    +import org.apache.nifi.stream.io.StreamUtils;
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.io.OutputStream;
    +import java.nio.charset.Charset;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.HashMap;
    +import java.util.Set;
    +import java.util.HashSet;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +
    +
    +@Tags({"Grok Processor"})
    +@CapabilityDescription("Use Grok expression ,a la logstash, to parse data.")
    +@SeeAlso({})
    --- End diff --
    
    No need for the @SeeAlso, @ReadsAttributes, and @WritesAttributes annotations if they are not being used.


---

[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r83228132
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GrokParser.java ---
    @@ -0,0 +1,243 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.processors.standard;
    +
    +import com.fasterxml.jackson.databind.ObjectMapper;
    +import oi.thekraken.grok.api.Grok;
    +import oi.thekraken.grok.api.Match;
    +import oi.thekraken.grok.api.exception.GrokException;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.behavior.ReadsAttributes;
    +import org.apache.nifi.annotation.behavior.WritesAttribute;
    +import org.apache.nifi.annotation.behavior.WritesAttributes;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.flowfile.attributes.CoreAttributes;
    +import org.apache.nifi.flowfile.FlowFile;
    +
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.DataUnit;
    +import org.apache.nifi.processor.ProcessorInitializationContext;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.io.InputStreamCallback;
    +import org.apache.nifi.processor.io.StreamCallback;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.stream.io.BufferedOutputStream;
    +import org.apache.nifi.stream.io.StreamUtils;
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.io.OutputStream;
    +import java.nio.charset.Charset;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.HashMap;
    +import java.util.Set;
    +import java.util.HashSet;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +
    +
    +@Tags({"Grok Processor"})
    +@CapabilityDescription("Use Grok expression ,a la logstash, to parse data.")
    +@SeeAlso({})
    +@ReadsAttributes({@ReadsAttribute(attribute="", description="")})
    +@WritesAttributes({@WritesAttribute(attribute="", description="")})
    +public class GrokParser extends AbstractProcessor {
    +
    +
    +    public static final String DESTINATION_ATTRIBUTE = "flowfile-attribute";
    +    public static final String DESTINATION_CONTENT = "flowfile-content";
    +    private static final String APPLICATION_JSON = "application/json";
    +
    +    public static final PropertyDescriptor GROK_EXPRESSION = new PropertyDescriptor
    +            .Builder().name("Grok Expression")
    +            .description("Grok expression")
    +            .required(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    public static final PropertyDescriptor GROK_PATTERN_FILE = new PropertyDescriptor
    +            .Builder().name("Grok Pattern file")
    +            .description("Grok Pattern file definition")
    +            .required(false)
    +            .addValidator(StandardValidators.FILE_EXISTS_VALIDATOR)
    +            .build();
    +
    +    public static final PropertyDescriptor DESTINATION = new PropertyDescriptor.Builder()
    +            .name("Destination")
    +            .description("Control if Grok output value is written as a new flowfile attribute  " +
    +                    "or written in the flowfile content. Writing to flowfile content will overwrite any " +
    +                    "existing flowfile content.")
    +            .required(true)
    +            .allowableValues(DESTINATION_ATTRIBUTE, DESTINATION_CONTENT)
    +            .defaultValue(DESTINATION_ATTRIBUTE)
    +            .build();
    +
    +    public static final PropertyDescriptor CHARACTER_SET = new PropertyDescriptor
    +            .Builder().name("Character Set")
    +            .description("The Character Set in which the file is encoded")
    +            .required(true)
    +            .addValidator(StandardValidators.CHARACTER_SET_VALIDATOR)
    +            .defaultValue("UTF-8")
    +            .build();
    +
    +    public static final PropertyDescriptor MAX_BUFFER_SIZE = new PropertyDescriptor
    +            .Builder().name("Maximum Buffer Size")
    +            .description("Specifies the maximum amount of data to buffer (per file) in order to apply the Grok expressions. Files larger than the specified maximum will not be fully evaluated.")
    +            .required(true)
    +            .addValidator(StandardValidators.DATA_SIZE_VALIDATOR)
    +            .addValidator(StandardValidators.createDataSizeBoundsValidator(0, Integer.MAX_VALUE))
    +            .defaultValue("1 MB")
    +            .build();
    +
    +    public static final Relationship REL_MATCH = new Relationship.Builder()
    +            .name("matched")
    +            .description("FlowFiles are routed to this relationship when the Grok Expression is successfully evaluated and the FlowFile is modified as a result")
    +            .build();
    +
    +    public static final Relationship REL_NO_MATCH = new Relationship.Builder()
    +            .name("unmatched")
    +            .description("FlowFiles are routed to this relationship when no provided Grok Expression matches the content of the FlowFile")
    +            .build();
    +
    +    private List<PropertyDescriptor> descriptors;
    +
    +    private Set<Relationship> relationships;
    +
    +    private static final ObjectMapper objectMapper = new ObjectMapper();
    +
    +    private Grok grok;
    +    private byte[] buffer;
    +
    +
    +    @Override
    +    protected void init(final ProcessorInitializationContext context) {
    +        final List<PropertyDescriptor> descriptors = new ArrayList<PropertyDescriptor>();
    +        descriptors.add(GROK_EXPRESSION);
    +        descriptors.add(GROK_PATTERN_FILE);
    +        descriptors.add(DESTINATION);
    +        descriptors.add(CHARACTER_SET);
    +        descriptors.add(MAX_BUFFER_SIZE);
    +        this.descriptors = Collections.unmodifiableList(descriptors);
    +
    +        final Set<Relationship> relationships = new HashSet<Relationship>();
    +        relationships.add(REL_MATCH);
    +        relationships.add(REL_NO_MATCH);
    +        this.relationships = Collections.unmodifiableSet(relationships);
    +    }
    +
    +    @Override
    +    public Set<Relationship> getRelationships() {
    +        return this.relationships;
    +    }
    +
    +    @Override
    +    public final List<PropertyDescriptor> getSupportedPropertyDescriptors() {
    +        return descriptors;
    +    }
    +
    +    @OnScheduled
    +    public void onScheduled(final ProcessContext context) {
    +
    +        final int maxBufferSize = context.getProperty(MAX_BUFFER_SIZE).asDataSize(DataUnit.B).intValue();
    +        buffer = new byte[maxBufferSize];
    +
    +        try{
    +            grok = Grok.create(context.getProperty(GROK_PATTERN_FILE).getValue());
    +            grok.compile(context.getProperty(GROK_EXPRESSION).getValue());
    +        }catch (GrokException e){
    +            getLogger().error("Failed to initialize ExtractGrok due to: ", e);
    --- End diff --
    
    If this is the case, the Processor will not behave properly. In fact, if Grok.create() throws an exception, the previous value will still be stored in the 'grok' member variable. That means that if the user changes the value and we can't create a Grok for the new value, the processor will continue to use the old one. Instead, we should avoid catching this exception and just declare the method to throw GrokException.
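
    A minimal sketch of the stale-state problem described above, not the PR's code. It assumes java.util.regex.Pattern as a stand-in for Grok and PatternSyntaxException as a stand-in for GrokException; the class ScheduleSketch and its method names are hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical stand-in for the processor; Pattern plays the role of Grok.
final class ScheduleSketch {
    // State kept between schedulings, like the 'grok' member variable.
    private Pattern compiled;

    // Swallows the failure: the previously compiled pattern silently stays in use.
    void scheduleSwallowing(String expression) {
        try {
            compiled = Pattern.compile(expression);
        } catch (PatternSyntaxException e) {
            System.err.println("Failed to initialize: " + e.getMessage());
        }
    }

    // Lets the failure propagate, so the caller knows scheduling did not succeed.
    void schedulePropagating(String expression) {
        compiled = Pattern.compile(expression); // may throw PatternSyntaxException
    }

    // Returns the expression currently in use, or null if none was compiled yet.
    String currentPattern() {
        return compiled == null ? null : compiled.pattern();
    }
}
```

    With the swallowing variant, scheduling "a+" and then the invalid expression "(" leaves "a+" active without the caller noticing. The propagating variant surfaces the error to whoever invoked the scheduling step.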


---

[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by trixpan <gi...@git.apache.org>.
Github user trixpan commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r87978657
  
    --- Diff: nifi-commons/nifi-processor-utilities/pom.xml ---
    @@ -45,5 +45,16 @@
                 <artifactId>nifi-ssl-context-service-api</artifactId>
                 <scope>provided</scope>
             </dependency>
    +        <dependency>
    +            <groupId>io.thekraken</groupId>
    +            <artifactId>grok</artifactId>
    +            <version>0.1.4</version>
    --- End diff --
    
    A new version has been released today and contains important fixes (reduced dependencies, better feature parity with logstash, etc.). May I suggest we upgrade the dependency?


---

[GitHub] nifi issue #1108: NIFI-2565: add Grok parser

Posted by trixpan <gi...@git.apache.org>.
Github user trixpan commented on the issue:

    https://github.com/apache/nifi/pull/1108
  
    @joewitt when you have time, can you have a final look at this one? LGTM and I am happy to address the two remaining cosmetic comments as part of a separate PR.
    



---

[GitHub] nifi issue #1108: NIFI-2565: add Grok parser

Posted by trixpan <gi...@git.apache.org>.
Github user trixpan commented on the issue:

    https://github.com/apache/nifi/pull/1108
  
    @selim-namsi thanks for putting this together. I was coding this processor myself but am happy to review it.
    
    A few comments:
    
    1. Your code is failing -Pcontrib-check. Can you please fix this?
    
    2. Ideally, I believe this processor should allow users to choose between content replacement (replacing the original log line with its JSON representation) and adding attributes (what you already did). This should give an idea of what I mean: https://github.com/apache/nifi/pull/785
    
    3. Please don't hard-code the patterns; instead, let the user configure the pattern files.
    
    4. The java-grok version has some bugs, you may want to upgrade it (this is the reason I haven't submitted the code previously... :smiley: )
    
    Thank you again, looking forward to your modifications.
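
    To illustrate the choice in point 2, here is a rough sketch under stated assumptions, not the processor's implementation. The names DestinationSketch, Destination, Result, and apply are hypothetical, and the hand-written toJson helper only stands in for a real JSON library (it escapes just backslashes and double quotes).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class DestinationSketch {
    enum Destination { ATTRIBUTE, CONTENT }

    // Hypothetical result holder: flowfile attributes plus flowfile content.
    static final class Result {
        final Map<String, String> attributes = new LinkedHashMap<>();
        String content;
    }

    // Either adds the matches as attributes or replaces the content with JSON.
    static Result apply(Destination destination, String originalContent,
                        Map<String, String> matches) {
        Result result = new Result();
        result.content = originalContent;
        if (destination == Destination.ATTRIBUTE) {
            result.attributes.putAll(matches); // content stays untouched
        } else {
            result.content = toJson(matches);  // original content is overwritten
        }
        return result;
    }

    // Minimal JSON object writer for string values only.
    static String toJson(Map<String, String> values) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> e : values.entrySet()) {
            json.add("\"" + escape(e.getKey()) + "\":\"" + escape(e.getValue()) + "\"");
        }
        return json.toString();
    }

    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

    Keeping the two modes behind a single switch means the matching step is written once and only the last step, where the result is stored, differs.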


---

[GitHub] nifi issue #1108: NIFI-2565: add Grok parser

Posted by selim-namsi <gi...@git.apache.org>.
Github user selim-namsi commented on the issue:

    https://github.com/apache/nifi/pull/1108
  
    @trixpan Thank you for all this useful feedback. I'll start working on these modifications.
    As for the patterns, I hard-coded them because I was thinking about shipping some useful patterns by default while also letting users add their own custom patterns. What do you think?
    
    Thanks


---

[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by selim-namsi <gi...@git.apache.org>.
Github user selim-namsi commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r83720540
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GrokParser.java ---
    @@ -0,0 +1,243 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.processors.standard;
    +
    +import com.fasterxml.jackson.databind.ObjectMapper;
    +import oi.thekraken.grok.api.Grok;
    +import oi.thekraken.grok.api.Match;
    +import oi.thekraken.grok.api.exception.GrokException;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.behavior.ReadsAttributes;
    +import org.apache.nifi.annotation.behavior.WritesAttribute;
    +import org.apache.nifi.annotation.behavior.WritesAttributes;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.flowfile.attributes.CoreAttributes;
    +import org.apache.nifi.flowfile.FlowFile;
    +
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.DataUnit;
    +import org.apache.nifi.processor.ProcessorInitializationContext;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.io.InputStreamCallback;
    +import org.apache.nifi.processor.io.StreamCallback;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.stream.io.BufferedOutputStream;
    +import org.apache.nifi.stream.io.StreamUtils;
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.io.OutputStream;
    +import java.nio.charset.Charset;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.HashMap;
    +import java.util.Set;
    +import java.util.HashSet;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +
    +
    +@Tags({"Grok Processor"})
    +@CapabilityDescription("Use Grok expression ,a la logstash, to parse data.")
    +@SeeAlso({})
    +@ReadsAttributes({@ReadsAttribute(attribute="", description="")})
    +@WritesAttributes({@WritesAttribute(attribute="", description="")})
    +public class GrokParser extends AbstractProcessor {
    +
    +
    +    public static final String DESTINATION_ATTRIBUTE = "flowfile-attribute";
    +    public static final String DESTINATION_CONTENT = "flowfile-content";
    +    private static final String APPLICATION_JSON = "application/json";
    +
    +    public static final PropertyDescriptor GROK_EXPRESSION = new PropertyDescriptor
    +            .Builder().name("Grok Expression")
    +            .description("Grok expression")
    +            .required(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    public static final PropertyDescriptor GROK_PATTERN_FILE = new PropertyDescriptor
    +            .Builder().name("Grok Pattern file")
    +            .description("Grok Pattern file definition")
    +            .required(false)
    +            .addValidator(StandardValidators.FILE_EXISTS_VALIDATOR)
    +            .build();
    +
    +    public static final PropertyDescriptor DESTINATION = new PropertyDescriptor.Builder()
    +            .name("Destination")
    +            .description("Control if Grok output value is written as a new flowfile attribute  " +
    --- End diff --
    
    @markap14 Actually, what I meant is that the output will contain many new flowfile attributes: the attribute names will be the Grok identifiers and the attribute values will be the matched values. If you are okay with this, I'll update the description.
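
    The identifier-to-attribute mapping described above can be sketched as follows. This is only an illustration: named groups in java.util.regex stand in for Grok identifiers, and the class AttributeSketch, the sample pattern, and the group names client, method, and path are all made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AttributeSketch {
    // Named groups stand in for Grok identifiers such as %{IP:client}.
    private static final Pattern LINE =
            Pattern.compile("(?<client>\\S+) (?<method>[A-Z]+) (?<path>\\S+)");
    private static final String[] NAMES = {"client", "method", "path"};

    // Returns one attribute per identifier, or an empty map when nothing matches.
    static Map<String, String> toAttributes(String content) {
        Map<String, String> attributes = new LinkedHashMap<>();
        Matcher m = LINE.matcher(content);
        if (m.find()) {
            for (String name : NAMES) {
                attributes.put(name, m.group(name));
            }
        }
        return attributes;
    }
}
```

    For the input "10.0.0.1 GET /index.html" this yields three attributes (client, method, path). An empty map would correspond to routing the flowfile to the unmatched relationship.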


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by trixpan <gi...@git.apache.org>.
Github user trixpan commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r96236556
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ExtractGrok.java ---
    @@ -0,0 +1,298 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.processors.standard;
    +
    +import com.fasterxml.jackson.databind.ObjectMapper;
    +import io.thekraken.grok.api.Grok;
    +import io.thekraken.grok.api.Match;
    +import io.thekraken.grok.api.exception.GrokException;
    +import org.apache.nifi.annotation.behavior.WritesAttribute;
    +import org.apache.nifi.annotation.behavior.WritesAttributes;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.annotation.lifecycle.OnStopped;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.components.ValidationContext;
    +import org.apache.nifi.components.ValidationResult;
    +import org.apache.nifi.components.Validator;
    +import org.apache.nifi.flowfile.attributes.CoreAttributes;
    +import org.apache.nifi.flowfile.FlowFile;
    +
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.DataUnit;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.io.InputStreamCallback;
    +import org.apache.nifi.processor.io.StreamCallback;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.stream.io.BufferedOutputStream;
    +import org.apache.nifi.stream.io.StreamUtils;
    +import org.apache.nifi.util.StopWatch;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.io.OutputStream;
    +import java.nio.charset.Charset;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.HashMap;
    +import java.util.Set;
    +import java.util.HashSet;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +import java.util.concurrent.BlockingQueue;
    +import java.util.concurrent.LinkedBlockingQueue;
    +import java.util.concurrent.TimeUnit;
    +
    +
    +@Tags({"Grok Processor", "grok", "log", "text", "parse", "delimit", "extract"})
    +@CapabilityDescription("Evaluates one or more Grok Expressions against the content of a FlowFile, " +
    +        "adding the results as attributes or replacing the content of the FlowFile with a JSON " +
    +        "notation of the matched content")
    +@WritesAttributes({
    +        @WritesAttribute(attribute = "grok.XXX", description = "Each of the Grok identifier that is matched in the flowfile will be added as an attribute, prefixed with \"grok.\" For example," +
    --- End diff --
    
    Isn't this applicable only when flowfile-attribute is used as the destination?



[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by selim-namsi <gi...@git.apache.org>.
Github user selim-namsi commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r83729426
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/test/java/org/apache/nifi/processors/standard/TestGrokParser.java ---
    @@ -0,0 +1,104 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.processors.standard;
    +
    +
    +import org.apache.nifi.util.MockFlowFile;
    +import org.apache.nifi.util.TestRunner;
    +import org.apache.nifi.util.TestRunners;
    +import org.junit.Before;
    +import org.junit.Test;
    +
    +import java.io.IOException;
    +import java.nio.file.Path;
    +import java.nio.file.Paths;
    +
    +/**
    + * Created by snamsi on 05/10/16.
    + */
    +public class TestGrokParser {
    +
    +    private TestRunner testRunner;
    +    final static Path GROK_LOG_INPUT = Paths.get("src/test/resources/TestGrokParser/apache.log");
    +    final static Path GROK_TEXT_INPUT = Paths.get("src/test/resources/TestGrokParser/simple_text.log");
    +
    +
    +    @Before
    +    public void init() {
    +        testRunner = TestRunners.newTestRunner(GrokParser.class);
    +    }
    +
    +    @Test
    +    public void testGrokParserWithMatchedContent() throws IOException {
    +
    +
    +        testRunner.setProperty(GrokParser.GROK_EXPRESSION, "%{COMMONAPACHELOG}");
    +        testRunner.setProperty(GrokParser.GROK_PATTERN_FILE, "src/test/resources/TestGrokParser/patterns");
    +        testRunner.enqueue(GROK_LOG_INPUT);
    +        testRunner.run();
    +        testRunner.assertAllFlowFilesTransferred(GrokParser.REL_MATCH);
    +        final MockFlowFile matched = testRunner.getFlowFilesForRelationship(GrokParser.REL_MATCH).get(0);
    +
    +        matched.assertAttributeEquals("verb","GET");
    +        matched.assertAttributeEquals("response","401");
    +        matched.assertAttributeEquals("bytes","12846");
    +        matched.assertAttributeEquals("clientip","64.242.88.10");
    +        matched.assertAttributeEquals("auth","-");
    +        matched.assertAttributeEquals("timestamp","07/Mar/2004:16:05:49 -0800");
    +        matched.assertAttributeEquals("request","/twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.ConfigurationVariables");
    +        matched.assertAttributeEquals("httpversion","1.1");
    +
    +    }
    +
    +    @Test
    +    public void testGrokParserWithUnMatchedContent() throws IOException {
    +
    +
    +        testRunner.setProperty(GrokParser.GROK_EXPRESSION, "%{ADDRESS}");
    +        testRunner.setProperty(GrokParser.GROK_PATTERN_FILE, "src/test/resources/TestGrokParser/patterns");
    +        testRunner.enqueue(GROK_TEXT_INPUT);
    +        testRunner.run();
    +        testRunner.assertAllFlowFilesTransferred(GrokParser.REL_NO_MATCH);
    +        final MockFlowFile notMatched = testRunner.getFlowFilesForRelationship(GrokParser.REL_NO_MATCH).get(0);
    +        notMatched.assertContentEquals(GROK_TEXT_INPUT);
    +
    +    }
    +
    +    @Test(expected = java.lang.AssertionError.class)
    +    public void testGrokParserWithNotFoundPatternFile() throws IOException {
    +
    +        testRunner.setProperty(GrokParser.GROK_EXPRESSION, "%{COMMONAPACHELOG}");
    +        testRunner.setProperty(GrokParser.GROK_PATTERN_FILE, "src/test/resources/TestGrokParser/toto_file");
    +        testRunner.enqueue(GROK_LOG_INPUT);
    +        testRunner.run();
    +
    +    }
    +
    +
    +    @Test(expected = java.lang.AssertionError.class)
    --- End diff --
    
    For the method "testGrokParserWithBadGrokExpression": although the processor throws GrokException, the test fails when I use assertNotValid, with the following message: "java.lang.AssertionError: Processor appears to be valid but expected it to be invalid"
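
One possible reading of this behaviour, sketched under assumptions: in the quoted diff the Grok Expression property carries only a non-empty validator, so a syntactically broken expression can still pass configuration-time validation and fail only later, when it is compiled. The miniature below is not NiFi code. The class and method names are hypothetical, and `java.util.regex.Pattern` stands in for the Grok compiler.

```java
import java.util.regex.Pattern;

/** Hypothetical miniature of a two-phase lifecycle: validate first, compile later. */
final class ValidationPhaseSketch {

    private final String expression;

    ValidationPhaseSketch(String expression) {
        this.expression = expression;
    }

    /** Configuration-time check that mirrors a non-empty validator and nothing more. */
    boolean isValid() {
        return expression != null && !expression.trim().isEmpty();
    }

    /** Later step: the expression is compiled only here, so syntax errors surface late. */
    Pattern schedule() {
        return Pattern.compile(expression); // throws PatternSyntaxException on bad syntax
    }

    public static void main(String[] args) {
        ValidationPhaseSketch broken = new ValidationPhaseSketch("(?<verb>[A-Z]+");
        System.out.println("valid at configuration time: " + broken.isValid());
        try {
            broken.schedule();
        } catch (RuntimeException e) {
            System.out.println("failed when scheduled: " + e.getClass().getSimpleName());
        }
    }
}
```

Under this reading, an "assert not valid" style check can only succeed once the expression is compiled during validation itself.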



[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r83227771
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GrokParser.java ---
    @@ -0,0 +1,243 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.processors.standard;
    +
    +import com.fasterxml.jackson.databind.ObjectMapper;
    +import oi.thekraken.grok.api.Grok;
    +import oi.thekraken.grok.api.Match;
    +import oi.thekraken.grok.api.exception.GrokException;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.behavior.ReadsAttributes;
    +import org.apache.nifi.annotation.behavior.WritesAttribute;
    +import org.apache.nifi.annotation.behavior.WritesAttributes;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.flowfile.attributes.CoreAttributes;
    +import org.apache.nifi.flowfile.FlowFile;
    +
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.DataUnit;
    +import org.apache.nifi.processor.ProcessorInitializationContext;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.io.InputStreamCallback;
    +import org.apache.nifi.processor.io.StreamCallback;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.stream.io.BufferedOutputStream;
    +import org.apache.nifi.stream.io.StreamUtils;
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.io.OutputStream;
    +import java.nio.charset.Charset;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.HashMap;
    +import java.util.Set;
    +import java.util.HashSet;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +
    +
    +@Tags({"Grok Processor"})
    +@CapabilityDescription("Use Grok expression ,a la logstash, to parse data.")
    +@SeeAlso({})
    +@ReadsAttributes({@ReadsAttribute(attribute="", description="")})
    +@WritesAttributes({@WritesAttribute(attribute="", description="")})
    +public class GrokParser extends AbstractProcessor {
    +
    +
    +    public static final String DESTINATION_ATTRIBUTE = "flowfile-attribute";
    +    public static final String DESTINATION_CONTENT = "flowfile-content";
    +    private static final String APPLICATION_JSON = "application/json";
    +
    +    public static final PropertyDescriptor GROK_EXPRESSION = new PropertyDescriptor
    +            .Builder().name("Grok Expression")
    +            .description("Grok expression")
    +            .required(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    public static final PropertyDescriptor GROK_PATTERN_FILE = new PropertyDescriptor
    +            .Builder().name("Grok Pattern file")
    +            .description("Grok Pattern file definition")
    +            .required(false)
    +            .addValidator(StandardValidators.FILE_EXISTS_VALIDATOR)
    +            .build();
    +
    +    public static final PropertyDescriptor DESTINATION = new PropertyDescriptor.Builder()
    +            .name("Destination")
    +            .description("Control if Grok output value is written as a new flowfile attribute  " +
    +                    "or written in the flowfile content. Writing to flowfile content will overwrite any " +
    +                    "existing flowfile content.")
    +            .required(true)
    +            .allowableValues(DESTINATION_ATTRIBUTE, DESTINATION_CONTENT)
    +            .defaultValue(DESTINATION_ATTRIBUTE)
    +            .build();
    +
    +    public static final PropertyDescriptor CHARACTER_SET = new PropertyDescriptor
    +            .Builder().name("Character Set")
    +            .description("The Character Set in which the file is encoded")
    +            .required(true)
    +            .addValidator(StandardValidators.CHARACTER_SET_VALIDATOR)
    +            .defaultValue("UTF-8")
    +            .build();
    +
    +    public static final PropertyDescriptor MAX_BUFFER_SIZE = new PropertyDescriptor
    +            .Builder().name("Maximum Buffer Size")
    +            .description("Specifies the maximum amount of data to buffer (per file) in order to apply the Grok expressions. Files larger than the specified maximum will not be fully evaluated.")
    +            .required(true)
    +            .addValidator(StandardValidators.DATA_SIZE_VALIDATOR)
    +            .addValidator(StandardValidators.createDataSizeBoundsValidator(0, Integer.MAX_VALUE))
    +            .defaultValue("1 MB")
    +            .build();
    +
    +    public static final Relationship REL_MATCH = new Relationship.Builder()
    +            .name("matched")
    +            .description("FlowFiles are routed to this relationship when the Grok Expression is successfully evaluated and the FlowFile is modified as a result")
    +            .build();
    +
    +    public static final Relationship REL_NO_MATCH = new Relationship.Builder()
    +            .name("unmatched")
    +            .description("FlowFiles are routed to this relationship when no provided Grok Expression matches the content of the FlowFile")
    +            .build();
    +
    +    private List<PropertyDescriptor> descriptors;
    +
    +    private Set<Relationship> relationships;
    +
    +    private static final ObjectMapper objectMapper = new ObjectMapper();
    +
    +    private Grok grok;
    --- End diff --
    
    These member variables are not thread-safe. They should be defined inline where needed, or properly protected via Java's concurrency mechanisms.
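
A minimal sketch of one way to protect a non-thread-safe helper with Java's concurrency utilities: each thread borrows an instance from a queue and returns it when done, so no instance is ever shared between two threads at once. The later ExtractGrok revision quoted in this thread imports `BlockingQueue` and `LinkedBlockingQueue`, which may point to a similar approach, but the thread does not show that code. The class below is hypothetical and generic; `P` stands in for the parser type.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical pool that hands each caller its own parser instance for the duration of a call. */
final class ParserPoolSketch<P> {

    private final BlockingQueue<P> idle = new LinkedBlockingQueue<>();
    private final Supplier<P> factory;

    ParserPoolSketch(Supplier<P> factory) {
        this.factory = factory;
    }

    /** Borrows an idle parser (creating one if none is available), runs the work, then returns it. */
    <R> R withParser(Function<P, R> work) {
        P parser = idle.poll();
        if (parser == null) {
            parser = factory.get();
        }
        try {
            return work.apply(parser);
        } finally {
            idle.offer(parser);
        }
    }

    /** Number of parsers currently waiting to be borrowed. */
    int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        // StringBuilder is used here only as a convenient non-thread-safe stand-in.
        ParserPoolSketch<StringBuilder> pool = new ParserPoolSketch<>(StringBuilder::new);
        String result = pool.withParser(sb -> sb.append("parsed").toString());
        System.out.println(result + ", idle instances: " + pool.idleCount());
    }
}
```

The alternative named in the comment, creating the object inline where it is needed, avoids the pool entirely at the cost of rebuilding the object on every call.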



[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r83229352
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/test/java/org/apache/nifi/processors/standard/TestGrokParser.java ---
    @@ -0,0 +1,104 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.processors.standard;
    +
    +
    +import org.apache.nifi.util.MockFlowFile;
    +import org.apache.nifi.util.TestRunner;
    +import org.apache.nifi.util.TestRunners;
    +import org.junit.Before;
    +import org.junit.Test;
    +
    +import java.io.IOException;
    +import java.nio.file.Path;
    +import java.nio.file.Paths;
    +
    +/**
    + * Created by snamsi on 05/10/16.
    + */
    +public class TestGrokParser {
    +
    +    private TestRunner testRunner;
    +    final static Path GROK_LOG_INPUT = Paths.get("src/test/resources/TestGrokParser/apache.log");
    +    final static Path GROK_TEXT_INPUT = Paths.get("src/test/resources/TestGrokParser/simple_text.log");
    +
    +
    +    @Before
    +    public void init() {
    +        testRunner = TestRunners.newTestRunner(GrokParser.class);
    +    }
    +
    +    @Test
    +    public void testGrokParserWithMatchedContent() throws IOException {
    +
    +
    +        testRunner.setProperty(GrokParser.GROK_EXPRESSION, "%{COMMONAPACHELOG}");
    +        testRunner.setProperty(GrokParser.GROK_PATTERN_FILE, "src/test/resources/TestGrokParser/patterns");
    +        testRunner.enqueue(GROK_LOG_INPUT);
    +        testRunner.run();
    +        testRunner.assertAllFlowFilesTransferred(GrokParser.REL_MATCH);
    +        final MockFlowFile matched = testRunner.getFlowFilesForRelationship(GrokParser.REL_MATCH).get(0);
    +
    +        matched.assertAttributeEquals("verb","GET");
    +        matched.assertAttributeEquals("response","401");
    +        matched.assertAttributeEquals("bytes","12846");
    +        matched.assertAttributeEquals("clientip","64.242.88.10");
    +        matched.assertAttributeEquals("auth","-");
    +        matched.assertAttributeEquals("timestamp","07/Mar/2004:16:05:49 -0800");
    +        matched.assertAttributeEquals("request","/twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.ConfigurationVariables");
    +        matched.assertAttributeEquals("httpversion","1.1");
    +
    +    }
    +
    +    @Test
    +    public void testGrokParserWithUnMatchedContent() throws IOException {
    +
    +
    +        testRunner.setProperty(GrokParser.GROK_EXPRESSION, "%{ADDRESS}");
    +        testRunner.setProperty(GrokParser.GROK_PATTERN_FILE, "src/test/resources/TestGrokParser/patterns");
    +        testRunner.enqueue(GROK_TEXT_INPUT);
    +        testRunner.run();
    +        testRunner.assertAllFlowFilesTransferred(GrokParser.REL_NO_MATCH);
    +        final MockFlowFile notMatched = testRunner.getFlowFilesForRelationship(GrokParser.REL_NO_MATCH).get(0);
    +        notMatched.assertContentEquals(GROK_TEXT_INPUT);
    +
    +    }
    +
    +    @Test(expected = java.lang.AssertionError.class)
    +    public void testGrokParserWithNotFoundPatternFile() throws IOException {
    +
    +        testRunner.setProperty(GrokParser.GROK_EXPRESSION, "%{COMMONAPACHELOG}");
    +        testRunner.setProperty(GrokParser.GROK_PATTERN_FILE, "src/test/resources/TestGrokParser/toto_file");
    +        testRunner.enqueue(GROK_LOG_INPUT);
    +        testRunner.run();
    +
    +    }
    +
    +
    +    @Test(expected = java.lang.AssertionError.class)
    --- End diff --
    
    Rather than expecting an AssertionError, we should avoid calling testRunner.run() and instead just use testRunner.assertNotValid().



[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by joewitt <gi...@git.apache.org>.
Github user joewitt commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r87937612
  
    --- Diff: nifi-assembly/LICENSE ---
    @@ -1729,4 +1729,20 @@ This product bundles 'jbzip2' which is available under an MIT license.
         AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
         LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
         OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
    -    THE SOFTWARE.
    \ No newline at end of file
    +    THE SOFTWARE.
    --- End diff --
    
    This whole license section can be removed. This is the assembly license, which covers all binary artifacts and source in the build of nifi itself. The java-grok dependency is binary only (not source) and is ASLv2, so nothing needs to be in this license for it. There should be an entry for it in the NOTICE, similar to the many ASLv2 examples already there. The only thing that needs to be mentioned is the copyright line from the project's license file https://github.com/thekrakken/java-grok/blob/master/LICENSE. The same change to nifi-assembly/NOTICE will also be needed in the NOTICE of the nifi-standard-nar.
    
    Lots of words above but the short version is "No license change needed.  Just add a small section to the nar NOTICE and assembly NOTICE to reflect this ASLv2 dependency specifically because it has a copyright reference in the license."



[GitHub] nifi issue #1108: NIFI-2565: add Grok parser

Posted by trixpan <gi...@git.apache.org>.
Github user trixpan commented on the issue:

    https://github.com/apache/nifi/pull/1108
  
    @selim-namsi - good work. We are getting there.



[GitHub] nifi issue #1108: NIFI-2565: add Grok parser

Posted by mattyb149 <gi...@git.apache.org>.
Github user mattyb149 commented on the issue:

    https://github.com/apache/nifi/pull/1108
  
    Looks like there are new merge issues. Do you mind rebasing against the latest master?



[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r83226436
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GrokParser.java ---
    @@ -0,0 +1,243 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.processors.standard;
    +
    +import com.fasterxml.jackson.databind.ObjectMapper;
    +import oi.thekraken.grok.api.Grok;
    +import oi.thekraken.grok.api.Match;
    +import oi.thekraken.grok.api.exception.GrokException;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.behavior.ReadsAttributes;
    +import org.apache.nifi.annotation.behavior.WritesAttribute;
    +import org.apache.nifi.annotation.behavior.WritesAttributes;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.flowfile.attributes.CoreAttributes;
    +import org.apache.nifi.flowfile.FlowFile;
    +
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.DataUnit;
    +import org.apache.nifi.processor.ProcessorInitializationContext;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.io.InputStreamCallback;
    +import org.apache.nifi.processor.io.StreamCallback;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.stream.io.BufferedOutputStream;
    +import org.apache.nifi.stream.io.StreamUtils;
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.io.OutputStream;
    +import java.nio.charset.Charset;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.HashMap;
    +import java.util.Set;
    +import java.util.HashSet;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +
    +
    +@Tags({"Grok Processor"})
    +@CapabilityDescription("Use Grok expression ,a la logstash, to parse data.")
    --- End diff --
    
    We should probably expand on this a bit more. Many users will not know what Grok is.



[GitHub] nifi pull request #1108: NIFI-2565: add Grok parser

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1108#discussion_r83228313
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GrokParser.java ---
    @@ -0,0 +1,243 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.processors.standard;
    +
    +import com.fasterxml.jackson.databind.ObjectMapper;
    +import oi.thekraken.grok.api.Grok;
    +import oi.thekraken.grok.api.Match;
    +import oi.thekraken.grok.api.exception.GrokException;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.behavior.ReadsAttributes;
    +import org.apache.nifi.annotation.behavior.WritesAttribute;
    +import org.apache.nifi.annotation.behavior.WritesAttributes;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.flowfile.attributes.CoreAttributes;
    +import org.apache.nifi.flowfile.FlowFile;
    +
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.DataUnit;
    +import org.apache.nifi.processor.ProcessorInitializationContext;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.io.InputStreamCallback;
    +import org.apache.nifi.processor.io.StreamCallback;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.stream.io.BufferedOutputStream;
    +import org.apache.nifi.stream.io.StreamUtils;
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.io.OutputStream;
    +import java.nio.charset.Charset;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.HashMap;
    +import java.util.Set;
    +import java.util.HashSet;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +
    +
    +@Tags({"Grok Processor"})
    +@CapabilityDescription("Use Grok expression ,a la logstash, to parse data.")
    +@SeeAlso({})
    +@ReadsAttributes({@ReadsAttribute(attribute="", description="")})
    +@WritesAttributes({@WritesAttribute(attribute="", description="")})
    +public class GrokParser extends AbstractProcessor {
    +
    +
    +    public static final String DESTINATION_ATTRIBUTE = "flowfile-attribute";
    +    public static final String DESTINATION_CONTENT = "flowfile-content";
    +    private static final String APPLICATION_JSON = "application/json";
    +
    +    public static final PropertyDescriptor GROK_EXPRESSION = new PropertyDescriptor
    +            .Builder().name("Grok Expression")
    +            .description("Grok expression")
    +            .required(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    --- End diff --
    
    We should probably use a custom validator to make sure that the configured value is valid.
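
    As a rough illustration of the kind of check meant here, the sketch below pre-validates a Grok expression using only the JDK. It is an assumption-laden stand-in, not code from this PR: the class name GrokExpressionCheck, the simplified %{PATTERN:field} syntax, and the caller-supplied set of known pattern names are all hypothetical. In the processor itself, the more faithful check is probably to let the Grok library compile the expression and report any GrokException from a NiFi Validator passed to .addValidator(...).

```java
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical helper (not part of this PR): a syntax pre-check for Grok expressions. */
final class GrokExpressionCheck {

    // Simplified, assumed form of a Grok reference: %{PATTERN} or %{PATTERN:field}.
    private static final Pattern REFERENCE =
            Pattern.compile("%\\{([A-Za-z0-9_]+)(?::([A-Za-z0-9_:]+))?\\}");

    private GrokExpressionCheck() {
    }

    /**
     * Returns an explanation if the expression looks invalid, or an empty
     * Optional if it passes all checks.
     */
    static Optional<String> validate(String expression, Set<String> knownPatterns) {
        if (expression == null || expression.trim().isEmpty()) {
            return Optional.of("expression is empty");
        }

        // Every referenced pattern name must be one the caller knows about.
        Set<String> unknown = new LinkedHashSet<>();
        Matcher matcher = REFERENCE.matcher(expression);
        while (matcher.find()) {
            if (!knownPatterns.contains(matcher.group(1))) {
                unknown.add(matcher.group(1));
            }
        }
        if (!unknown.isEmpty()) {
            return Optional.of("unknown pattern(s): " + unknown);
        }

        // Swap well-formed references for a neutral group; whatever is left
        // must be free of dangling "%{" and must compile as a plain regex.
        String remainder = REFERENCE.matcher(expression).replaceAll("(?:.*)");
        if (remainder.contains("%{")) {
            return Optional.of("malformed %{...} reference");
        }
        try {
            Pattern.compile(remainder);
        } catch (PatternSyntaxException e) {
            return Optional.of("invalid regular expression: " + e.getDescription());
        }
        return Optional.empty();
    }
}
```

    A validator built on this would mark the property invalid whenever validate(...) returns a non-empty Optional and use the returned string as the explanation, so a bad expression is caught at configuration time rather than when the processor is scheduled.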

