Posted to issues@nifi.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/09/17 14:25:01 UTC

[jira] [Commented] (NIFI-9203) Improve GrokReader to be able to handle complex Grok expressions

    [ https://issues.apache.org/jira/browse/NIFI-9203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17416726#comment-17416726 ] 

ASF subversion and git services commented on NIFI-9203:
-------------------------------------------------------

Commit 229f45997d7e4e07e7ad95196b2ee1fd0a886365 in nifi's branch refs/heads/main from Tamas Palfy
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=229f459 ]

NIFI-9203 Improve GrokReader to be able to handle complex grok expression properly.

This closes #5376.

Signed-off-by: Peter Turcsanyi <tu...@apache.org>


> Improve GrokReader to be able to handle complex Grok expressions
> ----------------------------------------------------------------
>
>                 Key: NIFI-9203
>                 URL: https://issues.apache.org/jira/browse/NIFI-9203
>             Project: Apache NiFi
>          Issue Type: Bug
>            Reporter: Tamas Palfy
>            Assignee: Tamas Palfy
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The current {{GrokReader}} implementation cannot handle complex expressions, such as the one in the following scenario.
> Suppose we have a custom Grok pattern file:
> {code}
> SYSLOGBASE_ISO8601 %{TIMESTAMP_ISO8601:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}:
> LINE_1 %{SYSLOGBASE}%{GREEDYDATA:message}
> LINE_2 %{SYSLOGBASE_ISO8601}%{GREEDYDATA:message}
> LINE (?:%{LINE_1}|%{LINE_2})
> {code}
> If we set the Grok expression to:
> {code}
> %{LINE}
> {code}
> the service will fail for two reasons:
> # LINE_1 and LINE_2 define the same labels. The service tries to create a schema by adding a field for every label it encounters, which produces duplicate fields in the schema, and that is not allowed.
> # When the Grok library reads a record using a complex expression, it returns an array as the value, because the complex expression can have multiple matches. NiFi in turn tries to handle that array as a byte array.
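
The two failure modes above can be illustrated with a minimal Java sketch. This is not the code from the commit; the class GrokFieldSketch, both method names, the sample labels, and the "take the first non-null entry" rule are all assumptions made for illustration. It only shows one plausible way to (1) keep each label once when building a list of field names and (2) reduce an array-like match value to a single value instead of treating it as a byte array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper for illustration only; not NiFi's GrokReader code.
final class GrokFieldSketch {

    // Collapse repeated labels (e.g. "message" declared by both LINE_1 and
    // LINE_2) so each field name appears once, in first-seen order.
    static List<String> uniqueFieldNames(List<String> labels) {
        return new ArrayList<>(new LinkedHashSet<>(labels));
    }

    // If the captured value is a list (assumed here: one entry per alternative
    // that declares the label), return the first non-null entry.
    // Any other value is returned unchanged.
    static Object normalizeValue(Object raw) {
        if (raw instanceof List) {
            for (Object candidate : (List<?>) raw) {
                if (candidate != null) {
                    return candidate;
                }
            }
            return null;
        }
        return raw;
    }

    public static void main(String[] args) {
        // Sample labels, as if collected from both alternatives of LINE.
        List<String> labels = Arrays.asList(
                "timestamp", "logsource", "message",
                "timestamp", "logsource", "message");
        System.out.println(uniqueFieldNames(labels));
        // prints [timestamp, logsource, message]

        // Sample match value: only the second alternative matched.
        System.out.println(normalizeValue(Arrays.asList(null, "connection closed")));
        // prints connection closed
    }
}
```

A LinkedHashSet is used so the field order stays deterministic. Whether the first non-null entry is the right choice depends on how the Grok library orders its matches, which this sketch does not verify.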


