Posted to issues@nifi.apache.org by "Tamas Palfy (Jira)" <ji...@apache.org> on 2021/09/08 12:42:00 UTC

[jira] [Created] (NIFI-9203) Improve GrokReader to be able to handle complex Grok expressions

Tamas Palfy created NIFI-9203:
---------------------------------

             Summary: Improve GrokReader to be able to handle complex Grok expressions
                 Key: NIFI-9203
                 URL: https://issues.apache.org/jira/browse/NIFI-9203
             Project: Apache NiFi
          Issue Type: Bug
            Reporter: Tamas Palfy


The current {{GrokReader}} implementation cannot handle complex Grok expressions, as the following scenario shows.

Suppose we have a custom Grok pattern file:
{code}
SYSLOGBASE_ISO8601 %{TIMESTAMP_ISO8601:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}:
LINE_1 %{SYSLOGBASE}%{GREEDYDATA:message}
LINE_2 %{SYSLOGBASE_ISO8601}%{GREEDYDATA:message}
LINE (?:%{LINE_1}|%{LINE_2})
{code}

If we set the Grok expression to:
{code}
%{LINE}
{code}
the service will fail for two reasons:
# LINE_1 and LINE_2 define the same labels. The service builds its schema by adding a field for every label it encounters, so the schema ends up with duplicate fields, which is not allowed.
# When the Grok library matches a record against a complex expression, it returns an array as the value of a label, because the expression can produce multiple matches. NiFi in turn tries to handle that array as a byte array.
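
The two failure modes above can be illustrated with a small standalone sketch. It is not NiFi code and not the Grok library's API: the helper names ({{collect_labels}}, {{unique_fields}}, {{first_match}}) are hypothetical, the {{SYSLOGBASE}} definition is an approximation of the stock pattern, and the assumption that unmatched alternatives appear as null entries in the returned array is mine, not something the report states.

```python
import re

# LINE_1, LINE_2, LINE and SYSLOGBASE_ISO8601 are taken from the report.
# SYSLOGBASE is an assumed approximation of the stock Grok pattern.
PATTERNS = {
    "SYSLOGBASE": "%{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?"
                  "%{SYSLOGHOST:logsource} %{SYSLOGPROG}:",
    "SYSLOGBASE_ISO8601": "%{TIMESTAMP_ISO8601:timestamp} (?:%{SYSLOGFACILITY} )?"
                          "%{SYSLOGHOST:logsource} %{SYSLOGPROG}:",
    "LINE_1": "%{SYSLOGBASE}%{GREEDYDATA:message}",
    "LINE_2": "%{SYSLOGBASE_ISO8601}%{GREEDYDATA:message}",
    "LINE": "(?:%{LINE_1}|%{LINE_2})",
}

# Matches %{NAME} and %{NAME:label}.
REFERENCE = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def collect_labels(expression, patterns):
    """Return every label reachable from the expression, duplicates included."""
    labels = []
    for name, label in REFERENCE.findall(expression):
        if label:
            labels.append(label)
        # Recurse into patterns defined in the file; unknown names are leaves.
        if name in patterns:
            labels.extend(collect_labels(patterns[name], patterns))
    return labels


def unique_fields(labels):
    """Drop repeated labels while keeping first-seen order."""
    return list(dict.fromkeys(labels))


def first_match(value):
    """Collapse a multi-valued capture to its first non-null entry."""
    if isinstance(value, (list, tuple)):
        return next((v for v in value if v is not None), None)
    return value


labels = collect_labels("%{LINE}", PATTERNS)
print(labels)                 # each label appears once per alternative
print(unique_fields(labels))  # one schema field per label
print(first_match([None, "2021-09-08T12:42:00Z"]))
```

Collecting labels naively yields each label once per alternative, which is the duplicate-field problem from the first point. Deduplicating by name, and collapsing an array-valued capture to a single value, is one possible way to address both points; the actual fix may differ.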




