Posted to issues@nifi.apache.org by "Otto Fowler (JIRA)" <ji...@apache.org> on 2018/06/28 17:57:00 UTC

[jira] [Updated] (NIFI-5324) Implement syslog record readers

     [ https://issues.apache.org/jira/browse/NIFI-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otto Fowler updated NIFI-5324:
------------------------------
    Status: Patch Available  (was: In Progress)

> Implement syslog record readers
> -------------------------------
>
>                 Key: NIFI-5324
>                 URL: https://issues.apache.org/jira/browse/NIFI-5324
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Bryan Bende
>            Assignee: Otto Fowler
>            Priority: Major
>
> Creating this Jira based on a discussion with [~ottobackwards] in the NiFi HipChat room...
> We currently have ListenSyslog, which can optionally parse messages when the batch size is 1, and ParseSyslog, which also assumes 1 message per flow file. There are also ListenTCPRecord and ListenUDPRecord, which can be used with a GrokReader to read log messages from TCP and UDP connections respectively.
> The most common reason to parse syslog messages is to extract a field from each message into an attribute and then use that attribute to make decisions such as routing or filtering.
> Since the "1 message per flow file" pattern is generally something we try to avoid, it would be nice if we could keep batches of syslog messages together in a single flow file and then use record processors to process the batches.
> For example, with a syslog record reader we could use PartitionRecord to divide a flow file of many syslog records into smaller groups based on a field in the message; each group could then be routed based on its group value.
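A minimal sketch of the PartitionRecord idea, written in Python for brevity (it does not use NiFi's Java API). It assumes a syslog reader has already turned each message into a dict; the field names hostname, severity and message, and the helper partition_records, are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical record shape: one dict per syslog message, as a syslog
# reader might emit it. Field names are illustrative only.
Record = Dict[str, Any]


def partition_records(records: List[Record], field: str) -> Dict[Any, List[Record]]:
    """Group records by the value of `field`, preserving input order.

    One input batch becomes several smaller batches, one per distinct
    field value. Records that lack the field are grouped under None.
    """
    groups: Dict[Any, List[Record]] = defaultdict(list)
    for record in records:
        groups[record.get(field)].append(record)
    return dict(groups)


if __name__ == "__main__":
    batch = [
        {"hostname": "web01", "severity": 6, "message": "GET /index.html"},
        {"hostname": "db01", "severity": 3, "message": "connection refused"},
        {"hostname": "web01", "severity": 4, "message": "slow response"},
    ]
    for host, group in partition_records(batch, "hostname").items():
        # Each group stands in for one outgoing flow file whose group
        # value could be exposed as an attribute for routing.
        print(host, len(group))  # web01 2, then db01 1
```

The whole batch stays together until the split, so only one record per message is parsed and no per-message flow files are created up front.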
> Another example would be to use QueryRecord to run a SQL query that selects specific syslog messages based on a field in the message.
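QueryRecord exposes a flow file's records as a table named FLOWFILE and runs SQL against it (as far as I know, through Apache Calcite). The sketch below only imitates that idea: it loads hypothetical dict records into an in-memory SQLite table with the same name so the query text looks similar. The helper query_records and the field names are assumptions for illustration.

```python
import sqlite3
from typing import Any, Dict, List

Record = Dict[str, Any]


def query_records(records: List[Record], sql: str) -> List[Record]:
    """Run `sql` over a batch of records exposed as a table named FLOWFILE.

    Imitation only: SQLite stands in for the SQL engine QueryRecord uses.
    Column names are taken from the union of record keys and are assumed
    not to contain double quotes.
    """
    if not records:
        return []
    columns = sorted({key for record in records for key in record})
    column_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn = sqlite3.connect(":memory:")
    try:
        conn.row_factory = sqlite3.Row  # rows can be converted to dicts
        # Untyped columns keep values as inserted (ints stay ints).
        conn.execute(f"CREATE TABLE FLOWFILE ({column_list})")
        conn.executemany(
            f"INSERT INTO FLOWFILE ({column_list}) VALUES ({placeholders})",
            [tuple(record.get(c) for c in columns) for record in records],
        )
        return [dict(row) for row in conn.execute(sql)]
    finally:
        conn.close()


if __name__ == "__main__":
    batch = [
        {"hostname": "web01", "severity": 6, "message": "GET /index.html"},
        {"hostname": "db01", "severity": 3, "message": "connection refused"},
    ]
    # Select only error-or-worse messages (lower severity = more severe).
    print(query_records(batch, "SELECT hostname, message FROM FLOWFILE WHERE severity <= 3"))
```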
> It would also make it easy to convert syslog messages to a structured format using ConvertRecord with a syslog reader and a writer like JSON or Avro.
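ConvertRecord pairs a reader with a writer. A minimal sketch of the writer side, assuming parsed records are dicts and using a hypothetical fixed field list (SCHEMA_FIELDS); this is not NiFi's JSON writer. Because the reader's fields are known up front, every output object can carry the same keys, with null for anything a message lacked.

```python
import json
from typing import Any, Dict, Iterable, Sequence

Record = Dict[str, Any]

# Hypothetical fixed schema; a real syslog reader would define its own fields.
SCHEMA_FIELDS = ("priority", "timestamp", "hostname", "message")


def records_to_json_lines(
    records: Iterable[Record], fields: Sequence[str] = SCHEMA_FIELDS
) -> str:
    """Serialize records as JSON Lines: one JSON object per record."""
    lines = []
    for record in records:
        # Emit every schema field in schema order, using None (JSON null)
        # for gaps, and drop keys that are not part of the schema.
        row = {name: record.get(name) for name in fields}
        lines.append(json.dumps(row))
    return "\n".join(lines)


if __name__ == "__main__":
    parsed = [
        {"priority": 34, "timestamp": "Oct 11 22:14:15",
         "hostname": "mymachine", "message": "su: 'su root' failed"},
        {"priority": 13, "hostname": "host1", "message": "no timestamp"},
    ]
    print(records_to_json_lines(parsed))
```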
> We would likely want two syslog record readers, one for each of the RFC formats (RFC 3164 and RFC 5424).
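To show why the two formats need separate readers, here is a simplified Python sketch of one line parser per RFC. The regular expressions are assumptions, not NiFi's parsers: one message per line, no BOM handling, and RFC 5424 structured data without escaped "]" inside parameter values. Both formats start with PRI, which encodes facility * 8 + severity, so the sketch decodes it the same way for each.

```python
import re
from typing import Any, Dict, Optional

# RFC 5424: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID SD [MSG]
RFC5424 = re.compile(
    r"^<(?P<priority>\d{1,3})>(?P<version>\d{1,2}) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<appname>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) "
    r"(?P<structured_data>-|(?:\[[^\]]*\])+)"
    r"(?: (?P<message>.*))?$"
)
# RFC 3164: <PRI>Mmm dd hh:mm:ss HOSTNAME MSG (day is space-padded)
RFC3164 = re.compile(
    r"^<(?P<priority>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) (?P<message>.*)$"
)


def parse_syslog(line: str, pattern: re.Pattern) -> Optional[Dict[str, Any]]:
    """Parse one line with the given format's pattern; None if it does not match."""
    match = pattern.match(line)
    if match is None:
        return None
    record: Dict[str, Any] = match.groupdict()
    priority = int(record["priority"])
    if priority > 191:  # largest valid PRI is 23 * 8 + 7
        return None
    record["priority"] = priority
    # Both RFCs pack facility and severity into PRI.
    record["facility"], record["severity"] = divmod(priority, 8)
    return record


if __name__ == "__main__":
    line = ("<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - "
            "'su root' failed for lonvick on /dev/pts/8")
    rec = parse_syslog(line, RFC5424)
    print(rec["hostname"], rec["facility"], rec["severity"])  # mymachine.example.com 4 2
    print(parse_syslog(line, RFC3164))  # None: wrong format for this parser
```

Each pattern rejects the other format outright, which is one argument for two readers rather than one reader that guesses.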
> One aspect to consider is related to the schema used/produced by the reader... typically the readers/writers have a "Schema Access Strategy" where they can obtain a schema from a schema registry, or from flow file attributes, or something specific to the format like an embedded Avro schema.
> In this case, the schema is largely pre-determined by the specific syslog reader, because it can contain at most the fields the reader produces when parsing the messages. So this may be a case where there is no schema access strategy and the schemas are pre-determined. It is similar to the GrokReader, which creates a schema from the named fields in the expression, except that here there is no user-defined expression and the named fields are dictated by the parser.
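A sketch of the "schema dictated by the parser" idea, by analogy with how a Grok expression's named fields become the schema. It assumes a hypothetical, simplified RFC 3164-style regular expression and derives the field list once from its named groups, so there is nothing for a user to configure. This is an illustration in Python, not how NiFi's readers are implemented.

```python
import re
from typing import Dict, List, Optional

# Hypothetical, simplified RFC 3164-style pattern used only for illustration.
PATTERN = re.compile(
    r"^<(?P<priority>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) (?P<message>.*)$"
)


def schema_from_pattern(pattern: re.Pattern) -> List[str]:
    """Return the pattern's named groups in the order they appear."""
    # groupindex maps group name -> group number; group numbers follow the
    # order of opening parentheses, so sorting by number gives pattern order.
    return sorted(pattern.groupindex, key=lambda name: pattern.groupindex[name])


# Fixed at load time by the parser itself, so no schema access strategy
# (registry lookup, attribute, embedded schema) is needed.
SCHEMA = schema_from_pattern(PATTERN)


def read_record(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Parse one line into a record whose keys are exactly SCHEMA."""
    match = PATTERN.match(line)
    if match is None:
        return None
    return {name: match.group(name) for name in SCHEMA}


if __name__ == "__main__":
    print(SCHEMA)  # ['priority', 'timestamp', 'hostname', 'message']
    print(read_record("<13>Feb  5 17:32:18 host1 test message"))
```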
> We may need to reuse syslog-related code that is in nifi-standard-processors, which might require moving that code to nifi-processor-utils or creating a new nifi-syslog-utils module.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)