Posted to dev@nifi.apache.org by Andre <an...@fucs.org> on 2016/08/04 14:18:39 UTC

Re: Suggestion of processors

All,

Apologies for the delay but ParseCEF is pending review on PR785 / NIFI-2341.

The processor itself is fairly simple, but that hides the underlying library
(ParCEFone) that was written to perform the parsing [1].

While developing the processor I noticed the absence of open source CEF
parsing and validation routines. Given that Metron, Morphlines, Hive and a
number of other Apache projects lack CEF support (or SerDes), it occurred to
me that it could be a good idea to spin off the parser so it can be used by
those projects.

Given the complexity of the CEF standard, additional real world testing of
the parsing logic (with or without the Processor) is welcome.
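
For anyone who wants a feel for what has to be tested, here is a minimal
sketch in Python. It is not ParCEFone's API and not the code in the PR. It
assumes the usual CEF layout as I understand it: seven pipe-separated header
fields after "CEF:", followed by a key=value extension, with "\|" escaping
pipes in the header and "\=" escaping equals signs in the extension. The
function and field names are made up for illustration, and "\n" style escapes
are left untouched.

```python
import re

# Assumed header field names (illustrative labels, not taken from ParCEFone)
HEADER_NAMES = ["version", "device_vendor", "device_product", "device_version",
                "signature_id", "name", "severity"]
# A key either starts the extension or follows whitespace, and ends with '='
KEY_RE = re.compile(r"(?:^|\s)(\w+)=")


def split_header(line):
    """Return (header dict, raw extension string) for one CEF line."""
    if not line.startswith("CEF:"):
        raise ValueError("line does not start with 'CEF:'")
    body, fields, current, i = line[4:], [], [], 0
    while i < len(body) and len(fields) < len(HEADER_NAMES):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            current.append(body[i + 1])  # keep the escaped character literally
            i += 2
        elif ch == "|":
            fields.append("".join(current))  # an unescaped pipe ends a field
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    if len(fields) < len(HEADER_NAMES):
        raise ValueError("expected 7 pipe-terminated header fields")
    return dict(zip(HEADER_NAMES, fields)), body[i:]


def parse_extension(ext):
    """Split 'k1=v1 k2=v2 with spaces' into a dict; a value runs to the next key."""
    matches = list(KEY_RE.finditer(ext))
    result = {}
    for pos, m in enumerate(matches):
        end = matches[pos + 1].start() if pos + 1 < len(matches) else len(ext)
        raw = ext[m.end():end]
        # Only '\=' and '\\' are unescaped in this sketch
        result[m.group(1)] = re.sub(r"\\([=\\])", r"\1", raw).strip()
    return result
```

Even this toy version has to deal with escaped delimiters and values that
contain spaces, which is where real-world samples tend to surprise.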


[1] - https://github.com/fluenda/ParCEFone

On Thu, Jun 23, 2016 at 8:30 AM, Andre <an...@fucs.org> wrote:

> Ryan,
>
> Grok is not a replacement for automatic KV and CEF parsing.
>
> Instead, grok aims to simplify the use of complex regexes (underneath
> the surface, most of grok is regex).
>
> To be honest I considered creating a ParseGrok processor but as a light
> user of grok I left it for a second stage.
>
> Cheers
> On 23 Jun 2016 06:58, "Ryan Ward" <ry...@gmail.com> wrote:
>
>> Aldrin,
>>
>> Would adding support for grok be more advantageous than specific processors
>> to convert a particular format? Grok is very popular for such use cases.
>>
>> Ryan
>>
>> On Wed, Jun 22, 2016 at 4:17 PM, Aldrin Piri <al...@gmail.com>
>> wrote:
>>
>> > Andre,
>> >
>> > Thanks for the insights and context, the information regarding DNS was
>> > certainly helpful and I think that seems like a pretty good breakdown of
>> > functionality and unique purpose.  All seem like they would provide some
>> > neat applications for enrichment.
>> >
>> > ParseKV definitely seems to fulfill a certain need, but I am curious as to
>> > how we might make it provide a bit more coverage of similar formats.  With
>> > configuration that allows specification of both the key value separators
>> > (in the example you provided, a space or new line) and the delimiter of
>> > the pairings, this would possibly provide support for other types of files
>> > as well, such as Properties/ini file formats.  I do find myself uncertain
>> > of how much it would apply to the latter cases though.  I can see how the
>> > format would map pretty nicely to columnar type stores.
>> >
>> > Might you be able to expand on how this information would typically be
>> > handled downstream?
>> >
>> > On Wed, Jun 22, 2016 at 10:35 AM, Andre <an...@fucs.org> wrote:
>> >
>> > > Aldrin,
>> > >
>> > > On Wed, Jun 22, 2016 at 12:24 AM, Aldrin Piri <al...@gmail.com>
>> > > wrote:
>> > >
>> > > > Concerning the ParseKV, are you aware of the getDelimitedField[1]
>> > > > function in Expression Language?  I think this may take care of this
>> > > > case for handling these items.
>> > > >
>> > >
>> > > I am aware of getDelimitedField but I found a few cases where using it
>> > > becomes a bit challenging:
>> > >
>> > > * Multiple instances of the same key and a poorly defined format (note
>> > > how just one field uses quotes):
>> > >
>> > > from=bob@acme.com to=alice@acme.com to=eve@acme.com subject="I had enough of this"
>> > >
>> > > * Variable set of keys (tag wasn't present, now it is):
>> > >
>> > > from=bob@acme.com to=alice@acme.com to=eve@acme.com to=jay@acme.com
>> > > tag=important tag=vip tag=tag1 ... tag=tag55 subject=I had enough of this
>> > >
>> > > If you think it is reasonably doable, I am happy to reconsider.
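
To make the two cases quoted above concrete, here is a rough sketch in Python
of the behaviour they call for: repeated keys are collected into lists, quotes
are optional, and bare words after an unquoted value are treated as a
continuation of that value. This is an illustration only, not the ParseKV
processor; parse_kv and its kv_sep parameter are hypothetical names. It leans
on shlex.split for the quoting rules, so an unbalanced quote or apostrophe in
the input raises ValueError, and an unquoted value containing a word with "="
in it would be misread as a new key.

```python
import shlex
from collections import defaultdict


def parse_kv(line, kv_sep="="):
    """Parse 'k=v k=v2 k2="quoted v"' into {key: [values, ...]}."""
    result = defaultdict(list)
    last_key = None
    for token in shlex.split(line):  # honours double-quoted values
        key, sep, value = token.partition(kv_sep)
        if sep and key:
            result[key].append(value)  # repeated keys accumulate
            last_key = key
        elif last_key is not None:
            # Bare word: assume it continues the previous unquoted value
            result[last_key][-1] += " " + token
    return dict(result)


record = parse_kv('from=bob@acme.com to=alice@acme.com to=eve@acme.com '
                  'subject="I had enough of this"')
print(record["to"])       # both recipients
print(record["subject"])  # quotes removed
```

The same function accepts the unquoted variant of the subject, because the
trailing words are folded back into the last value seen.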
>> > >
>> > >
>> > >
>> > > For the security folks like me, QueryBulkWhois and QueryDNS are very
>> > > different beasts:
>> > >
>> > > * QueryDNS does what a normal DNS resolver does, but because of the
>> > > parsing mechanism it can be used to handle responses in a smart way. As
>> > > such, one can use QueryDNS to query DNS-based APIs (ShadowServer, Cymru,
>> > > Cisco SenderBase [1]) and RBLs (Spamhaus, etc).
>> > >
>> > > * Enter QueryBulkWhois: batching optimises queries by allowing a large
>> > > number of subjects to be submitted in a single request.
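
As an illustration of the DNS-based lookups mentioned above, RBL-style
services are conventionally queried by reversing the octets of an IPv4
address and appending the list's zone. The sketch below, in Python, only
builds that query name; it does not show how QueryDNS works internally, the
function name is hypothetical, and the zone in the example is just one
well-known RBL used for illustration. The actual DNS lookup and the parsing
of the answer are left out.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in an RBL-style zone."""
    addr = ipaddress.ip_address(ip)  # raises ValueError on malformed input
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4 addresses")
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone.strip('.')}"


print(dnsbl_query_name("127.0.0.2", "zen.spamhaus.org"))
```

Each subject costs one DNS query in this scheme, which is the overhead that
batching over Whois avoids.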
>> > >
>> > > Yes, BulkWhois may be offered by providers that also provide an API,
>> > > but these are not restricted to overlapping offerings; projects like
>> > > "Prefix WhoIs Project" only offer Whois, with no DNS API available at
>> > > all.
>> > >
>> > >
>> > > [1] http://stackoverflow.com/questions/14145886/how-to-programmatically-query-senderbase-org
>> > >
>> > >
>> > > > With the QueryBulkWhois API, does it make sense to roll this into
>> > > > QueryDNS as a configurable property to do batch?  Performing a cursory
>> > > > review of the PR, it looks like this would potentially be targeting
>> > > > those same servers.  Are batch lookups aimed at more web service
>> > > > oriented endpoints as opposed to just querying DNS?
>> > > >
>> > > > --aldrin
>> > > >
>> > > >
>> > > >
>> > >
>> >
>>
>