Posted to dev@metron.apache.org by Justin Leet <ju...@gmail.com> on 2018/08/07 13:26:45 UTC

[DISCUSS] Metron Parsers in Nifi

Hi all,

There's interest in being able to run Metron parsers in NiFi, rather than
inside Storm. I dug into this a bit, and have some thoughts on how we could
go about this. I'd love feedback on this, along with anything we'd
consider must haves as well as future enhancements.

1. Separate metron-parsers into metron-parsers-common and metron-storm
and create metron-parsers-nifi. For this code to be reusable across
platforms (NiFi, Storm, and anything else in the future), we'll need to
decouple our parsers and Storm.
- There are also some nice fringe benefits around refactoring our code
to be substantially more clear and understandable; something which
came up while allowing for parser aggregation.
2. Create a MetronProcessor that can run our parsers.
- I took a look at how RecordReader could be leveraged (e.g.
CSVRecordReader), but this is pretty tightly tied into schemas and is
meant to be used by ControllerServices, which are then used by
Processors. There's friction involved there in terms of schemas, but
also in terms of access to ZK configs and things like parser chaining.
We might be able to leverage it, but it seems like it'd be fairly
shoehorned in without getting the schema and other benefits.
- This Processor would work similarly to Storm: bytes[] in -> JSON out.
- There is a Processor
<https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/JoltTransformJSON.java>
that handles loading other JARs, which we can model a
MetronParserProcessor off of. It handles classpath/classloader issues
(basically just sets up a classloader specific to what's being loaded
and swaps out the Thread's loader when it calls to outside resources).
3. Create a MetronZkControllerService to supply our configs to our
processors.
- This is a pretty established NiFi pattern for being able to provide
access to other services needed by a Processor (e.g. databases or large
configuration files).
- The same controller service can be used by all Processors to manage
configs in a consistent manner.
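
The classloader handling mentioned in step 2 can be sketched in plain
Java. This is only an illustration of the general "swap the thread's
context classloader, then restore it" pattern, not the actual
JoltTransformJSON code; the class and method names below are hypothetical
and nothing here uses NiFi or Metron APIs.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.function.Supplier;

/** Hypothetical helper for illustration; not part of NiFi or Metron. */
final class ContextClassLoaderSwap {
    private ContextClassLoaderSwap() {}

    /** Builds a loader over user-supplied parser jar URLs, delegating to the given parent. */
    static URLClassLoader loaderFor(URL[] parserJars, ClassLoader parent) {
        return new URLClassLoader(parserJars, parent);
    }

    /**
     * Runs the task with the given loader installed as the current thread's
     * context classloader, and always restores the previous loader afterwards.
     */
    static <T> T callWith(ClassLoader loader, Supplier<T> task) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return task.get();
        } finally {
            // Restore even if the task throws, so later work on this thread is unaffected.
            current.setContextClassLoader(previous);
        }
    }
}
```

A processor built this way would presumably create the loader once from the
configured parser jar location and wrap each call into parser code with
callWith.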

At that point, we can just NAR our controller service and parser processor
up as needed, deploy them to NiFi, and let the user provide a config for
where their custom parsers can be provided (i.e. their parser jar). This
would be 3 nars (processor, controller-service, and controller-service-api
in order to bind the other two together).

Once deployed, our ability to use parsers should fit well into the standard
NiFi workflow:

1. Create a MetronZkControllerService.
2. Configure the service to point at zookeeper.
3. Create a MetronParser.
4. Configure it to use the controller service + parser jar location +
any other needed configs.
5. Use the outputs as needed downstream (either writing out to Kafka or
feeding into more MetronParsers, etc.)

Chaining parsers should ideally become a matter of chaining MetronParsers
(and making sure the enveloping configs carry through properly). For parser
aggregation, I'd just avoid it entirely until we know it's needed in NiFi.
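
As a rough sketch of what "chaining MetronParsers" with the envelope
carried through could look like once parsers are decoupled from Storm:
the interface and both toy parsers below are hypothetical stand-ins (they
are not Metron's MessageParser or any real Metron parser), and the
"sensor|payload" envelope format is invented purely for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical platform-agnostic parser: raw bytes in, one field map out. */
interface SketchParser extends Function<byte[], Map<String, Object>> {}

final class ParserChainSketch {
    private ParserChainSketch() {}

    /** Toy envelope parser: splits "sensor|payload" into two fields. */
    static final SketchParser ENVELOPE = raw -> {
        String text = new String(raw, StandardCharsets.UTF_8);
        int sep = text.indexOf('|');
        if (sep < 0) {
            throw new IllegalArgumentException("no envelope separator in: " + text);
        }
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("sensor", text.substring(0, sep));
        out.put("payload", text.substring(sep + 1));
        return out;
    };

    /** Toy inner parser: whitespace-separated key=value pairs. */
    static final SketchParser KEY_VALUE = raw -> {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String pair : new String(raw, StandardCharsets.UTF_8).trim().split("\\s+")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2) {
                out.put(kv[0], kv[1]);
            }
        }
        return out;
    };

    /** Chains the two parsers and carries the envelope's sensor field into the result. */
    static Map<String, Object> chain(byte[] raw) {
        Map<String, Object> envelope = ENVELOPE.apply(raw);
        String payload = (String) envelope.get("payload");
        Map<String, Object> result =
            new LinkedHashMap<>(KEY_VALUE.apply(payload.getBytes(StandardCharsets.UTF_8)));
        result.put("sensor", envelope.get("sensor"));
        return result;
    }
}
```

In NiFi each stage would be its own processor instance, with the envelope
fields presumably travelling either in the record itself or as flow file
attributes.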

Justin

Re: [DISCUSS] Metron Parsers in Nifi

Posted by Otto Fowler <ot...@gmail.com>.

On August 15, 2018 at 09:30:47, Justin Leet (justinjleet@gmail.com) wrote:

As an exercise, let me summarize the points of contention I've seen and lay
out the tradeoffs as I see them. That way we can prioritize what's
important to us in a NiFi implementation and better work towards a
favorable solution (basically, I want to pin down the requirements we have
for an MVP). My opinions/comments/questions are in *bold*.  Feel free (and
encouraged) to disagree. Keep in mind, this stuff probably exists on a spectrum (we might
want to pick and choose what we do, and possibly even when we do it).

   - Splitting off fieldTransformations from the parser itself
      - In NiFi, we're chaining processors to do our fieldTransformations.
      This can't be particularly automatic from a definition, to the best of my
      knowledge.
      - Our configuration between NiFi and Storm differs (because NiFi is
      building Processors and Storm is just acting on the transforms).
      - *I'm mostly fine with splitting these, IMO we just need to make
      sure it's documented. The current colocation of them feels slightly
      sketchy to me in general (it feels like it's merging pure parsing and
      something more enrichment oriented). I also like the idea of exposing
      Stellar transformations as their own Processor.*
         - *Could anyone refresh my memory on why fieldTransformations are
         bundled with parsing directly?*

Obviously I am for this.  It makes for more flexible composition, the
ability to use stellar in nifi in general etc etc


   - NiFi parser configuration
      - We can do it from ZK, but need to make it available in a manner not
      available in ZK
      - If we don't allow ZK, we can potentially have different sources of
      configs.
         - *I personally don't like this very much. I always hated having
         to hop between things in order to manage this sort of thing, but I
         consider that more annoying than blocking.*

If you take out the transformations, there is very little to configure for
most of the parsers.   From a NIFI point of view, having to go into metron
to configure a sensor in order to get it into NiFi makes no sense ( unless
I am not understanding what you are saying ). Expecting them to go back and
forth to understand the configurations doesn’t make much sense either.
Having a dual approach ( a sensor configuration controller service if you
want ZK already configured sensors and nifi property only ) would be the
best.

This is why I was asking about the target users.  If someone is going to
use NIFI _INSTEAD_ of metron for the parsers, then why have the
configurations in zookeeper?   What would the case for a hybrid NIFI +
STORM parsers mixed?  Is that likely?
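
A minimal sketch of the dual approach described above (inline NiFi
property or a shared, ZK-backed configuration service) might look like
the following. Everything here is an assumption made for illustration:
the class name is hypothetical, the Function stands in for whatever a
controller service lookup would be, and letting the inline property win
over the shared store is just one possible precedence rule.

```java
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical resolver for a sensor's parser config; not a NiFi or Metron class. */
final class SensorConfigResolver {
    // Stand-in for a controller-service style lookup (e.g. backed by ZK); may be null.
    private final Function<String, Optional<String>> sharedStore;

    SensorConfigResolver(Function<String, Optional<String>> sharedStore) {
        this.sharedStore = sharedStore;
    }

    /**
     * Returns the config JSON for a sensor: the inline property if it is set,
     * otherwise whatever the shared store holds for that sensor type.
     */
    String resolve(String sensorType, String inlineProperty) {
        if (inlineProperty != null && !inlineProperty.trim().isEmpty()) {
            return inlineProperty;
        }
        if (sharedStore != null) {
            Optional<String> fromStore = sharedStore.apply(sensorType);
            if (fromStore.isPresent()) {
                return fromStore.get();
            }
        }
        throw new IllegalStateException("No parser config found for sensor '" + sensorType + "'");
    }
}
```

With this shape, a NiFi-only user never needs a shared store at all, while
a hybrid NiFi + Storm deployment can leave the property empty and keep a
single source of configs.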



   - Specific parsers vs an aggregated parser
      - If they're all specific, it means every user who wants to implement
      a parser (even an existing one) in NiFi has to do additional work to
      make it work in NiFi.
         - *I don't think it's a lot of work on a per parser basis, and we
         might be able to ease this with some clever handling of our
         interfaces.  However, I personally don't like that there's no way to
         just run an existing Metron parser in NiFi without additional
         NiFi-specific work.  To be clear, I'd prefer to have a quick way for
         users to take parsers, including preexisting parsers, into NiFi.  I
         don't think this should be the end solution for most parsers, but it
         does feel like the minimal viable product solution to cover current
         users mostly as-is.  In my mind, users should be able to take
         preexisting Storm parsers and be able to run them up and test them in
         NiFi with minimal involvement, even if the end state is to do a more
         NiFi-like implementation.*
         - *If we expand this to other platforms (e.g. Spark?), do we
         expect everything to be reimplemented every time?  Or are we making
         that decision on a platform-by-platform basis?*
         - *I think most parsers, including our own, should be optimized as
         needed for NiFi, including whatever Schema work and versioning we
         want to do, but I don't think that needs to be done right away.
         Looking through source, our parsers are:*
            - *Asa*
            - *Bro*
            - *CEF*
            - *CSV*
            - *Fireeye*
            - *ISE*
            - *JSON*
            - *Lancope*
            - *Logstash*
            - *Palo Alto*
            - *Snort*
            - *Sourcefire*
         - *I don't know that I want to go through and convert and test all
         of them in the first pass.*

We can have one nar, with a reader per processor for deployment ( similar
to having the one parser jar now ), like the standard services nar now.

The re-implementation is the glue code that loads, feeds, and outputs the
result of a given parser, along with documentation and schema exposure.
These things would always be different per ‘host’ platform.  Were you
expecting not to have different code for spark, storm and nifi?

If you want a “Generic reader” that uses the class path and makes the user
specify the schema and can load any processor from a jar, well I think that
we just don’t agree or have the same idea or preference for where the
complexity is.


   - Processor vs. RecordReader
      - RecordReader is the NiFi hotness.  Sounds like the interface
      actually is stable, which was really my primary concern with it
      (Thanks for following up Otto!).
         - *RecordReaders seem like they have positive performance
         implications to them, which I'm definitely in favor of.  The Processor
         approach would work, but given the rates of flow we see, it'd be
         extremely nice to get the RecordReader benefits.  The schema benefits
         in RecordReaders are more clear if we split fieldTransformations from
         parsing in NiFi, but that split might be more work (although result in
         a potentially much cleaner implementation of RecordReaders).  This
         would mean we have to do at least some upgrading for every parser we
         want to be able to run in NiFi.*
         - *How much schema versioning do we need to support as part of a
         first cut? How much of this needs to be managed by NiFi specific
         features?*
      - *I'm curious on people's thoughts on if we can do some unification
      on some of our parsers against RecordReader as Simon mentioned.  If we do
      that, do we then need to start wrapping NARs around everything as part of
      our build process to be able to use this in NiFi?  Does that break Storm
      deployment at all (for either our bundled parsers or for existing 3rd
      party jars)? Will this affect us down the line if we decide to build out
      other use cases?*

I think I have a good understanding of how to bundle our things in Nars at
build time.

I am not sure about the unification.  If you mean that the same schemas are
used for both, we can adopt Avro schemas ourselves, or adopt the ‘record
schema’ or think of that later.

I don’t know what you mean about breaking storm deployment.


   - Parser schema
      - Should our parsers be able to define a schema (at least in the case
      of pure parsing)?  What is the overlap and set of concerns here?
      - What do we need here in terms of versioning? After all, these things
      change based on version.
      - What do we need for providing schemas for things like CSV or Grok
      or other data-based schemas?

Grok and CSV from nifi shouldn’t need to come over.  Nifi already has
these.  If we feel they are deficient, we should extend the Nifi capability.
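
To make the "should parsers define a schema" question a bit more
concrete, here is one possible shape, sketched under assumptions: a
parser declares the fields it emits as a simple name-to-type map, and
the host platform can check parsed output against that declaration. The
interface and helper names are hypothetical, and the map is a stand-in
for a real schema format such as Avro or a NiFi record schema.

```java
import java.util.Map;

/** Hypothetical parser contract that also declares its output fields. */
interface SchemaAwareParser {
    /** Field name to expected Java type for every field this parser may emit. */
    Map<String, Class<?>> outputSchema();

    /** Parses one raw message into a field map. */
    Map<String, Object> parse(byte[] raw);
}

/** Hypothetical helper a host platform could use to validate parser output. */
final class SchemaCheck {
    private SchemaCheck() {}

    /** True if every field in the record is declared and has the declared type. */
    static boolean conforms(Map<String, Class<?>> schema, Map<String, Object> record) {
        for (Map.Entry<String, Object> field : record.entrySet()) {
            Class<?> expected = schema.get(field.getKey());
            if (expected == null || !expected.isInstance(field.getValue())) {
                return false;
            }
        }
        return true;
    }
}
```

Versioning is deliberately left out of this sketch; a version identifier
alongside the map would be one obvious extension.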


*The summary of my view on this is basically "Ideally, I'd like a way to
get parsers working in a general case scenario in a relatively minimal way,
with the option to implement our parsers as needed with RecordReaders
(which offers several benefits, particularly for the pure parsing case)".
I think there's a lot of value for a minimal effort approach in getting a
general (if suboptimal) approach that works for everything existing.  If we
were to do that, I'd definitely still like to see at least 2-3 of the
main in-use parsers have NiFi oriented implementations (along with
supporting documentation recommending similar implementations / conversion
for existing parsers). At that point, I think my preferred approach would
be to have a general purpose Processor available (which I don't think is
much more work than the split itself), while providing a template and
examples for new parsers going forward.*

I don’t agree about the minimal approach.  I think a minimal approach of
the RecordReader per parser, with maybe a subset of parsers to start is
fine.  I do not think the initial should be the generic reader. My
reasoning is simple.  If the generic reader meets HW’s mvp, then it is
unlikely that you all will get scheduled to do more work, or will review
the work if someone else like me does it.  So it won’t get done.  I  would
be willing to just  go along with something generic and do the other stuff
myself as a follow on if I thought it would get reviewed, or if I thought
promises of reviews would be kept, but once bitten...




On Mon, Aug 13, 2018 at 9:42 AM Simon Elliston Ball <
simon@simonellistonball.com> wrote:

> Yep, I'm wondering whether our parser interface should have the ability to
> create schema either like that, or well, that, which would be helpful
> within Metron as well.
>
> @Otto, the one thing missing from the record reader api, is that if you
> don't emit any records at all for a flow file, it errors, which is not
> strictly speaking an error, but yeah, we can certainly control things like
> filtering errors aside from this. I would say this was a nifi bug
> (debatably) which should be fixed on that side.
>
> Simon
>
> On 13 August 2018 at 14:29, Otto Fowler <ot...@gmail.com> wrote:
>
>> Also,  If we are doing the record readers, we can have a reader for a
>> parser type and explicitly set the schema, as seen here :
>> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/syslog/Syslog5424Reader.java
>>
>>
>>
>> On August 13, 2018 at 09:26:50, Otto Fowler (ottobackwards@gmail.com)
>> wrote:
>>
>> If we can do the record readers ourselves ( with the parsers inside them
>> ) we can handle the returns.
>> I’ll be doing the net flow 5 readers once the net flow 5 processor PR (
>> not mine ) is in.
>>
>> I don’t think having a generic class loading parsers foo and having to
>> manage all that is preferable to having
>> an archetype and explicit parsers.
>>
>> Nifi processors and readers are self documenting, and this approach will
>> make that not possible, as another consideration.
>>
>>
>>
>> On August 13, 2018 at 06:50:09, Simon Elliston Ball (
>> simon@simonellistonball.com) wrote:
>>
>> Maybe the edge use case will clarify the config issue a little. The reason
>> I would want to be able to push Metron parsers into NiFi would be so I can
>> pre-parse and filter on the edge to save bandwidth from remote locations. I
>> would expect to be able to parse at the edge and use NiFi to prioritise or
>> filter on the Metron ready data, then push through to a 'NoOp' parser in
>> Metron. For this to happen, we would absolutely not want to connect to
>> Zookeeper, so I'm +1 on Otto's suggestion that the config be embeddable in
>> NiFi properties. We cannot assume ZK connectivity from NiFi.
>>
>> I can also see a scenario where NiFi might make it easier to chain parsers,
>> which is where it overlaps more with Metron. This is more about the fact
>> that NiFi makes it a lot easier to configure and manage complex multi-step
>> flows than Metron, and is way more user intuitive from a design and
>> monitoring perspective. My main concern around using NiFi in this way is
>> about the load on the content repository. We are looking at a lot of
>> content level transformation here. You could argue that the same load is
>> taken off Kafka in the chaining scenario, but there is still a chance for a
>> user to accidentally create a lot of disk access if they go over the top
>> with NiFi.
>>
>> I see this as potentially a chance to make the Metron Parser interface
>> compatible with NiFi Record Readers. Then both communities could benefit
>> from sharing each other's parsers.
>>
>> In terms of the NAR approach, I would say we have a base bundle of the NiFi
>> bits (https://github.com/simonellistonball/metron/tree/nifi already has
>> this for stellar, enrichments and an opinionated publisher, it also has a
>> readme with some discussion around this
>> https://github.com/simonellistonball/metron/tree/nifi/nifi-metron-bundle
>> ). We can then use other nar dependencies to side load parser classes into
>> the record reader. We would then need to do some fancy property validation
>> in NiFi to ensure the classes were available.
>>
>> Also, Record Readers are much much faster. The only problem I've found with
>> them is that they error on blank output, which was a problem for me writing
>> a netflow 9 reader (template only records need to live in NiFi cache, but
>> not be emitted).
>>
>> In terms of the schema objection, I'm not sure why schema focus is a
>> problem. Our parsers have implicit schema and the output schema formats
>> used in NiFi are very flexible and could be "just a map". That said, we
>> could also take the opportunity to introduce a method to the parser
>> interface to emit traits to contribute the bits of schema that a parser
>> produces. This would ultimately lead to us being able to generate output
>> schemas (ES, Solr, Hive, whatever), which would take a lot of the pain out
>> of setup for sensors.
>>
>> Simon
>>
>> On 9 August 2018 at 16:42, Otto Fowler <ot...@gmail.com> wrote:
>>
>> > I would say that
>> >
>> > - For each configuration parameter we want to pull in, it should be
>> > explicitly configured through a property as well as through a controller
>> > service that accesses the metron zk
>> > - Transformations should not be conflated with parsing in those
>> > processors or readers
>> >
>> > There is no on the fly configuration change in nifi ( You can’t change
>> > properties once started ).
>> >
>> > Wouldn’t the simplest minimal start be to say that we expect either nifi
>> > or metron and simplify things? Let nifi nifi, let metron metron.
>> >
>> >
>> > On August 9, 2018 at 10:53:24, Justin Leet (justinjleet@gmail.com)
>> wrote:
>> >
>> > That's definitely good info, thanks for reaching out to them about it.
>> >
>> > In terms of exposing/sharing, I don't think we have to couple them tightly
>> > (in fact, I think we should loosen the coupling as much as possible without
>> > forcing reimplementation of things). I think there's definitely a way to do
>> > that in terms of the general purpose processor I proposed (or in terms of
>> > RecordReader or another implementation).
>> >
>> > It would definitely be easy enough to configure it to either pull from ZK
>> > or to use a parser config json extract as a parameter (to maintain the same
>> > formatting and make migration easy). And we can still build specific
>> > NiFi-oriented parsers as needed (that manage things like Schema via the
>> > registry and other Nifi mechanisms). This keeps parsers entirely decoupled
>> > from a metron installation.
>> >
>> > Alternatively, we extract our config handling to a module and scripts we
>> > can package up and easily deploy configs against ZK (or maybe Nifi's
>> > StateControllers or whatever). We definitely shouldn't need absolutely
>> > everything installed to be able to run just parsers on Nifi.
>> >
>> > Having said that, right now the easiest way we have to maintain on the fly
>> > updatable configs (and updatable is important!) is via ZK. Params in Nifi
>> > aren't quite that flexible, to the best of my knowledge (i.e. you have to
>> > stop, update config and restart). We might be able to exploit the
>> > StateController to manage this for us, but I'm honestly not familiar enough
>> > with it and for deployments split between NiFi and Storm, it means
>> > configuration gets managed in a couple different ways (which may be fine
>> > with users since there is a fairly bright-line delineation which makes it
>> > easier to accept). There are some complicated configs like
>> > fieldTransforms, which is part of why I would like things to be configured
>> > in the same format (if not the same mechanism).
>> >
>> > Ideally, in my mind, the parsers shared between both NiFi and Storm just
>> > implement the very general MessageParser interface (which is pretty
>> > minimal, a couple setup methods, validation, and the actual parse). This
>> > is pretty lightweight and the split of metron-parsers into
>> > metron-parsers-common et al. would loosen the coupling between parsers and
>> > the rest of metron into that core needed to support that.
>> >
>> > IMO, at that point, we'd have a pretty minimal NAR (or NARs depending on
>> > config management) that lets us run our set of parsers, lets users build
>> > new parsers (and doesn't block specialized NiFi implementations that
>> > exploit NiFi's feature set), and lets us get things configured in a
>> > relatively consistent manner, without losing features, and hopefully
>> > requiring a pretty minimal slice of Metron to be useful.
>> >
>> > On Thu, Aug 9, 2018 at 10:06 AM Otto Fowler <ot...@gmail.com>
>> > wrote:
>> >
>> > > I think the benefits are clear. What is unclear is if the goal is to
>> > > expose or share or re-use Metron capabilities ( stellar, parsing ) in
>> > > nifi in a way that is native to nifi ( configured and managed in nifi ),
>> > > where you may not even need metron ( say you just want to parse asa ) or
>> > > if the goal is to have a hybrid approach coupling the processors/readers
>> > > to the metron installation.
>> > >
>> > >
>> > > On August 9, 2018 at 09:14:58, Justin Leet (justinjleet@gmail.com)
>> > wrote:
>> > >
>> > > I'll add onto Mike's discussion with the original set of requirements I
>> > > had in mind (and apply feedback on these as necessary!). This is largely
>> > > overlap with what Mike said, but I want to make sure it's clear where my
>> > > proposal was coming from, so we can improve on it as needed. James and
>> > > Mike are also right, I think I skipped over the benefits of NiFi in
>> > > general a bit, so thanks for chiming in there.
>> > >
>> > > - Deploy our bundled parsers without needing custom wrapping on all of
>> > > them.
>> > > - Don't prevent ourselves from building custom wrapping as needed.
>> > > - Custom Java parsers with an easy way to hook in, similar to what we
>> > > already do in Storm.
>> > > - One stop (or at least one format) configuration, for the case when
>> > > we're doing some things in NiFi (parsers) and some elsewhere (enrichment
>> > > and indexing). I don't think it'll always be "start in NiFi, end in
>> > > Storm", especially as we build out Stellar capability, but I also don't
>> > > want users learning a different set of configs and config tools for every
>> > > platform we run on.
>> > > - Ability to build out parsers and other systems fairly easily, e.g.
>> > > Spark.
>> > > - Support our current use cases (in particular parser chaining as a more
>> > > advanced use case).
>> > >
>> > > It really boils down to providing a relatively simple user path to be
>> > > able to migrate to NiFi as needed or desired as simply as possible in a
>> > > very general way, while not preventing parser by parser enhancements.
>> > >
>> > > On Wed, Aug 8, 2018 at 7:14 PM Michael Miklavcic <
>> > > michael.miklavcic@gmail.com> wrote:
>> > >
>> > > > I think it also provides customers greater control over their
>> > > > architecture by giving them the flexibility to choose where/how to host
>> > > > their parsers.
>> > > >
>> > > > To Justin's point about the API, my biggest concern about the
>> > > > RecordReader approach is that it is not stable. We already have a
>> > > > similar problem in having the TransportClient in ElasticSearch - they
>> > > > are prone to changing it in minor versions with the advent of their
>> > > > newer REST API, which is problematic for ensuring a stable installation.
>> > > >
>> > > > From my own perspective, our goal with NiFi, at least in part, should
>> > > > be the ability to deploy our core parsing infrastructure, i.e.
>> > > >
>> > > > - pre-built parsers
>> > > > - custom java parsers
>> > > > - Stellar transforms
>> > > > - custom stellar transforms
>> > > >
>> > > > And have the ability to configure it similarly to how we configure
>> > > > parsers within Storm. Consistent with our recent parser chaining and
>> > > > aggregation feature, users should be able to construct and deploy
>> > > > similar constructs in NiFi. The core architectural shift would be that
>> > > > parser code should be platform agnostic. We provide the plumbing in
>> > > > Storm, NiFi, and <Spark Streaming?, other> and platform architects and
>> > > > devops teams can choose how and where to deploy.
>> > > >
>> > > > Best,
>> > > > Mike
>> > > >
>> > > >
>> > > > On Wed, Aug 8, 2018 at 9:57 AM James Sirota <js...@apache.org>
>> > wrote:
>> > > >
>> > > > > Integration with NiFi would be useful for parsing low-volume
>> > > > > telemetries at the edge. This is a much more resource friendly way to
>> > > > > do it than setting up dedicated storm topologies. The integration
>> > > > > would be that the NiFi processor parses the data and pushes it
>> > > > > straight into the enrichment topic, saving us the resources of having
>> > > > > multiple parsers in storm.
>> > > > >
>> > > > > Thanks,
>> > > > > James
>> > > > >
>> > > > > 07.08.2018, 11:29, "Otto Fowler" <ot...@gmail.com>:
>> > > > > > Why don't we start over? We are going back and forth on
>> > > > > > implementation, and I don’t think we have the same goals or
>> > > > > > concerns.
>> > > > > >
>> > > > > > What would be the requirements or goals of metron integration with
>> > > > > > Nifi?
>> > > > > > How many levels or options for integration do we have?
>> > > > > > What are the approaches to choose from?
>> > > > > > Who are the target users?
>> > > > > >
>> > > > > > On August 7, 2018 at 12:24:56, Justin Leet (
>> justinjleet@gmail.com)
>> > > > > wrote:
>> > > > > >
>> > > > > > So how does the MetronRecordReader roll into everything? It seems
>> > > > > > like it'd be more useful on the reader per format approach, but
>> > > > > > otherwise it doesn't really seem like we gain much, and it requires
>> > > > > > getting everything linked up properly to be used. Assuming we
>> > > > > > looked at doing it that way, is the idea that we'd set up a
>> > > > > > ControllerService with the MetronRecordReader and a
>> > > > > > MetronRecordWriter and then have the StellarTransformRecord
>> > > > > > processor configured with those ControllerServices? How do we
>> > > > > > manage the configurations of everything that way? How does the
>> > > > > > ControllerService get configured with whatever parser(s) are needed
>> > > > > > in the flow? Basically, what's your vision for how everything would
>> > > > > > tie together?
>> > > > > >
>> > > > > > I also forgot to mention this in the original writeup, but there's
>> > > > > > another reason to avoid the RecordReader: It's not considered
>> > > > > > stable. See
>> > > > > > https://github.com/apache/nifi/blob/master/nifi-commons/nifi-record/src/main/java/org/apache/nifi/serialization/RecordReader.java#L34
>> > > > > > .
>> > > > > > That alone makes me super hesitant to use it, if it can shift out
>> > > > > > from under us even in an incremental version.
>> > > > > >
>> > > > > > I'm also unclear on why StellarTransformRecord processor matters
>> > > > > > for either approach. With the Processor approach you could simply
>> > > > > > follow it up with the Stellar processor, the same way you would in
>> > > > > > the RecordReader approach. The Stellar processor should be a
>> > > > > > parallel improvement, not a conflicting one.
>> > > > > >
>> > > > > > On Tue, Aug 7, 2018 at 11:50 AM Otto Fowler <
>> > ottobackwards@gmail.com
>> > > >
>> > > > > wrote:
>> > > > > >
>> > > > > >> A Metron Processor itself isn’t really necessary. A
>> > > > > >> MetronRecordReader ( either the megalithic or a reader per format )
>> > > > > >> would be a good approach. Then have StellarTransformRecord
>> > > > > >> processor that can do Stellar on _any_ record, regardless of source.
>> > > > > >>
>> > > > > >> On August 7, 2018 at 11:06:22, Justin Leet (
>> justinjleet@gmail.com
>> > )
>> > > > > wrote:
>> > > > > >>
>> > > > > >> Thanks for the comments, Otto, this is definitely great feedback.
>> > > > > >> I'd love to respond inline, but the email's already starting to
>> > > > > >> lose its formatting, so I'll go with the classic "wall of text".
>> > > > > >> Let me know if I didn't address everything.
>> > > > > >>
>> > > > > >> Loading modules (or jars or whatever) outside of our Processor
>> > > > > >> gives us the benefit of making it incredibly easy for users to
>> > > > > >> create their own parsers. I would definitely expect our own bundled
>> > > > > >> parsers to be included in our base NAR, but loading modules enables
>> > > > > >> users to only have to learn how Metron wants our stuff lined up and
>> > > > > >> just plug it in. Having said that, I could see having a wrapper for
>> > > > > >> our bundled parsers that makes it really easy to just say you want
>> > > > > >> a MetronAsaParser or MetronBroParser, etc. That would give us the
>> > > > > >> best of both worlds, where it's easy to set up our bundled parsers
>> > > > > >> and also trivial to pull in non-bundled parsers. What doing this
>> > > > > >> gives us is an easy way to support (hopefully) every parser that
>> > > > > >> gets made, right out of the box, without us needing to build a
>> > > > > >> specialized version of everything until we decide to and without
>> > > > > >> users having to jump through hoops.
>> > > > > >>
>> > > > > >> None of this prevents anyone from creating specialized parsers (for
>> > > > > >> perf reasons, or to use the schema registries, or anything else).
>> > > > > >> It's probably worthwhile to package up some of the built-in parsers
>> > > > > >> and customize them to use more specialized features appropriately
>> > > > > >> as we see things get used in the wild. Like you said, we could
>> > > > > >> likely provide Avro schemas for some of this and give users a more
>> > > > > >> robust experience on what we choose to support and provide guidance
>> > > > > >> for other things. I'm also worried that building specialized
>> > > > > >> schemas becomes problematic for things like parser chaining (where
>> > > > > >> our routers wrap the underlying messages and add on their own
>> > > > > >> info). Going down that road potentially requires anything wrapped
>> > > > > >> to have a specialized schema for the wrapped version in addition to
>> > > > > >> a vanilla version (although please correct me if I'm missing
>> > > > > >> something there, I'll openly admit to some shakiness on how that
>> > > > > >> would be handled).
>> > > > > >>
>> > > > > >> I also disagree that this is un-Nifi-like, although I'm
>> admittedly
>> > > > not
>> > > > > as
>> > > > > >> skilled there. The basis for doing this is directly inspired by
>> > the
>> > > > > >> JoltTransformer, which is extremely similar to the proposed
>> setup
>> > > for
>> > > > > our
>> > > > > >> parsers: Simply take a spec (in this case the configs,
>> including
>> > the
>> > > > > >> fieldTransformations), and delegate a mapping from bytes[] to
>> > JSON.
>> > > > The
>> > > > > >> Jolt library even has an Expression Language (check out
>> > > > > >>
>> > > > >
>> > > >
>> > > > > >> https://community.hortonworks.com/articles/105965/expression-language-with-jolt-in-apache-nifi.html
>> > > > > ),
>> > > > > >> so it's not a foreign concept. I believe Simon Ball has already
>> > done
>> > > > > some
>> > > > > >> experimenting around with getting Stellar running in NiFi, and
>> I'd
>> > > > > love to
>> > > > > >> see Stellar more readily available in NiFi in general.
>> > > > > >>
>> > > > > >> Re: the ControllerService, I see this as a way to maintain
>> > Metron's
>> > > > > use of
>> > > > > >> ZK as the source of config truth. Users could definitely be
>> using
>> > > > NiFi
>> > > > > and
>> > > > > >> Storm in tandem (parse in NiFi + enrich and index from Storm,
>> for
>> > > > > >> example). Using the ControllerService gives us a ZK instance as
>> > the
>> > > > > single
>> > > > > >> source of truth. That way we aren't forcing users to go to two
>> > > > > different
>> > > > > >> places to manage configs. This also lets us leverage our
>> existing
>> > > > > scripts
>> > > > > >> and our existing infrastructure around configs and their
>> > management
>> > > > and
>> > > > > >> validation very easily. It also gives users a way to port from
>> > NiFi
>> > > > to
>> > > > > >> Storm or vice-versa without having to migrate configs as well.
>> We
>> > > > could
>> > > > > >> also provide the option to configure the Processor itself with
>> the
>> > > > data
>> > > > > >> (just don't set up a controller service and provide the json or
>> > > > > whatever as
>> > > > > >> one of our properties).
>> > > > > >>
>> > > > > >> On Tue, Aug 7, 2018 at 10:12 AM Otto Fowler <
>> > > ottobackwards@gmail.com
>> > > > >
>> > > > > >> wrote:
>> > > > > >>
>> > > > > >>> I think this is a good idea. As I mentioned in the other
>> thread
>> > > I’ve
>> > > > > >>> been doing a lot of work on Nifi recently.
>> > > > > >>> I think the important thing is that what is done should be
>> done
>> > the
>> > > > > NiFi
>> > > > > >>> way, not bolting the Metron composition
>> > > > > >>> onto Nifi. Think of it like the Tao of Unix, the parsers and
>> > > > > components
>> > > > > >>> should be single purpose and simple, allowing
>> > > > > >>> exceptional flexibility in composition.
>> > > > > >>>
>> > > > > >>> Comments inline.
>> > > > > >>>
>> > > > > >>> On August 7, 2018 at 09:27:01, Justin Leet (
>> > justinjleet@gmail.com)
>> > > > > wrote:
>> > > > > >>>
>> > > > > >>> Hi all,
>> > > > > >>>
>> > > > > >>> There's interest in being able to run Metron parsers in NiFi,
>> > > rather
>> > > > > than
>> > > > > >>>
>> > > > > >>> inside Storm. I dug into this a bit, and have some thoughts on
>> > how
>> > > > we
>> > > > > >>> could
>> > > > > >>> go about this. I'd love feedback on this, along with anything
>> > we'd
>> > > > > >>> consider must haves as well as future enhancements.
>> > > > > >>>
>> > > > > >>> 1. Separate metron-parsers into metron-parsers-common and
>> > > > metron-storm
>> > > > > >>> and create metron-parsers-nifi. For this code to be reusable
>> > across
>> > > > > >>> platforms (NiFi, Storm, and anything else in the future),
>> we'll
>> > > need
>> > > > > to
>> > > > > >>> decouple our parsers and Storm.
>> > > > > >>>
>> > > > > >>> +1. The “parsing code” should be a library that implements an
>> > > > > interface
>> > > > > >>> ( another library ).
>> > > > > >>>
>> > > > > >>> The Processors and the Storm things can share them.
>> > > > > >>>
>> > > > > >>> - There's also some nice fringe benefits around refactoring
>> our
>> > > code
>> > > > > >>> to be substantially more clear and understandable; something
>> > > > > >>> which came up
>> > > > > >>> while allowing for parser aggregation.
>> > > > > >>> 2. Create a MetronProcessor that can run our parsers.
>> > > > > >>> - I took a look at how RecordReader could be leveraged (e.g.
>> > > > > >>> CSVRecordReader), but this is pretty tightly tied into schemas
>> > > > > >>> and is meant
>> > > > > >>> to be used by ControllerServices, which are then used by
>> > > Processors.
>> > > > > >>> There's friction involved there in terms of schemas, but also
>> in
>> > > > > terms of
>> > > > > >>>
>> > > > > >>> access to ZK configs and things like parser chaining. We might
>> > > > > >>> be able to
>> > > > > >>> leverage it, but it seems like it'd be fairly shoehorned in
>> > > > > >>> without getting
>> > > > > >>> the schema and other benefits.
>> > > > > >>>
>> > > > > >>> We won’t have to provide our ‘no schema processors’ ( grok,
>> csv,
>> > > > json
>> > > > > ).
>> > > > > >>>
>> > > > > >>> All the remaining processors DO have schemas that we know
>> about.
>> > We
>> > > > > can
>> > > > > >>> just provide the avro schemas the same way we provide the ES
>> > > > schemas.
>> > > > > >>>
>> > > > > >>> The “parsing” should not be conflated with the
>> transform/stellar
>> > in
>> > > > > >>> NiFi. We should make that separate. Running Stellar over
>> Records
>> > > > > would be
>> > > > > >>> the best thing.
>> > > > > >>>
>> > > > > >>> - This Processor would work similarly to Storm: bytes[] in ->
>> > JSON
>> > > > > >>> out.
>> > > > > >>> - There is a Processor
>> > > > > >>> <
>> > > > > >>>
>> > > > >
>> > > >
>> > > > > >>> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/JoltTransformJSON.java
>> > > > > >>> >
>> > > > > >>> that
>> > > > > >>> handles loading other JARs that we can model a
>> > > > > >>> MetronParserProcessor off of
>> > > > > >>> that handles classpath/classloader issues (basically just sets
>> > up a
>> > > > > >>> classloader specific to what's being loaded and swaps out the
>> > > > Thread's
>> > > > > >>> loader when it calls to outside resources).
>> > > > > >>>
>> > > > > >>> There should be no reason to load modules outside the NAR.
>> Why do
>> > > > you
>> > > > > >>> expect to? If each Metron Processor equiv of a Metron Storm
>> > Parser
>> > > > is
>> > > > > just
>> > > > > >>> parsing to json it shouldn’t need much. And we could package
>> them
>> > in
>> > > > > the
>> > > > > >>> NAR. I would suggest we have a Processor per Parser to allow
>> for
>> > > > > >>> specialization. It should all be in the nar.
>> > > > > >>>
>> > > > > >>> The Stellar Processor, if you would support the works would
>> > > possibly
>> > > > > need
>> > > > > >>> this.
>> > > > > >>>
>> > > > > >>> 3. Create a MetronZkControllerService to supply our configs to
>> > our
>> > > > > >>> processors.
>> > > > > >>> - This is a pretty established NiFi pattern for being able to
>> > > > provide
>> > > > > >>> access to other services needed by a Processor (e.g.
>> databases or
>> > > > > large
>> > > > > >>> configurations files).
>> > > > > >>> - The same controller service can be used by all Processors to
>> > > > manage
>> > > > > >>> configs in a consistent manner.
>> > > > > >>>
>> > > > > >>> I think controller services would make sense where needed, I’m
>> > just
>> > > > > not
>> > > > > >>> sure what you imagine them being needed for?
>> > > > > >>>
>> > > > > >>> If the user has NiFi, and a Registry etc, are you saying you
>> > > imagine
>> > > > > them
>> > > > > >>> using Metron + ZK to manage configurations? Or to be using
>> BOTH
>> > > > storm
>> > > > > >>> processors and Nifi Processors?
>> > > > > >>>
>> > > > > >>> At that point, we can just NAR our controller service and
>> parser
>> > > > > processor
>> > > > > >>>
>> > > > > >>> up as needed, deploy them to NiFi, and let the user provide a
>> > > config
>> > > > > for
>> > > > > >>> where their custom parsers can be provided (i.e. their parser
>> > jar).
>> > > > > This
>> > > > > >>> would be 3 nars (processor, controller-service, and
>> > > > > controller-service-api
>> > > > > >>>
>> > > > > >>> in order to bind the other two together).
>> > > > > >>>
>> > > > > >>> Once deployed, our ability to use parsers should fit well into
>> > the
>> > > > > >>> standard
>> > > > > >>> NiFi workflow:
>> > > > > >>>
>> > > > > >>> 1. Create a MetronZkControllerService.
>> > > > > >>> 2. Configure the service to point at zookeeper.
>> > > > > >>> 3. Create a MetronParser.
>> > > > > >>> 4. Configure it to use the controller service + parser jar
>> > location
>> > > > +
>> > > > > >>> any other needed configs.
>> > > > > >>> 5. Use the outputs as needed downstream (either writing out to
>> > > Kafka
>> > > > > or
>> > > > > >>> feeding into more MetronParsers, etc.)
>> > > > > >>>
>> > > > > >>> Chaining parsers should ideally become a matter of chaining
>> > > > > MetronParsers
>> > > > > >>>
>> > > > > >>> (and making sure the enveloping configs carry through
>> properly).
>> > > For
>> > > > > >>> parser
>> > > > > >>> aggregation, I'd just avoid it entirely until we know it's
>> needed
>> > > in
>> > > > > NiFi.
>> > > > > >>>
>> > > > > >>> Justin
>> > > > >
>> > > > > -------------------
>> > > > > Thank you,
>> > > > >
>> > > > > James Sirota
>> > > > > PMC- Apache Metron
>> > > > > jsirota AT apache DOT org
>> > > > >
>> > > > >
>> > > >
>> > >
>> > >
>> >
>>
>>
>>
>> --
>> --
>> simon elliston ball
>> @sireb
>>
>>
>
>
> --
> --
> simon elliston ball
> @sireb
>

Re: [DISCUSS] Metron Parsers in Nifi

Posted by Justin Leet <ju...@gmail.com>.

As an exercise, let me summarize the points of contention I've seen and lay
out the tradeoffs as I see them. That way we can prioritize what's
important to us in a NiFi implementation and better work towards a
favorable solution (basically, I want to pin down the requirements we have for an MVP).
My opinions/comments/questions are in *bold*.  Feel free (and encouraged) to
disagree. Keep in mind, this stuff probably exists on a spectrum (we might
want to pick and choose what we do, and possibly even when we do it).

   - Splitting off fieldTransformations from the parser itself
      - In Nifi, we're chaining processors to do our fieldTransformations.
      This can't be particularly automatic from a definition, to the best of my
      knowledge.
      - Our configuration between NiFi and Storm differs (because NiFi is
      building Processors and Storm is just acting on the transforms).
      - *I'm mostly fine with splitting these, IMO we just need to make
      sure it's documented. The current colocation of them feels slightly
      sketchy to me in general (it feels like it's merging pure parsing and
      something more enrichment oriented). I also like the idea of exposing
      Stellar transformations as their own Processor.*
         - *Could anyone refresh my memory on why fieldTransformations are
         bundled with parsing directly?*
   - NiFi parser configuration
      - We can do it from ZK, but need to make it available in a manner not
      available in ZK
      - If we don't allow ZK, we can potentially have different sources of
      configs.
         - *I personally don't like this very much. I always hated having
         to hop between things in order to manage these sort of things, but I
         consider that more annoying than blocking.*
   - Specific parsers vs an aggregated parser
      - If they're all specific, every user who wants to implement a parser
      (even an existing one) in NiFi has to do additional work to make it
      work in NiFi.
         - *I don't think it's a lot of work on a per parser basis, and we
         might be able to ease this with some clever handling of our
         interfaces. However, I personally don't like that there's no way to
         just run an existing Metron parser in NiFi without additional
         NiFi-specific work.  To be clear, I'd prefer to have a quick way for
         users to take parsers, including preexisting parsers, into NiFi.  I
         don't think this should be the end solution for most parsers, but it
         does feel like the minimal viable product solution to cover current
         users mostly as-is.  In my mind, users should be able to take
         preexisting Storm parsers and be able to run them up and test them in
         NiFi with minimal involvement, even if the end state is to do a more
         NiFi-like implementation.*
         - *If we expand this to other platforms (e.g. Spark?), do we
         expect everything to be reimplemented every time?  Or are we making
         that decision on a platform-by-platform basis?*
         - *I think most parsers, including our own should be optimized as
         needed for NiFi, including whatever Schema work and versioning we
         want to do, but I don't think that needs to be done right away.
         Looking through source, our parsers are:*
            - *Asa*
            - *Bro*
            - *CEF*
            - *CSV*
            - *Fireeye*
            - *ISE*
            - *JSON*
            - *Lancope*
            - *Logstash*
            - *Palo Alto*
            - *Snort*
            - *Sourcefire*
         - *I don't know that I want to go through and convert and test all
         of them in the first pass.*
   - Processor vs. RecordReader
      - RecordReader is the NiFi hotness.  Sounds like the interface
      actually is stable, which was really my primary concern with it
      (Thanks for following up Otto!).
         - *RecordReaders seem like they have positive performance
         implications to them, which I'm definitely in favor of.  The Processor
         approach would work, but given the rates of flow we see, it'd be
         extremely nice to get the RecordReader benefits.  The schema benefits
         in RecordReaders are more clear if we split fieldTransformations from
         parsing in NiFi, but that split might be more work (although result
         in a potentially much cleaner implementation of RecordReaders).  This
         would mean we have to do at least some upgrading for every parser we
         want to be able to run in NiFi.*
         - *How much schema versioning do we need to support as part of a
         first cut? How much of this needs to be managed by NiFi specific
         features?*
      - *I'm curious on people's thoughts on if we can do some unification
      on some of our parsers against RecordReader as Simon mentioned.  If we do
      that, do we then need to start wrapping NARs around everything as part of
      our build process to be able to use this in NiFi?  Does that break Storm
      deployment at all (for either our bundled parsers or for existing 3rd
      party jars)? Will this affect us down the line if we decide to build out
      other use cases?*
   - Parser schema
      - Should our parsers be able to define a schema (at least in the case
      of pure parsing)?  What is the overlap and set of concerns here?
      - What do we need here in terms of versioning? After all, these things
      change based on version.
      - What do we need for providing schemas for things like CSV or Grok
      or other data-based schemas?
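
On the parser configuration point above, here is a rough sketch of how a
parser could resolve its config from either an inline property or a
ZK-backed service, so the same JSON format is used in both cases. Everything
here is hypothetical and only for illustration: ParserConfigSource,
InlineConfigSource, FakeZkConfigSource and ChainedConfigSource are made-up
names, not existing Metron or NiFi classes, and the "ZK" source is just an
in-memory map standing in for a real controller service.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical abstraction: where a parser's JSON config comes from.
interface ParserConfigSource {
    Optional<String> load(String sensorType);
}

// Config pasted directly into a processor property (no ZK connectivity needed).
final class InlineConfigSource implements ParserConfigSource {
    private final String json;

    InlineConfigSource(String json) {
        this.json = json;
    }

    @Override
    public Optional<String> load(String sensorType) {
        // An unset or blank property means "nothing configured here".
        return (json == null || json.trim().isEmpty()) ? Optional.empty() : Optional.of(json);
    }
}

// In-memory stand-in for a ZK-backed controller service.
// A real implementation would read the sensor's config node from ZooKeeper.
final class FakeZkConfigSource implements ParserConfigSource {
    private final Map<String, String> nodes = new LinkedHashMap<>();

    void put(String sensorType, String json) {
        nodes.put(sensorType, json);
    }

    @Override
    public Optional<String> load(String sensorType) {
        return Optional.ofNullable(nodes.get(sensorType));
    }
}

// Tries each source in order and returns the first config found.
final class ChainedConfigSource implements ParserConfigSource {
    private final ParserConfigSource[] sources;

    ChainedConfigSource(ParserConfigSource... sources) {
        this.sources = sources;
    }

    @Override
    public Optional<String> load(String sensorType) {
        for (ParserConfigSource source : sources) {
            Optional<String> config = source.load(sensorType);
            if (config.isPresent()) {
                return config;
            }
        }
        return Optional.empty();
    }
}

class ConfigSourceDemo {
    public static void main(String[] args) {
        FakeZkConfigSource zk = new FakeZkConfigSource();
        zk.put("bro", "{\"sensorTopic\":\"bro\",\"source\":\"zk\"}");

        // Inline property left empty: the lookup falls through to the ZK stand-in.
        ParserConfigSource combined = new ChainedConfigSource(new InlineConfigSource(""), zk);
        System.out.println(combined.load("bro").orElse("<none>"));

        // Edge-style deployment: inline config only, ZK never consulted.
        ParserConfigSource edge = new InlineConfigSource("{\"sensorTopic\":\"bro\",\"source\":\"inline\"}");
        System.out.println(edge.load("bro").orElse("<none>"));
    }
}
```

The ordering (inline first, then ZK) is just one possible choice; the point
is only that the parser itself never needs to know which source was used.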

*The summary of my view on this is basically "Ideally, I'd like a way to
get parsers working in a general case scenario in a relatively minimal way,
with the option to implement our parsers as needed with RecordReaders
(which offers several benefits, particularly for the pure parsing case)".
I think there's a lot of value for a minimal effort approach in getting a
general (if suboptimal) approach that works for everything existing.  If we
were to do that, I'd definitely still like to see at least 2-3 of the
main in-use parsers have NiFi oriented implementations (along with
supporting documentation recommending similar implementations / conversion
for existing parsers). At that point, I think my preferred approach would
be to have a general purpose Processor available (which I don't think is
much more work than the split itself), while providing a template and
examples for new parsers going forward.*
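
To make the "general purpose Processor" idea a bit more concrete, here is a
minimal sketch of the shape I have in mind: a small platform-agnostic parser
interface (setup, parse, validate) plus a thin runner that a NiFi Processor
or a Storm bolt could both delegate to. This is an assumption-laden toy, not
the real code: SimpleParser is a simplified stand-in for the parser
interface, and ToyCsvParser and GeneralPurposeParserRunner are hypothetical
names. No NiFi or Storm APIs are used, and JSON serialization of the output
maps is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical, simplified parser contract: setup, the actual parse, validation.
interface SimpleParser {
    void configure(Map<String, Object> config);

    List<Map<String, Object>> parse(byte[] rawMessage);

    boolean validate(Map<String, Object> message);
}

// Toy parser used only to exercise the runner; not one of the bundled parsers.
final class ToyCsvParser implements SimpleParser {
    private String[] columns = new String[0];

    @Override
    public void configure(Map<String, Object> config) {
        // Illustrative config key: a comma-separated list of column names.
        columns = String.valueOf(config.get("columns")).split(",");
    }

    @Override
    public List<Map<String, Object>> parse(byte[] rawMessage) {
        String line = new String(rawMessage, StandardCharsets.UTF_8).trim();
        String[] values = line.split(",", -1);
        Map<String, Object> message = new LinkedHashMap<>();
        for (int i = 0; i < columns.length && i < values.length; i++) {
            message.put(columns[i], values[i]);
        }
        message.put("original_string", line);
        return Collections.singletonList(message);
    }

    @Override
    public boolean validate(Map<String, Object> message) {
        // A message is valid only if every configured column was populated.
        for (String column : columns) {
            if (!message.containsKey(column)) {
                return false;
            }
        }
        return true;
    }
}

// Platform-neutral core: look up a parser by name, configure it, parse, drop invalid messages.
final class GeneralPurposeParserRunner {
    private final Map<String, Supplier<SimpleParser>> registry = new HashMap<>();

    void register(String name, Supplier<SimpleParser> factory) {
        registry.put(name, factory);
    }

    List<Map<String, Object>> run(String parserName, Map<String, Object> config, byte[] raw) {
        Supplier<SimpleParser> factory = registry.get(parserName);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown parser: " + parserName);
        }
        SimpleParser parser = factory.get();
        parser.configure(config);
        List<Map<String, Object>> valid = new ArrayList<>();
        for (Map<String, Object> message : parser.parse(raw)) {
            if (parser.validate(message)) {
                valid.add(message);
            }
        }
        return valid;
    }
}

class ParserRunnerDemo {
    public static void main(String[] args) {
        GeneralPurposeParserRunner runner = new GeneralPurposeParserRunner();
        runner.register("csv", ToyCsvParser::new);

        Map<String, Object> config = new HashMap<>();
        config.put("columns", "src,dst");

        byte[] raw = "10.0.0.1,10.0.0.2".getBytes(StandardCharsets.UTF_8);
        System.out.println(runner.run("csv", config, raw));
    }
}
```

A specialized NiFi-oriented implementation (e.g. a RecordReader per parser)
could sit next to this and reuse the same parser classes; the runner is only
meant to cover the "works for everything existing" case.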

On Mon, Aug 13, 2018 at 9:42 AM Simon Elliston Ball <
simon@simonellistonball.com> wrote:

> Yep, I'm wondering whether our parser interface should have the ability to
> create schema either like that, or well, that, which would be helpful
> within Metron as well.
>
> @Otto, the one thing missing from the record reader api, is that if you
> don't emit any records at all for a flow file, it errors, which is not
> strictly speaking an error, but yeah, we can certainly control things like
> filtering errors aside from this. I would say this was a nifi bug
> (debatably) which should be fixed on that side.
>
> Simon
>
> On 13 August 2018 at 14:29, Otto Fowler <ot...@gmail.com> wrote:
>
>> Also,  If we are doing the record readers, we can have a reader for a
>> parser type and explicitly set the schema, as seen here :
>> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/syslog/Syslog5424Reader.java
>>
>>
>>
>> On August 13, 2018 at 09:26:50, Otto Fowler (ottobackwards@gmail.com)
>> wrote:
>>
>> If we can do the record readers ourselves ( with the parsers inside them
>> ) we can handle the returns.
>> I’ll be doing the net flow 5 readers once the net flow 5 processor PR (
>> not mine ) is in.
>>
>> I don’t think having a generic class loading parsers foo and having to
>> manage all that is preferable to having
>> an archetype and explicit parsers.
>>
>> Nifi processors and readers are self documenting, and this approach will
>> make that not possible, as another consideration.
>>
>>
>>
>> On August 13, 2018 at 06:50:09, Simon Elliston Ball (
>> simon@simonellistonball.com) wrote:
>>
>> Maybe the edge use case will clarify the config issue a little. The reason
>> I would want to be able to push Metron parsers into NiFi would be so I can
>> pre-parse and filter on the edge to save bandwidth from remote locations.
>> I
>> would expect to be able to parse at the edge and use NiFi to prioritise or
>> filter on the Metron ready data, then push through to a 'NoOp' parser in
>> Metron. For this to happen, we would absolutely not want to connect to
>> Zookeeper, so I'm +1 on Otto's suggestion that the config be embeddable in
>> NiFi properties. We cannot assume ZK connectivity from NiFi.
>>
>> I can also see a scenario where NiFi might make it easier to chain
>> parsers,
>> which is where it overlaps more with Metron. This is more about the fact
>> that NiFi makes it a lot easier to configure and manage complex multi-step
>> flows than Metron, and is way more user intuitive from a design and
>> monitoring perspective. My main concern around using NiFi in this way is
>> about the load on the content repository. We are looking at a lot of
>> content level transformation here. You could argue that the same load is
>> taken off Kafka in the chaining scenario, but there is still a chance for
>> a
>> user to accidentally create a lot of disk access if they go over the top
>> with NiFi.
>>
>> I see this as potentially a chance to make the Metron Parser interface
>> compatible with NiFi Record Readers. Then both communities could benefit
>> from sharing each other's parsers.
>>
>> In terms of the NAR approach, I would say we have a base bundle of the
>> NiFi
>> bits (https://github.com/simonellistonball/metron/tree/nifi already has
>> this for stellar, enrichments and an opinionated publisher, it also has a
>> readme with some discussion around this
>> https://github.com/simonellistonball/metron/tree/nifi/nifi-metron-bundle
>> ).
>> We can then use other nar dependencies to side load parser classes into
>> the
>> record reader. We would then need to do some fancy property validation in
>> NiFi to ensure the classes were available.
>>
>> Also, Record Readers are much much faster. The only problem I've found
>> with
>> them is that they error on blank output, which was a problem for me
>> writing
>> a netflow 9 reader (template only records need to live in NiFi cache, but
>> not be emitted).
>>
>> In terms of the schema objection, I'm not sure why schema focus is a
>> problem. Our parsers have implicit schema and the output schema formats
>> used in NiFi are very flexible and could be "just a map". That said, we
>> could also take the opportunity to introduce a method to the parser
>> interface to emit traits to contribute the bits of schema that a parser
>> produces. This would ultimately lead to us being able to generate output
>> schemas (ES, Solr, Hive, whatever), which would take a lot of the pain out
>> of
>> setup for sensors.
>>
>> Simon
>>
>> On 9 August 2018 at 16:42, Otto Fowler <ot...@gmail.com> wrote:
>>
>> > I would say that
>> >
>> > - For each configuration parameter we want to pull in, it should be
>> > explicitly configured through a property as well as through a controller
>> > service that accesses the metron zk
>> > - Transformations should not be conflated with parsing in those
>> processors
>> > or readers
>> >
>> > There is no on the fly configuration change in nifi ( You can’t change
>> > properties once started ).
>> >
>> > Wouldn’t the simplest minimal start be to say that we expect either
>> nifi or
>> > metron and simplify things? Let nifi nifi, let metron metron.
>> >
>> >
>> > On August 9, 2018 at 10:53:24, Justin Leet (justinjleet@gmail.com)
>> wrote:
>> >
>> > That's definitely good info, thanks for reaching out to them about it.
>> >
>> > In terms of exposing/sharing, I don't think we have to couple them
>> tightly
>> > (in fact, I think we should loosen the coupling as much as possible
>> without
>> > forcing reimplementation of things). I think there's definitely a way
>> to do
>> > that in terms of the general purpose processor I proposed (or in terms of
>> > RecordReader or another implementation).
>> >
>> > It would definitely be easy enough to configure it to either pull from
>> ZK
>> > or to use a parser config json extract as a parameter (to maintain the
>> same
>> > formatting and make migration easy). And we can still build specific
>> > NiFi-oriented parsers as needed (that manage things like Schema via the
>> > registry and other Nifi mechanisms). This keeps parsers entirely
>> decoupled
>> > from a metron installation.
>> >
>> > Alternatively, we extract our config handling to a module and scripts we
>> > can package up and easily deploy configs against ZK (or maybe Nifi's
>> > StateController's or whatever). We definitely shouldn't need absolutely
>> > everything installed to be able to run just parsers on Nifi.
>> >
>> > Having said that, right now the easiest way we have to maintain on the
>> fly
>> > updatable configs (and updatable is important!) is via ZK. Params in
>> Nifi
>> > aren't quite that flexible, to the best of my knowledge (i.e. you have
>> to
>> > stop, update config and restart). We might be able to exploit the
>> > StateController to manage this for us, but I'm honestly not familiar
>> enough
>> > with it and for deployments split between NiFi and Storm, it means
>> > configuration gets managed in a couple different ways (which may be fine with
>> users
>> > since there is a fairly brightline delineation which makes it easier to
>> > accept). There are some complicated configs like fieldTransforms, which is
>> > part of why I would like things to be configured in the same format (if
>> not
>> > the same mechanism).
>> >
>> > Ideally, in my mind, the parsers shared between both NiFi and Storm just
>> > implement the very general MessageParser interface (which is pretty
>> > minimal, a couple setup methods, validation, and the actual parse). This
>> > is pretty lightweight and the split of metron-parsers into
>> > metron-parsers-common et al. would loosen the coupling between parsers
>> and
>> > the rest of metron into that core needed to support that.
>> >
>> > IMO, at that point, we'd have a pretty minimal NAR (or NARs depending on
>> > config management) that lets us run our set of parsers, lets users build
>> > new parsers (and doesn't block specialized NiFi implementations that
>> exploit
>> > NiFi's feature set), and lets us get things configured in a relatively
>> > consistent manner, without losing features, and hopefully requiring a
>> > pretty minimal slice of Metron to be useful.
>> >
>> > On Thu, Aug 9, 2018 at 10:06 AM Otto Fowler <ot...@gmail.com>
>> > wrote:
>> >
>> > > I think the benefits are clear. What is unclear is if the goal is to
>> > > expose or share or re-use Metron capabilities ( stellar, parsing ) in
>> > nifi
>> > > in a way that is native to nifi ( configured and managed in nifi ),
>> where
>> > > you may not even need metron ( say you just want to parse asa ) or if
>> the
>> > > goal is to have a hybrid approach coupling the processors/readers to
>> the
>> > > metron installation.
>> > >
>> > >
>> > > On August 9, 2018 at 09:14:58, Justin Leet (justinjleet@gmail.com)
>> > wrote:
>> > >
>> > > I'll add onto Mike's discussion with the original set of requirements
>> I
>> > had
>> > > in mind (and apply feedback on these as necessary!). This is largely
>> > > overlap with what Mike said, but I want to make sure it's clear where
>> my
>> > > proposal was coming from, so we can improve on it as needed. James and
>> > > Mike are also right, I think I skipped over the benefits of NiFi in
>> > general
>> > > a bit, so thanks for chiming in there.
>> > >
>> > > - Deploy our bundled parsers without needing custom wrapping on all of
>> > > them.
>> > > - Don't prevent ourselves from building custom wrapping as needed.
>> > > - Custom Java parsers with an easy way to hook in, similar to what we
>> > > already do in Storm.
>> > > - One stop (or at least one format) configuration, for the case when
>> > we're
>> > > doing some things in NiFi (parsers) and some elsewhere (enrichment and
>> > > indexing). I don't think it'll always be "start in NiFi, end in
>> Storm",
>> > > especially as we build out Stellar capability, but I also don't want
>> > users
>> > > learning a different set of configs and config tools for every
>> platform
>> > we
>> > > run on.
>> > > - Ability to build out parsers and other systems fairly easily, e.g.
>> > Spark.
>> > > - Support our current use cases (in particular parser chaining as a
>> more
>> > > advanced use case).
>> > >
>> > > It really boils down to providing a relatively simple user path to be
>> > able
>> > > to migrate to NiFi as needed or desired as simply as possible in a
>> very
>> > > general way, while not preventing parser by parser enhancements.
>> > >
>> > > On Wed, Aug 8, 2018 at 7:14 PM Michael Miklavcic <
>> > > michael.miklavcic@gmail.com> wrote:
>> > >
>> > > > I think it also provides customers greater control over their
>> > > architecture
>> > > > by giving them the flexibility to choose where/how to host their
>> > parsers.
>> > > >
>> > > > To Justin's point about the API, my biggest concern about the
>> > > RecordReader
>> > > > approach is that it is not stable. We already have a similar
>> problem in
>> > > > having the TransportClient in ElasticSearch - they are prone to
>> > changing
>> > > it
>> > > > in minor versions with the advent of their newer REST API, which is
>> > > > problematic for ensuring a stable installation.
>> > > >
>> > > > From my own perspective, our goal with NiFi, at least in part,
>> should
>> > be
>> > > > the ability to deploy our core parsing infrastructure, i.e.
>> > > >
>> > > > - pre-built parsers
>> > > > - custom java parsers
>> > > > - Stellar transforms
>> > > > - custom stellar transforms
>> > > >
>> > > > And have the ability to configure it similarly to how we configure
>> > > parsers
>> > > > within Storm. Consistent with our recent parser chaining and
>> > aggregation
>> > > > feature, users should be able to construct and deploy similar
>> > constructs
>> > > in
>> > > > NiFi. The core architectural shift would be that parser code should
>> be
>> > > > platform agnostic. We provide the plumbing in Storm, NiFi, and
>> <Spark
>> > > > Streaming?, other> and platform architects and devops teams can
>> choose
>> > > how
>> > > > and where to deploy.
>> > > >
>> > > > Best,
>> > > > Mike
>> > > >
>> > > >
>> > > > On Wed, Aug 8, 2018 at 9:57 AM James Sirota <js...@apache.org>
>> > wrote:
>> > > >
>> > > > > Integration with NiFi would be useful for parsing low-volume
>> > > telemetries
>> > > > > at the edge. This is a much more resource friendly way to do it
>> than
>> > > > > setting up dedicated storm topologies. The integration would be
>> that
>> > > the
>> > > > > NiFi processor parses the data and pushes it straight into the
>> > > enrichment
>> > > > > topic, saving us the resources of having multiple parsers in storm
>> > > > >
>> > > > > Thanks,
>> > > > > James
>> > > > >
>> > > > > 07.08.2018, 11:29, "Otto Fowler" <ot...@gmail.com>:
>> > > > > > Why don't we start over? We are going back and forth on
>> > implementation,
>> > > > and
>> > > > > I
>> > > > > > don’t think we have the same goals or concerns.
>> > > > > >
>> > > > > > What would be the requirements or goals of metron integration
>> with
>> > > > Nifi?
>> > > > > > How many levels or options for integration do we have?
>> > > > > > What are the approaches to choose from?
>> > > > > > Who are the target users?
>> > > > > >
>> > > > > > On August 7, 2018 at 12:24:56, Justin Leet (
>> justinjleet@gmail.com)
>> > > > > wrote:
>> > > > > >
>> > > > > > So how does the MetronRecordReader roll into everything? It
>> seems
>> > > like
>> > > > > it'd
>> > > > > > be more useful on the reader per format approach, but otherwise
>> it
>> > > > > doesn't
>> > > > > > really seem like we gain much, and it requires getting
>> everything
>> > > > linked
>> > > > > up
>> > > > > > properly to be used. Assuming we looked at doing it that way, is
>> > the
>> > > > idea
>> > > > > > that we'd setup a ControllerService with the MetronRecordReader
>> > and a
>> > > > > > MetronRecordWriter and then have the StellarTransformRecord
>> > processor
>> > > > > > configured with those ControllerServices? How do we manage the
>> > > > > > configurations of everything that way? How does the
>> > > > ControllerService
>> > > > > > get configured with whatever parser(s) are needed in the flow?
>> > > > Basically,
>> > > > > > what's your vision for how everything would tie together?
>> > > > > >
>> > > > > > I also forgot to mention this in the original writeup, but
>> there's
>> > > > > another
>> > > > > > reason to avoid the RecordReader: It's not considered stable.
>> See
>> > > > > >
>> > > > >
>> > > >
>> > > > > > https://github.com/apache/nifi/blob/master/nifi-commons/nifi-record/src/main/java/org/apache/nifi/serialization/RecordReader.java#L34
>> > > > > .
>> > > > > > That alone makes me super hesitant to use it, if it can shift
>> out
>> > > from
>> > > > > > under us even in an incremental version.
>> > > > > >
>> > > > > > I'm also unclear on why StellarTransformRecord processor matters
>> > for
>> > > > > either
>> > > > > > approach. With the Processor approach you could simply follow
>> it up
>> > > > with
>> > > > > > the Stellar processor, the same way you would in the
>> RecordReader
>> > > > > > approach. The Stellar processor should be a parallel
>> improvement,
>> > > not a
>> > > > > > conflicting one.
>> > > > > >
>> > > > > > On Tue, Aug 7, 2018 at 11:50 AM Otto Fowler <
>> > ottobackwards@gmail.com
>> > > >
>> > > > > wrote:
>> > > > > >
>> > > > > >> A Metron Processor itself isn’t really necessary. A
>> > > > MetronRecordReader
>> > > > > (
>> > > > > >> either the megalithic or a reader per format ) would be a good
>> > > > > approach.
>> > > > > >> Then have StellarTransformRecord processor that can do Stellar
>> on
>> > > > _any_
>> > > > > >> record, regardless of source.
>> > > > > >>
>> > > > > >> On August 7, 2018 at 11:06:22, Justin Leet (
>> justinjleet@gmail.com
>> > )
>> > > > > wrote:
>> > > > > >>
>> > > > > >> Thanks for the comments, Otto, this is definitely great
>> feedback.
>> > > I'd
>> > > > > >> love to respond inline, but the email's already starting to lose its
>> > > > > >> formatting, so I'll go with the classic "wall of text". Let me
>> > know
>> > > > if
>> > > > > I
>> > > > > >> didn't address everything.
>> > > > > >>
>> > > > > >> Loading modules (or jars or whatever) outside of our Processor
>> > gives
>> > > > us
>> > > > > >> the benefit of making it incredibly easy for users to create
>> > their
>> > > > > own
>> > > > > >> parsers. I would definitely expect our own bundled parsers to
>> be
>> > > > > included
>> > > > > >> in our base NAR, but loading modules enables users to only
>> have to
>> > > > > learn
>> > > > > >> how Metron wants our stuff lined up and just plug it in. Having
>> > said
>> > > > > that,
>> > > > > >> I could see having a wrapper for our bundled parsers that
>> makes it
>> > > > > really
>> > > > > >> easy to just say you want a MetronAsaParser or
>> MetronBroParser,
>> > > etc.
>> > > > > That
>> > > > > >> would give us the best of both worlds, where it's easy to get
>> > setup
>> > > > our
>> > > > > >> bundled parsers and also trivial to pull in non-bundled
>> parsers.
>> > > What
>> > > > > >> doing this gives us is an easy way to support (hopefully) every
>> > > > parser
>> > > > > that
>> > > > > >> gets made, right out of the box, without us needing to build a
>> > > > > specialized
>> > > > > >> version of everything until we decide to and without users
>> having
>> > to
>> > > > > jump
>> > > > > >> through hoops.
>> > > > > >>
>> > > > > >> None of this prevents anyone from creating specialized parsers
>> > (for
>> > > > > perf
>> > > > > >> reasons, or to use the schema registries, or anything else).
>> It's
>> > > > > probably
>> > > > > >> worthwhile to package up some of built-in parsers and customize
>> > them
>> > > > > to use
>> > > > > >> more specialized feature appropriately as we see things get
>> used
>> > in
>> > > > the
>> > > > > >> wild. Like you said, we could likely provide Avro schemas for
>> some
>> > > of
>> > > > > this
>> > > > > >> and give users a more robust experience on what we choose to
>> > support
>> > > > > and
>> > > > > >> provide guidance for other things. I'm also worried that
>> building
>> > > > > >> specialized schemas becomes problematic for things like parser
>> > > > chaining
>> > > > > >> (where our routers wrap the underlying messages and add on
>> their
>> > own
>> > > > > info).
>> > > > > >> Going down that road potentially requires anything wrapped to
>> > have a
>> > > > > >> specialized schema for the wrapped version in addition to a
>> > vanilla
>> > > > > version
>> > > > > >> (although please correct me if I'm missing something there,
>> I'll
>> > > > openly
>> > > > > >> admit to some shakiness on how that would be handled).
>> > > > > >>
>> > > > > >> I also disagree that this is un-Nifi-like, although I'm
>> admittedly
>> > > > not
>> > > > > as
>> > > > > >> skilled there. The basis for doing this is directly inspired by
>> > the
>> > > > > >> JoltTransformer, which is extremely similar to the proposed
>> setup
>> > > for
>> > > > > our
>> > > > > >> parsers: Simply take a spec (in this case the configs,
>> including
>> > the
>> > > > > >> fieldTransformations), and delegate a mapping from bytes[] to
>> > JSON.
>> > > > The
>> > > > > >> Jolt library even has an Expression Language (check out
>> > > > > >> https://community.hortonworks.com/articles/105965/expression-language-with-jolt-in-apache-nifi.html),
>> > > > > >> so it's not a foreign concept. I believe Simon Ball has already
>> > done
>> > > > > some
>> > > > > >> experimenting around with getting Stellar running in NiFi, and
>> I'd
>> > > > > love to
>> > > > > >> see Stellar more readily available in NiFi in general.
>> > > > > >>
>> > > > > >> Re: the ControllerService, I see this as a way to maintain
>> > Metron's
>> > > > > use of
>> > > > > >> ZK as the source of config truth. Users could definitely be
>> using
>> > > > NiFi
>> > > > > and
>> > > > > >> Storm in tandem (parse in NiFi + enrich and index from Storm,
>> for
>> > > > > >> example). Using the ControllerService gives us a ZK instance as
>> > the
>> > > > > single
>> > > > > >> source of truth. That way we aren't forcing users to go to two
>> > > > > different
>> > > > > >> places to manage configs. This also lets us leverage our
>> existing
>> > > > > scripts
>> > > > > >> and our existing infrastructure around configs and their
>> > management
>> > > > and
>> > > > > >> validation very easily. It also gives users a way to port from
>> > NiFi
>> > > > to
>> > > > > >> Storm or vice-versa without having to migrate configs as well.
>> We
>> > > > could
>> > > > > >> also provide the option to configure the Processor itself with
>> the
>> > > > data
>> > > > > >> (just don't set up a controller service and provide the json or
>> > > > > whatever as
>> > > > > >> one of our properties).
>> > > > > >>
>> > > > > >> On Tue, Aug 7, 2018 at 10:12 AM Otto Fowler <
>> > > ottobackwards@gmail.com
>> > > > >
>> > > > > >> wrote:
>> > > > > >>
>> > > > > >>> I think this is a good idea. As I mentioned in the other
>> thread
>> > > I’ve
>> > > > > >>> been doing a lot of work on Nifi recently.
>> > > > > >>> I think the important thing is that what is done should be
>> done
>> > the
>> > > > > NiFi
>> > > > > >>> way, not bolting the Metron composition
>> > > > > >>> onto Nifi. Think of it like the Tao of Unix, the parsers and
>> > > > > components
>> > > > > >>> should be single purpose and simple, allowing
>> > > > > >>> exceptional flexibility in composition.
>> > > > > >>>
>> > > > > >>> Comments inline.
>> > > > > >>>
>> > > > > >>> On August 7, 2018 at 09:27:01, Justin Leet (
>> > justinjleet@gmail.com)
>> > > > > wrote:
>> > > > > >>>
>> > > > > >>> Hi all,
>> > > > > >>>
>> > > > > >>> There's interest in being able to run Metron parsers in NiFi,
>> > > rather
>> > > > > than
>> > > > > >>>
>> > > > > >>> inside Storm. I dug into this a bit, and have some thoughts on
>> > how
>> > > > we
>> > > > > >>> could
>> > > > > >>> go about this. I'd love feedback on this, along with anything
>> > we'd
>> > > > > >>> consider must haves as well as future enhancements.
>> > > > > >>>
>> > > > > >>> 1. Separate metron-parsers into metron-parsers-common and
>> > > > metron-storm
>> > > > > >>> and create metron-parsers-nifi. For this code to be reusable
>> > across
>> > > > > >>> platforms (NiFi, Storm, and anything else in the future),
>> we'll
>> > > need
>> > > > > to
>> > > > > >>> decouple our parsers and Storm.
>> > > > > >>>
>> > > > > >>> +1. The “parsing code” should be a library that implements an
>> > > > > interface
>> > > > > >>> ( another library ).
>> > > > > >>>
>> > > > > >>> The Processors and the Storm things can share them.
>> > > > > >>>
>> > > > > >>> - There's also some nice fringe benefits around refactoring
>> our
>> > > code
>> > > > > >>> to be substantially more clear and understandable; something
>> > > > > >>> which came up
>> > > > > >>> while allowing for parser aggregation.
>> > > > > >>> 2. Create a MetronProcessor that can run our parsers.
>> > > > > >>> - I took a look at how RecordReader could be leveraged (e.g.
>> > > > > >>> CSVRecordReader), but this is pretty tightly tied into schemas
>> > > > > >>> and is meant
>> > > > > >>> to be used by ControllerServices, which are then used by
>> > > Processors.
>> > > > > >>> There's friction involved there in terms of schemas, but also
>> in
>> > > > > terms of
>> > > > > >>>
>> > > > > >>> access to ZK configs and things like parser chaining. We might
>> > > > > >>> be able to
>> > > > > >>> leverage it, but it seems like it'd be fairly shoehorned in
>> > > > > >>> without getting
>> > > > > >>> the schema and other benefits.
>> > > > > >>>
>> > > > > >>> We won’t have to provide our ‘no schema processors’ ( grok,
>> csv,
>> > > > json
>> > > > > ).
>> > > > > >>>
>> > > > > >>> All the remaining processors DO have schemas that we know
>> about.
>> > We
>> > > > > can
>> > > > > >>> just provide the avro schemas the same way we provide the ES
>> > > > schemas.
>> > > > > >>>
>> > > > > >>> The “parsing” should not be conflated with the
>> transform/stellar
>> > in
>> > > > > >>> NiFi. We should make that separate. Running Stellar over
>> Records
>> > > > > would be
>> > > > > >>> the best thing.
>> > > > > >>>
>> > > > > >>> - This Processor would work similarly to Storm: bytes[] in ->
>> > JSON
>> > > > > >>> out.
>> > > > > >>> - There is a Processor
>> > > > > >>> <https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/JoltTransformJSON.java>
>> > > > > >>> that
>> > > > > >>> handles loading other JARs that we can model a
>> > > > > >>> MetronParserProcessor off of
>> > > > > >>> that handles classpath/classloader issues (basically just sets
>> > up a
>> > > > > >>> classloader specific to what's being loaded and swaps out the
>> > > > Thread's
>> > > > > >>> loader when it calls to outside resources).
>> > > > > >>>
>> > > > > >>> There should be no reason to load modules outside the NAR.
>> Why do
>> > > > you
>> > > > > >>> expect to? If each Metron Processor equiv of a Metron Storm
>> > Parser
>> > > > is
>> > > > > just
>> > > > > >>> parsing to json it shouldn’t need much. And we could package
>> them
>> > in
>> > > > > the
>> > > > > >>> NAR. I would suggest we have a Processor per Parser to allow
>> for
>> > > > > >>> specialization. It should all be in the nar.
>> > > > > >>>
>> > > > > >>> The Stellar Processor, if you would support the works would
>> > > possibly
>> > > > > need
>> > > > > >>> this.
>> > > > > >>>
>> > > > > >>> 3. Create a MetronZkControllerService to supply our configs to
>> > our
>> > > > > >>> processors.
>> > > > > >>> - This is a pretty established NiFi pattern for being able to
>> > > > provide
>> > > > > >>> access to other services needed by a Processor (e.g.
>> databases or
>> > > > > large
>> > > > > >>> configurations files).
>> > > > > >>> - The same controller service can be used by all Processors to
>> > > > manage
>> > > > > >>> configs in a consistent manner.
>> > > > > >>>
>> > > > > >>> I think controller services would make sense where needed, I’m
>> > just
>> > > > > not
>> > > > > >>> sure what you imagine them being needed for?
>> > > > > >>>
>> > > > > >>> If the user has NiFi, and a Registry etc, are you saying you
>> > > imagine
>> > > > > them
>> > > > > >>> using Metron + ZK to manage configurations? Or to be using
>> BOTH
>> > > > storm
>> > > > > >>> processors and Nifi Processors?
>> > > > > >>>
>> > > > > >>> At that point, we can just NAR our controller service and
>> parser
>> > > > > processor
>> > > > > >>>
>> > > > > >>> up as needed, deploy them to NiFi, and let the user provide a
>> > > config
>> > > > > for
>> > > > > >>> where their custom parsers can be provided (i.e. their parser
>> > jar).
>> > > > > This
>> > > > > >>> would be 3 nars (processor, controller-service, and
>> > > > > controller-service-api
>> > > > > >>>
>> > > > > >>> in order to bind the other two together).
>> > > > > >>>
>> > > > > >>> Once deployed, our ability to use parsers should fit well into
>> > the
>> > > > > >>> standard
>> > > > > >>> NiFi workflow:
>> > > > > >>>
>> > > > > >>> 1. Create a MetronZkControllerService.
>> > > > > >>> 2. Configure the service to point at zookeeper.
>> > > > > >>> 3. Create a MetronParser.
>> > > > > >>> 4. Configure it to use the controller service + parser jar
>> > location
>> > > > +
>> > > > > >>> any other needed configs.
>> > > > > >>> 5. Use the outputs as needed downstream (either writing out to
>> > > Kafka
>> > > > > or
>> > > > > >>> feeding into more MetronParsers, etc.)
>> > > > > >>>
>> > > > > >>> Chaining parsers should ideally become a matter of chaining
>> > > > > MetronParsers
>> > > > > >>>
>> > > > > >>> (and making sure the enveloping configs carry through
>> properly).
>> > > For
>> > > > > >>> parser
>> > > > > >>> aggregation, I'd just avoid it entirely until we know it's
>> needed
>> > > in
>> > > > > NiFi.
>> > > > > >>>
>> > > > > >>> Justin
>> > > > >
>> > > > > -------------------
>> > > > > Thank you,
>> > > > >
>> > > > > James Sirota
>> > > > > PMC- Apache Metron
>> > > > > jsirota AT apache DOT org
>> > > > >
>> > > > >
>> > > >
>> > >
>> > >
>> >
>>
>>
>>
>> --
>> --
>> simon elliston ball
>> @sireb
>>
>>
>
>
> --
> --
> simon elliston ball
> @sireb
>

Re: [DISCUSS] Metron Parsers in Nifi

Posted by Simon Elliston Ball <si...@simonellistonball.com>.

Yep, I'm wondering whether our parser interface should have the ability to
create schema either like that, or well, that, which would be helpful
within Metron as well.
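
To make that a bit more concrete, here is a rough sketch of a parser that can
describe the fields it emits. Everything in it is hypothetical
(SchemaAwareParser, IpPortParser, SchemaSketch and describeFields are made-up
names, not existing Metron or NiFi APIs), and the Avro-style rendering is just
one possible target:

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface: a parser that can also describe its output fields.
interface SchemaAwareParser {
    // Parse one raw message into a flat field map.
    Map<String, Object> parse(byte[] raw);

    // Field name -> simple type name for every field this parser may emit.
    Map<String, String> describeFields();
}

// Toy parser for "ip,port" lines, used only to exercise the interface.
class IpPortParser implements SchemaAwareParser {
    @Override
    public Map<String, Object> parse(byte[] raw) {
        String[] parts = new String(raw, StandardCharsets.UTF_8).trim().split(",");
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("ip_src_addr", parts[0]);
        out.put("ip_src_port", Integer.parseInt(parts[1]));
        return out;
    }

    @Override
    public Map<String, String> describeFields() {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("ip_src_addr", "string");
        fields.put("ip_src_port", "int");
        return fields;
    }
}

class SchemaSketch {
    // Render the declared fields as a minimal Avro-style record schema (JSON text).
    static String toAvroLikeSchema(String name, SchemaAwareParser parser) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"type\":\"record\",\"name\":\"").append(name).append("\",\"fields\":[");
        boolean first = true;
        for (Map.Entry<String, String> e : parser.describeFields().entrySet()) {
            if (!first) {
                sb.append(",");
            }
            sb.append("{\"name\":\"").append(e.getKey())
              .append("\",\"type\":\"").append(e.getValue()).append("\"}");
            first = false;
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        System.out.println(toAvroLikeSchema("ipport", new IpPortParser()));
    }
}
```

The same describeFields() output could in principle feed other generators (ES
templates, Solr schemas and so on); that part is not sketched here.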

@Otto, the one thing missing from the record reader API is handling of empty
output: if you don't emit any records at all for a flow file, it errors, even
though that is not strictly speaking an error. Aside from this, we can
certainly control things like filtering errors. I would say this is
(debatably) a NiFi bug which should be fixed on that side.

Simon

On 13 August 2018 at 14:29, Otto Fowler <ot...@gmail.com> wrote:

> Also, if we are doing the record readers, we can have a reader for a
> parser type and explicitly set the schema, as seen here :
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/syslog/Syslog5424Reader.java
>
>
>
> On August 13, 2018 at 09:26:50, Otto Fowler (ottobackwards@gmail.com)
> wrote:
>
> If we can do the record readers ourselves ( with the parsers inside them )
> we can handle the returns.
> I’ll be doing the net flow 5 readers once the net flow 5 processor PR (
> not mine ) is in.
>
> I don’t think having a generic class loading parsers foo and having to
> manage all that is preferable to having
> an archetype and explicit parsers.
>
> Nifi processors and readers are self documenting, and this approach will
> make that not possible, as another consideration.
>
>
>
> On August 13, 2018 at 06:50:09, Simon Elliston Ball (
> simon@simonellistonball.com) wrote:
>
> Maybe the edge use case will clarify the config issue a little. The reason
> I would want to be able to push Metron parsers into NiFi would be so I can
> pre-parse and filter on the edge to save bandwidth from remote locations. I
> would expect to be able to parse at the edge and use NiFi to prioritise or
> filter on the Metron ready data, then push through to a 'NoOp' parser in
> Metron. For this to happen, we would absolutely not want to connect to
> Zookeeper, so I'm +1 on Otto's suggestion that the config be embeddable in
> NiFi properties. We cannot assume ZK connectivity from NiFi.
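
A minimal sketch of that idea, assuming the parser config arrives as a JSON
string: prefer config embedded in a NiFi property and only consult a remote
store such as ZooKeeper when nothing is embedded. ParserConfigResolver and its
method are hypothetical names, and the remote lookup is passed in as a plain
Supplier rather than a real ZooKeeper client:

```java
import java.util.Optional;
import java.util.function.Supplier;

class ParserConfigResolver {
    // inlineJson: value of a processor/reader property; may be null or blank.
    // remote: lookup against a remote config store; only invoked when needed,
    //         so an edge node with inline config never has to reach ZooKeeper.
    static String resolve(String inlineJson, Supplier<Optional<String>> remote) {
        if (inlineJson != null && !inlineJson.trim().isEmpty()) {
            return inlineJson;
        }
        return remote.get().orElseThrow(
            () -> new IllegalStateException("No parser config found inline or remotely"));
    }

    public static void main(String[] args) {
        // Edge case from above: config is embedded, the remote store is unreachable.
        String config = resolve("{\"sensorTopic\":\"bro\"}", () -> {
            throw new IllegalStateException("remote store should not be contacted");
        });
        System.out.println(config);
    }
}
```

Keeping the JSON in the same format in both places is what would make moving
a parser between NiFi and Storm cheap.
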
>
> I can also see a scenario where NiFi might make it easier to chain parsers,
> which is where it overlaps more with Metron. This is more about the fact
> that NiFi makes it a lot easier to configure and manage complex multi-step
> flows than Metron, and is way more user intuitive from a design and
> monitoring perspective. My main concern around using NiFi in this way is
> about the load on the content repository. We are looking at a lot of
> content level transformation here. You could argue that the same load is
> taken off Kafka in the chaining scenario, but there is still a chance for a
> user to accidentally create a lot of disk access if they go over the top
> with NiFi.
>
> I see this as potentially a chance to make the Metron Parser interface
> compatible with NiFi Record Readers. Then both communities could benefit
> from sharing each other's parsers.
>
> In terms of the NAR approach, I would say we have a base bundle of the NiFi
> bits (https://github.com/simonellistonball/metron/tree/nifi already has
> this for stellar, enrichments and an opinionated publisher, it also has a
> readme with some discussion around this
> https://github.com/simonellistonball/metron/tree/nifi/nifi-metron-bundle).
> We can then use other nar dependencies to side load parser classes into the
> record reader. We would then need to do some fancy property validation in
> NiFi to ensure the classes were available.
>
> Also, Record Readers are much much faster. The only problem I've found with
> them is that they error on blank output, which was a problem for me writing
> a netflow 9 reader (template only records need to live in NiFi cache, but
> not be emitted).
>
> In terms of the schema objection, I'm not sure why schema focus is a
> problem. Our parsers have implicit schema and the output schema formats
> used in NiFi are very flexible and could be "just a map". That said, we
> could also take the opportunity to introduce a method to the parser
> interface to emit traits to contribute the bits of schema that a parser
> produces. This would ultimately lead to us being able to generate output
> schemas (ES, Solr, Hive, whatever), which would take a lot of the pain out of
> setup for sensors.
>
> Simon
>
> On 9 August 2018 at 16:42, Otto Fowler <ot...@gmail.com> wrote:
>
> > I would say that
> >
> > - For each configuration parameter we want to pull in, it should be
> > explicitly configured through a property as well as through a controller
> > service that accesses the metron zk
> > - Transformations should not be conflated with parsing in those
> processors
> > or readers
> >
> > There is no on the fly configuration change in nifi ( You can’t change
> > properties once started ).
> >
> > Wouldn’t the simplest minimal start be to say that we expect either nifi
> or
> > metron and simplify things? Let nifi nifi, let metron metron.
> >
> >
> > On August 9, 2018 at 10:53:24, Justin Leet (justinjleet@gmail.com)
> wrote:
> >
> > That's definitely good info, thanks for reaching out to them about it.
> >
> > In terms of exposing/sharing, I don't think we have to couple them
> tightly
> > (in fact, I think we should loosen the coupling as much as possible
> without
> > forcing reimplementation of things). I think there's definitely a way to
> do
> > that in terms of the general purpose processor I proposed (or in terms of
> > RecordReader or another implementation).
> >
> > It would definitely be easy enough to configure it to either pull from ZK
> > or to use a parser config json extract as a parameter (to maintain the
> same
> > formatting and make migration easy). And we can still build specific
> > NiFi-oriented parsers as needed (that manage things like Schema via the
> > registry and other Nifi mechanisms). This keeps parsers entirely
> decoupled
> > from a metron installation.
> >
> > Alternatively, we extract our config handling to a module and scripts we
> > can package up and easily deploy configs against ZK (or maybe Nifi's
> > StateController or whatever). We definitely shouldn't need absolutely
> > everything installed to be able to run just parsers on Nifi.
> >
> > Having said that, right now the easiest way we have to maintain on the
> fly
> > updatable configs (and updatable is important!) is via ZK. Params in Nifi
> > aren't quite that flexible, to the best of my knowledge (i.e. you have to
> > stop, update config and restart). We might be able to exploit the
> > StateController to manage this for us, but I'm honestly not familiar
> enough
> > with it and for deployments split between NiFi and Storm, it means
> > configuration gets managed in a couple different ways (which may be fine with
> > users since there is a fairly bright-line delineation which makes it easier to
> > accept). There are some complicated configs like fieldTransforms, which is
> > part of why I would like things to be configured in the same format (if
> not
> > the same mechanism).
> >
> > Ideally, in my mind, the parsers shared between both NiFi and Storm just
> > implement the very general MessageParser interface (which is pretty
> > minimal, a couple setup methods, validation, and the actual parse). This
> > is pretty lightweight and the split of metron-parsers into
> > metron-parsers-common et al. would loosen the coupling between parsers
> and
> > the rest of metron into that core needed to support that.
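
As a rough illustration of how small that shared core could be, here is a
sketch of a platform-agnostic parser contract plus a runner that neither knows
nor cares whether Storm or NiFi is calling it. PortableParser, KeyValueParser
and ParserRunner are made-up names for this sketch, not the actual Metron
classes, and the method set is only a simplified guess at "setup, validation,
and the actual parse":

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal contract shared by every platform.
interface PortableParser {
    void configure(Map<String, Object> config);       // setup from parser config
    List<Map<String, Object>> parse(byte[] raw);      // bytes in -> JSON-like maps out
    boolean validate(Map<String, Object> message);    // drop messages that fail this
}

// Toy parser for whitespace-separated key=value tokens.
class KeyValueParser implements PortableParser {
    private String delimiter = "=";

    @Override
    public void configure(Map<String, Object> config) {
        Object d = config.get("delimiter");
        if (d != null) {
            delimiter = d.toString();
        }
    }

    @Override
    public List<Map<String, Object>> parse(byte[] raw) {
        Map<String, Object> msg = new LinkedHashMap<>();
        for (String token : new String(raw, StandardCharsets.UTF_8).trim().split("\\s+")) {
            int i = token.indexOf(delimiter);
            if (i > 0) {
                msg.put(token.substring(0, i), token.substring(i + delimiter.length()));
            }
        }
        return Collections.singletonList(msg);
    }

    @Override
    public boolean validate(Map<String, Object> message) {
        return message.containsKey("timestamp");
    }
}

// Platform-neutral glue; a Storm bolt or a NiFi processor would both call this.
class ParserRunner {
    static List<Map<String, Object>> run(PortableParser parser, byte[] raw) {
        List<Map<String, Object>> valid = new ArrayList<>();
        for (Map<String, Object> message : parser.parse(raw)) {
            if (parser.validate(message)) {
                valid.add(message);
            }
        }
        return valid;
    }
}
```

The platform-specific modules would then only need to supply the bytes and
decide where the resulting maps go.
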
> >
> > IMO, at that point, we'd have a pretty minimal NAR (or NARs depending on
> > config management) that lets us run our set of parsers, lets users build
> > new parsers (and doesn't block specialized NiFi implementations that
> exploit
> > NiFi's feature set), and lets us get things configured in a relatively
> > consistent manner, without losing features, and hopefully requiring a
> > pretty minimal slice of Metron to be useful.
> >
> > On Thu, Aug 9, 2018 at 10:06 AM Otto Fowler <ot...@gmail.com>
> > wrote:
> >
> > > I think the benefits are clear. What is unclear is if the goal is to
> > > expose or share or re-use Metron capabilities ( stellar, parsing ) in
> > nifi
> > > in a way that is native to nifi ( configured and managed in nifi ),
> where
> > > you may not even need metron ( say you just want to parse asa ) or if
> the
> > > goal is to have a hybrid approach coupling the processors/readers to
> the
> > > metron installation.
> > >
> > >
> > > On August 9, 2018 at 09:14:58, Justin Leet (justinjleet@gmail.com)
> > wrote:
> > >
> > > I'll add onto Mike's discussion with the original set of requirements I
> > had
> > > in mind (and apply feedback on these as necessary!). This is largely
> > > overlap with what Mike said, but I want to make sure it's clear where
> my
> > > proposal was coming from, so we can improve on it as needed. James and
> > > Mike are also right, I think I skipped over the benefits of NiFi in
> > general
> > > a bit, so thanks for chiming in there.
> > >
> > > - Deploy our bundled parsers without needing custom wrapping on all of
> > > them.
> > > - Don't prevent ourselves from building custom wrapping as needed.
> > > - Custom Java parsers with an easy way to hook in, similar to what we
> > > already do in Storm.
> > > - One stop (or at least one format) configuration, for the case when
> > we're
> > > doing some things in NiFi (parsers) and some elsewhere (enrichment and
> > > indexing). I don't think it'll always be "start in NiFi, end in Storm",
> > > especially as we build out Stellar capability, but I also don't want
> > users
> > > learning a different set of configs and config tools for every platform
> > we
> > > run on.
> > > - Ability to build out parsers on other systems fairly easily, e.g.
> > Spark.
> > > - Support our current use cases (in particular parser chaining as a
> more
> > > advanced use case).
> > >
> > > It really boils down to providing a relatively simple user path to be
> > able
> > > to migrate to NiFi as needed or desired as simply as possible in a very
> > > general way, while not preventing parser by parser enhancements.
> > >
> > > On Wed, Aug 8, 2018 at 7:14 PM Michael Miklavcic <
> > > michael.miklavcic@gmail.com> wrote:
> > >
> > > > I think it also provides customers greater control over their
> > > architecture
> > > > by giving them the flexibility to choose where/how to host their
> > parsers.
> > > >
> > > > To Justin's point about the API, my biggest concern about the
> > > RecordReader
> > > > approach is that it is not stable. We already have a similar problem
> in
> > > > having the TransportClient in ElasticSearch - they are prone to
> > changing
> > > it
> > > > in minor versions with the advent of their newer REST API, which is
> > > > problematic for ensuring a stable installation.
> > > >
> > > > From my own perspective, our goal with NiFi, at least in part, should
> > be
> > > > the ability to deploy our core parsing infrastructure, i.e.
> > > >
> > > > - pre-built parsers
> > > > - custom java parsers
> > > > - Stellar transforms
> > > > - custom stellar transforms
> > > >
> > > > And have the ability to configure it similarly to how we configure
> > > parsers
> > > > within Storm. Consistent with our recent parser chaining and
> > aggregation
> > > > feature, users should be able to construct and deploy similar
> > constructs
> > > in
> > > > NiFi. The core architectural shift would be that parser code should
> be
> > > > platform agnostic. We provide the plumbing in Storm, NiFi, and <Spark
> > > > Streaming?, other> and platform architects and devops teams can
> choose
> > > how
> > > > and where to deploy.
> > > >
> > > > Best,
> > > > Mike
> > > >
> > > >
> > > > On Wed, Aug 8, 2018 at 9:57 AM James Sirota <js...@apache.org>
> > wrote:
> > > >
> > > > > Integration with NiFi would be useful for parsing low-volume
> > > telemetries
> > > > > at the edge. This is a much more resource friendly way to do it
> than
> > > > > setting up dedicated storm topologies. The integration would be
> that
> > > the
> > > > > NiFi processor parses the data and pushes it straight into the
> > > enrichment
> > > > > topic, saving us the resources of having multiple parsers in storm
> > > > >
> > > > > Thanks,
> > > > > James
> > > > >
> > > > > 07.08.2018, 11:29, "Otto Fowler" <ot...@gmail.com>:
> > > > > > Why don't we start over? We are going back and forth on
> > implementation,
> > > > and
> > > > > I
> > > > > > don’t think we have the same goals or concerns.
> > > > > >
> > > > > > What would be the requirements or goals of metron integration
> with
> > > > Nifi?
> > > > > > How many levels or options for integration do we have?
> > > > > > What are the approaches to choose from?
> > > > > > Who are the target users?
> > > > > >
> > > > > > On August 7, 2018 at 12:24:56, Justin Leet (
> justinjleet@gmail.com)
> > > > > wrote:
> > > > > >
> > > > > > So how does the MetronRecordReader roll into everything? It seems
> > > like
> > > > > it'd
> > > > > > be more useful on the reader per format approach, but otherwise
> it
> > > > > doesn't
> > > > > > really seem like we gain much, and it requires getting everything
> > > > linked
> > > > > up
> > > > > > properly to be used. Assuming we looked at doing it that way, is
> > the
> > > > idea
> > > > > > that we'd setup a ControllerService with the MetronRecordReader
> > and a
> > > > > > MetronRecordWriter and then have the StellarTransformRecord
> > processor
> > > > > > configured with those ControllerServices? How do we manage the
> > > > > > configurations of everything that way? How does the
> > > > ControllerService
> > > > > > get configured with whatever parser(s) are needed in the flow?
> > > > Basically,
> > > > > > what's your vision for how everything would tie together?
> > > > > >
> > > > > > I also forgot to mention this in the original writeup, but
> there's
> > > > > another
> > > > > > reason to avoid the RecordReader: It's not considered stable. See
> > > > > > https://github.com/apache/nifi/blob/master/nifi-commons/nifi-record/src/main/java/org/apache/nifi/serialization/RecordReader.java#L34 .
> > > > > > That alone makes me super hesitant to use it, if it can shift out
> > > from
> > > > > > under us even in an incremental version.
> > > > > >
> > > > > > I'm also unclear on why StellarTransformRecord processor matters
> > for
> > > > > either
> > > > > > approach. With the Processor approach you could simply follow it
> up
> > > > with
> > > > > > the Stellar processor, the same way you would in the
> RecordReader
> > > > > > approach. The Stellar processor should be a parallel improvement,
> > > not a
> > > > > > conflicting one.
> > > > > >
> > > > > > On Tue, Aug 7, 2018 at 11:50 AM Otto Fowler <
> > ottobackwards@gmail.com
> > > >
> > > > > wrote:
> > > > > >
> > > > > >> A Metron Processor itself isn’t really necessary. A
> > > > MetronRecordReader
> > > > > (
> > > > > >> either the megalithic or a reader per format ) would be a good
> > > > > approach.
> > > > > >> Then have StellarTransformRecord processor that can do Stellar
> on
> > > > _any_
> > > > > >> record, regardless of source.
> > > > > >>
> > > > > >> On August 7, 2018 at 11:06:22, Justin Leet (
> justinjleet@gmail.com
> > )
> > > > > wrote:
> > > > > >>
> > > > > >> Thanks for the comments, Otto, this is definitely great
> feedback.
> > > I'd
> > > > > > >> love to respond inline, but the email's already starting to lose its
> > > > > > >> formatting, so I'll go with the classic "wall of text". Let me
> > know
> > > > if
> > > > > I
> > > > > >> didn't address everything.
> > > > > >>
> > > > > >> Loading modules (or jars or whatever) outside of our Processor
> > gives
> > > > us
> > > > > > >> the benefit of making it incredibly easy for users to create
> > their
> > > > > own
> > > > > >> parsers. I would definitely expect our own bundled parsers to be
> > > > > included
> > > > > >> in our base NAR, but loading modules enables users to only have
> to
> > > > > learn
> > > > > >> how Metron wants our stuff lined up and just plug it in. Having
> > said
> > > > > that,
> > > > > >> I could see having a wrapper for our bundled parsers that makes
> it
> > > > > really
> > > > > > >> easy to just say you want a MetronAsaParser or MetronBroParser,
> > > etc.
> > > > > That
> > > > > > >> would give us the best of both worlds, where it's easy to set up our
> > > > > > >> bundled parsers and also trivial to pull in non-bundled parsers.
> > > What
> > > > > >> doing this gives us is an easy way to support (hopefully) every
> > > > parser
> > > > > that
> > > > > >> gets made, right out of the box, without us needing to build a
> > > > > specialized
> > > > > >> version of everything until we decide to and without users
> having
> > to
> > > > > jump
> > > > > >> through hoops.
> > > > > >>
> > > > > >> None of this prevents anyone from creating specialized parsers
> > (for
> > > > > perf
> > > > > >> reasons, or to use the schema registries, or anything else).
> It's
> > > > > probably
> > > > > > >> worthwhile to package up some of the built-in parsers and customize them to use
> > > > > > >> more specialized features appropriately as we see things get used
> > in
> > > > the
> > > > > >> wild. Like you said, we could likely provide Avro schemas for
> some
> > > of
> > > > > this
> > > > > >> and give users a more robust experience on what we choose to
> > support
> > > > > and
> > > > > >> provide guidance for other things. I'm also worried that
> building
> > > > > >> specialized schemas becomes problematic for things like parser
> > > > chaining
> > > > > >> (where our routers wrap the underlying messages and add on their
> > own
> > > > > info).
> > > > > >> Going down that road potentially requires anything wrapped to
> > have a
> > > > > >> specialized schema for the wrapped version in addition to a
> > vanilla
> > > > > version
> > > > > >> (although please correct me if I'm missing something there, I'll
> > > > openly
> > > > > >> admit to some shakiness on how that would be handled).
> > > > > >>
> > > > > >> I also disagree that this is un-Nifi-like, although I'm
> admittedly
> > > > not
> > > > > as
> > > > > >> skilled there. The basis for doing this is directly inspired by
> > the
> > > > > >> JoltTransformer, which is extremely similar to the proposed
> setup
> > > for
> > > > > our
> > > > > >> parsers: Simply take a spec (in this case the configs, including
> > the
> > > > > >> fieldTransformations), and delegate a mapping from bytes[] to
> > JSON.
> > > > The
> > > > > >> Jolt library even has an Expression Language (check out
> > > > > >>
> > > > >
> > > >
> > > https://community.hortonworks.com/articles/105965/
> > expression-language-with-jolt-in-apache-nifi.html
> > > > > ),
> > > > > >> so it's not a foreign concept. I believe Simon Ball has already
> > done
> > > > > some
> > > > > >> experimenting around with getting Stellar running in NiFi, and
> I'd
> > > > > love to
> > > > > >> see Stellar more readily available in NiFi in general.
> > > > > >>
> > > > > >> Re: the ControllerService, I see this as a way to maintain
> > Metron's
> > > > > use of
> > > > > >> ZK as the source of config truth. Users could definitely be
> using
> > > > NiFi
> > > > > and
> > > > > >> Storm in tandem (parse in NiFi + enrich and index from Storm,
> for
> > > > > >> example). Using the ControllerService gives us a ZK instance as
> > the
> > > > > single
> > > > > >> source of truth. That way we aren't forcing users to go to two
> > > > > different
> > > > > >> places to manage configs. This also lets us leverage our
> existing
> > > > > scripts
> > > > > >> and our existing infrastructure around configs and their
> > management
> > > > and
> > > > > >> validation very easily. It also gives users a way to port from
> > NiFi
> > > > to
> > > > > >> Storm or vice-versa without having to migrate configs as well.
> We
> > > > could
> > > > > >> also provide the option to configure the Processor itself with
> the
> > > > data
> > > > > >> (just don't set up a controller service and provide the json or
> > > > > whatever as
> > > > > >> one of our properties).
> > > > > >>
> > > > > >> On Tue, Aug 7, 2018 at 10:12 AM Otto Fowler <
> > > ottobackwards@gmail.com
> > > > >
> > > > > >> wrote:
> > > > > >>
> > > > > >>> I think this is a good idea. As I mentioned in the other thread
> > > I’ve
> > > > > >>> been doing a lot of work on Nifi recently.
> > > > > >>> I think the important thing is that what is done should be done
> > the
> > > > > NiFi
> > > > > >>> way, not bolting the Metron composition
> > > > > >>> onto Nifi. Think of it like the Tao of Unix, the parsers and
> > > > > components
> > > > > >>> should be single purpose and simple, allowing
> > > > > >>> exceptional flexibility in composition.
> > > > > >>>
> > > > > >>> Comments inline.
> > > > > >>>
> > > > > >>> On August 7, 2018 at 09:27:01, Justin Leet (
> > justinjleet@gmail.com)
> > > > > wrote:
> > > > > >>>
> > > > > >>> Hi all,
> > > > > >>>
> > > > > >>> There's interest in being able to run Metron parsers in NiFi,
> > > rather
> > > > > than
> > > > > >>>
> > > > > >>> inside Storm. I dug into this a bit, and have some thoughts on
> > how
> > > > we
> > > > > >>> could
> > > > > >>> go about this. I'd love feedback on this, along with anything
> > we'd
> > > > > >>> consider must haves as well as future enhancements.
> > > > > >>>
> > > > > >>> 1. Separate metron-parsers into metron-parsers-common and
> > > > metron-storm
> > > > > >>> and create metron-parsers-nifi. For this code to be reusable
> > across
> > > > > >>> platforms (NiFi, Storm, and anything else in the future), we'll
> > > need
> > > > > to
> > > > > >>> decouple our parsers and Storm.
> > > > > >>>
> > > > > >>> +1. The “parsing code” should be a library that implements an
> > > > > interface
> > > > > >>> ( another library ).
> > > > > >>>
> > > > > >>> The Processors and the Storm things can share them.
> > > > > >>>
> > > > > >>> - There's also some nice fringe benefits around refactoring our
> > > code
> > > > > >>> to be substantially more clear and understandable; something
> > > > > >>> which came up
> > > > > >>> while allowing for parser aggregation.
> > > > > >>> 2. Create a MetronProcessor that can run our parsers.
> > > > > >>> - I took a look at how RecordReader could be leveraged (e.g.
> > > > > >>> CSVRecordReader), but this is pretty tightly tied into schemas
> > > > > >>> and is meant
> > > > > >>> to be used by ControllerServices, which are then used by
> > > Processors.
> > > > > >>> There's friction involved there in terms of schemas, but also
> in
> > > > > terms of
> > > > > >>>
> > > > > >>> access to ZK configs and things like parser chaining. We might
> > > > > >>> be able to
> > > > > >>> leverage it, but it seems like it'd be fairly shoehorned in
> > > > > >>> without getting
> > > > > >>> the schema and other benefits.
> > > > > >>>
> > > > > >>> We won’t have to provide our ‘no schema processors’ ( grok,
> csv,
> > > > json
> > > > > ).
> > > > > >>>
> > > > > >>> All the remaining processors DO have schemas that we know
> about.
> > We
> > > > > can
> > > > > >>> just provide the avro schemas the same way we provide the ES
> > > > schemas.
> > > > > >>>
> > > > > >>> The “parsing” should not be conflated with the
> transform/stellar
> > in
> > > > > >>> NiFi. We should make that separate. Running Stellar over
> Records
> > > > > would be
> > > > > >>> the best thing.
> > > > > >>>
> > > > > >>> - This Processor would work similarly to Storm: bytes[] in ->
> > JSON
> > > > > >>> out.
> > > > > >>> - There is a Processor
> > > > > >>> <
> > > > > >>>
> > > > >
> > > >
> > > https://github.com/apache/nifi/blob/master/nifi-nar-
> > bundles/nifi-standard-bundle/nifi-standard-processors/src/
> > main/java/org/apache/nifi/processors/standard/JoltTransformJSON.java
> > > > > >>> >
> > > > > >>> that
> > > > > >>> handles loading other JARs that we can model a
> > > > > >>> MetronParserProcessor off of
> > > > > >>> that handles classpath/classloader issues (basically just sets
> > up a
> > > > > >>> classloader specific to what's being loaded and swaps out the
> > > > Thread's
> > > > > >>> loader when it calls to outside resources).
> > > > > >>>
> > > > > >>> There should be no reason to load modules outside the NAR. Why
> do
> > > > you
> > > > > >>> expect to? If each Metron Processor equiv of a Metron Storm
> > Parser
> > > > is
> > > > > just
> > > > > >>> parsing to json it shouldn’t need much.And we could package
> them
> > in
> > > > > the
> > > > > >>> NAR. I would suggest we have a Processor per Parser to allow
> for
> > > > > >>> specialization. It should all be in the nar.
> > > > > >>>
> > > > > >>> The Stellar Processor, if you would support the works would
> > > possibly
> > > > > need
> > > > > >>> this.
> > > > > >>>
> > > > > >>> 3. Create a MetronZkControllerService to supply our configs to
> > our
> > > > > >>> processors.
> > > > > >>> - This is a pretty established NiFi pattern for being able to
> > > > provide
> > > > > >>> access to other services needed by a Processor (e.g. databases
> or
> > > > > large
> > > > > >>> configurations files).
> > > > > >>> - The same controller service can be used by all Processors to
> > > > manage
> > > > > >>> configs in a consistent manner.
> > > > > >>>
> > > > > >>> I think controller services would make sense where needed, I’m
> > just
> > > > > not
> > > > > >>> sure what you imagine them being needed for?
> > > > > >>>
> > > > > >>> If the user has NiFi, and a Registry etc, are you saying you
> > > imagine
> > > > > them
> > > > > >>> using Metron + ZK to manage configurations? Or to be using BOTH
> > > > storm
> > > > > >>> processors and Nifi Processors?
> > > > > >>>
> > > > > >>> At that point, we can just NAR our controller service and
> parser
> > > > > processor
> > > > > >>>
> > > > > >>> up as needed, deploy them to NiFi, and let the user provide a
> > > config
> > > > > for
> > > > > >>> where their custom parsers can be provided (i.e. their parser
> > jar).
> > > > > This
> > > > > >>> would be 3 nars (processor, controller-service, and
> > > > > controller-service-api
> > > > > >>>
> > > > > >>> in order to bind the other two together).
> > > > > >>>
> > > > > >>> Once deployed, our ability to use parsers should fit well into
> > the
> > > > > >>> standard
> > > > > >>> NiFi workflow:
> > > > > >>>
> > > > > >>> 1. Create a MetronZkControllerService.
> > > > > >>> 2. Configure the service to point at zookeeper.
> > > > > >>> 3. Create a MetronParser.
> > > > > >>> 4. Configure it to use the controller service + parser jar
> > location
> > > > +
> > > > > >>> any other needed configs.
> > > > > >>> 5. Use the outputs as needed downstream (either writing out to
> > > Kafka
> > > > > or
> > > > > >>> feeding into more MetronParsers, etc.)
> > > > > >>>
> > > > > >>> Chaining parsers should ideally become a matter of chaining
> > > > > MetronParsers
> > > > > >>>
> > > > > >>> (and making sure the enveloping configs carry through
> properly).
> > > For
> > > > > >>> parser
> > > > > >>> aggregation, I'd just avoid it entirely until we know it's
> needed
> > > in
> > > > > NiFi.
> > > > > >>>
> > > > > >>> Justin
> > > > >
> > > > > -------------------
> > > > > Thank you,
> > > > >
> > > > > James Sirota
> > > > > PMC- Apache Metron
> > > > > jsirota AT apache DOT org
> > > > >
> > > > >
> > > >
> > >
> > >
> >
>
>
>
> --
> --
> simon elliston ball
> @sireb
>
>


-- 
--
simon elliston ball
@sireb

Re: [DISCUSS] Metron Parsers in Nifi

Posted by Otto Fowler <ot...@gmail.com>.

Also, if we are doing the record readers, we can have a reader per parser
type and explicitly set the schema, as seen here:
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/syslog/Syslog5424Reader.java



On August 13, 2018 at 09:26:50, Otto Fowler (ottobackwards@gmail.com) wrote:

If we can do the record readers ourselves ( with the parsers inside them )
we can handle the returns.
I’ll be doing the net flow 5 readers once the net flow 5 processor PR ( not
mine ) is in.

I don’t think a generic processor that class-loads parsers, with everything we
would have to manage around that, is preferable to having an archetype and
explicit parsers.

As another consideration, NiFi processors and readers are self-documenting,
and the generic approach would make that impossible.



On August 13, 2018 at 06:50:09, Simon Elliston Ball (
simon@simonellistonball.com) wrote:

Maybe the edge use case will clarify the config issue a little. The reason
I would want to be able to push Metron parsers into NiFi would be so I can
pre-parse and filter on the edge to save bandwidth from remote locations. I
would expect to be able to parse at the edge and use NiFi to prioritise or
filter on the Metron ready data, then push through to a 'NoOp' parser in
Metron. For this to happen, we would absolutely not want to connect to
Zookeeper, so I'm +1 on Otto's suggestion that the config be embeddable in
NiFi properties. We cannot assume ZK connectivity from NiFi.
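
A minimal sketch of the embeddable-config idea, under stated assumptions:
ParserConfigResolver and its zkLookup function are hypothetical names, not
Metron or NiFi APIs. It prefers config JSON supplied inline (as a processor
property would be) and only falls back to a ZooKeeper-backed lookup if one
was wired in, so an edge deployment never needs ZK connectivity.

```java
import java.util.function.Function;

/**
 * Hypothetical helper (not part of Metron or NiFi): resolves a parser config
 * from an inline property first, then from an optional ZK-backed lookup.
 */
final class ParserConfigResolver {
    // May be null on the edge, where no ZooKeeper is reachable.
    private final Function<String, String> zkLookup;

    ParserConfigResolver(Function<String, String> zkLookup) {
        this.zkLookup = zkLookup;
    }

    /** Returns the config JSON for sensorType, preferring the inline value. */
    String resolve(String sensorType, String inlineJson) {
        if (inlineJson != null && !inlineJson.trim().isEmpty()) {
            return inlineJson; // embedded config: no ZK round trip at all
        }
        String fromZk = (zkLookup == null) ? null : zkLookup.apply(sensorType);
        if (fromZk == null) {
            throw new IllegalStateException(
                "No inline config and no ZK config available for " + sensorType);
        }
        return fromZk;
    }
}
```

The same class covers the hybrid case: pass a lookup backed by the central
ZK and leave the inline property empty.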

I can also see a scenario where NiFi might make it easier to chain parsers,
which is where it overlaps more with Metron. This is more about the fact
that NiFi makes it a lot easier to configure and manage complex multi-step
flows than Metron, and is far more intuitive from a design and
monitoring perspective. My main concern around using NiFi in this way is
about the load on the content repository. We are looking at a lot of
content level transformation here. You could argue that the same load is
taken off Kafka in the chaining scenario, but there is still a chance for a
user to accidentally create a lot of disk access if they go over the top
with NiFi.

I see this as potentially a chance to make the Metron Parser interface
compatible with NiFi Record Readers. Then both communities could benefit
from sharing each other's parsers.

In terms of the NAR approach, I would say we have a base bundle of the NiFi
bits (https://github.com/simonellistonball/metron/tree/nifi already has
this for stellar, enrichments and an opinionated publisher, it also has a
readme with some discussion around this
https://github.com/simonellistonball/metron/tree/nifi/nifi-metron-bundle).
We can then use other nar dependencies to side load parser classes into the
record reader. We would then need to do some fancy property validation in
NiFi to ensure the classes were available.
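
The availability check could look roughly like the sketch below. The class
and method names are hypothetical, and wiring this into NiFi's own property
validation is left out; the only real API used is Class.forName with an
explicit class loader.

```java
/**
 * Hypothetical check (not a NiFi or Metron API): verifies that a configured
 * parser class can be loaded and implements the expected interface.
 */
final class ParserClassValidator {
    /** Returns null when the class is usable, otherwise an explanation. */
    static String validate(String className, Class<?> expectedType, ClassLoader loader) {
        if (className == null || className.trim().isEmpty()) {
            return "parser class name is empty";
        }
        try {
            // initialize=false: resolve the class without running static initializers
            Class<?> candidate = Class.forName(className.trim(), false, loader);
            if (!expectedType.isAssignableFrom(candidate)) {
                return className + " does not implement " + expectedType.getName();
            }
            return null;
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " could not be loaded: " + e;
        }
    }
}
```

Passing the class loader explicitly matters here, since side-loaded classes
would only be visible through the loader of the NAR that depends on them.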

Also, Record Readers are much, much faster. The only problem I've found with
them is that they error on blank output, which was a problem for me writing
a NetFlow 9 reader (template-only records need to live in NiFi cache, but
not be emitted).

In terms of the schema objection, I'm not sure why schema focus is a
problem. Our parsers have implicit schema and the output schema formats
used in NiFi are very flexible and could be "just a map". That said, we
could also take the opportunity to introduce a method to the parser
interface to emit traits to contribute the bits of schema that a parser
produces. This would ultimately lead to us being able to generate output
schemas (ES, Solr, Hive, whatever), which would take a lot of the pain out of
setup for sensors.
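
One possible shape for that trait method, as a sketch only: SchemaContributor
and OutputSchemas are hypothetical names, the type strings are placeholders,
and nothing here reflects the existing Metron parser interface. Each parser
declares the fields it produces, and the contributions are merged into one
map from which a backend-specific schema could later be generated.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical trait: a parser declares the fields (name -> type) it emits. */
interface SchemaContributor {
    Map<String, String> fieldTypes();
}

final class OutputSchemas {
    /** Merges contributions in order; the same field with two types is an error. */
    static Map<String, String> merge(List<SchemaContributor> contributors) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (SchemaContributor contributor : contributors) {
            for (Map.Entry<String, String> field : contributor.fieldTypes().entrySet()) {
                String previous = merged.putIfAbsent(field.getKey(), field.getValue());
                if (previous != null && !previous.equals(field.getValue())) {
                    throw new IllegalArgumentException("Conflicting types for "
                        + field.getKey() + ": " + previous + " vs " + field.getValue());
                }
            }
        }
        return merged;
    }
}
```

Rejecting conflicting declarations early is a design choice for the sketch;
a real implementation might instead widen types or namespace the fields.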

Simon

On 9 August 2018 at 16:42, Otto Fowler <ot...@gmail.com> wrote:

> I would say that
>
> - For each configuration parameter we want to pull in, it should be
> explicitly configured through a property as well as through a controller
> service that accesses the metron zk
> - Transformations should not be conflated with parsing in those processors
> or readers
>
> There is no on the fly configuration change in nifi ( You can’t change
> properties once started ).
>
> Wouldn’t the simplest minimal start be to say that we expect either nifi or
> metron and simplify things? Let nifi nifi, let metron metron.
>
>
> On August 9, 2018 at 10:53:24, Justin Leet (justinjleet@gmail.com) wrote:
>
> That's definitely good info, thanks for reaching out to them about it.
>
> In terms of exposing/sharing, I don't think we have to couple them tightly
> (in fact, I think we should loosen the coupling as much as possible without
> forcing reimplementation of things). I think there's definitely a way to do
> that in terms of the general purpose processor I proposed (or in terms of
> RecordReader or another implementation).
>
> It would definitely be easy enough to configure it to either pull from ZK
> or to use a parser config json extract as a parameter (to maintain the same
> formatting and make migration easy). And we can still build specific
> NiFi-oriented parsers as needed (that manage things like Schema via the
> registry and other Nifi mechanisms). This keeps parsers entirely decoupled
> from a metron installation.
>
> Alternatively, we extract our config handling to a module and scripts we
> can package up and easily deploy configs against ZK (or maybe NiFi's
> StateController or whatever). We definitely shouldn't need absolutely
> everything installed to be able to run just parsers on Nifi.
>
> Having said that, right now the easiest way we have to maintain on the fly
> updatable configs (and updatable is important!) is via ZK. Params in Nifi
> aren't quite that flexible, to the best of my knowledge (i.e. you have to
> stop, update config and restart). We might be able to exploit the
> StateController to manage this for us, but I'm honestly not familiar enough
> with it, and for deployments split between NiFi and Storm, it means
> configuration gets managed in a couple different ways (which may be fine with
> users since there is a fairly bright-line delineation which makes it easier to
> accept). There are some complicated configs like fieldTransforms, which is
> part of why I would like things to be configured in the same format (if not
> the same mechanism).
>
> Ideally, in my mind, the parsers shared between both NiFi and Storm just
> implement the very general MessageParser interface (which is pretty
> minimal, a couple setup methods, validation, and the actual parse). This
> is pretty lightweight and the split of metron-parsers into
> metron-parsers-common et al. would loosen the coupling between parsers and
> the rest of metron into that core needed to support that.
>
> IMO, at that point, we'd have a pretty minimal NAR (or NARs depending on
> config management) that lets us run our set of parsers, lets users build
> new parsers (and don't block specialized NiFi implementations that exploit
> NiFi's feature set), and lets us get things configured in a relatively
> consistent manner, without losing features, and hopefully requiring a
> pretty minimal slice of Metron to be useful.
>
> On Thu, Aug 9, 2018 at 10:06 AM Otto Fowler <ot...@gmail.com>
> wrote:
>
> > I think the benefits are clear. What is unclear is if the goal is to
> > expose or share or re-use Metron capabilities ( stellar, parsing ) in nifi
> > in a way that is native to nifi ( configured and managed in nifi ), where
> > you may not even need metron ( say you just want to parse asa ) or if the
> > goal is to have a hybrid approach coupling the processors/readers to the
> > metron installation.
> >
> >
> > On August 9, 2018 at 09:14:58, Justin Leet (justinjleet@gmail.com)
> wrote:
> >
> > I'll add onto Mike's discussion with the original set of requirements I had
> > in mind (and apply feedback on these as necessary!). This is largely
> > overlap with what Mike said, but I want to make sure it's clear where my
> > proposal was coming from, so we can improve on it as needed. James and
> > Mike are also right, I think I skipped over the benefits of NiFi in general
> > a bit, so thanks for chiming in there.
> >
> > - Deploy our bundled parsers without needing custom wrapping on all of
> > them.
> > - Don't prevent ourselves from building custom wrapping as needed.
> > - Custom Java parsers with an easy way to hook in, similar to what we
> > already do in Storm.
> > - One stop (or at least one format) configuration, for the case when we're
> > doing some things in NiFi (parsers) and some elsewhere (enrichment and
> > indexing). I don't think it'll always be "start in NiFi, end in Storm",
> > especially as we build out Stellar capability, but I also don't want users
> > learning a different set of configs and config tools for every platform we
> > run on.
> > - Ability to build out parsers and other systems fairly easily, e.g. Spark.
> > - Support our current use cases (in particular parser chaining as a more
> > advanced use case).
> >
> > It really boils down to providing a relatively simple user path to be able
> > to migrate to NiFi as needed or desired as simply as possible in a very
> > general way, while not preventing parser by parser enhancements.
> >
> > On Wed, Aug 8, 2018 at 7:14 PM Michael Miklavcic <
> > michael.miklavcic@gmail.com> wrote:
> >
> > > I think it also provides customers greater control over their architecture
> > > by giving them the flexibility to choose where/how to host their parsers.
> > >
> > > To Justin's point about the API, my biggest concern about the RecordReader
> > > approach is that it is not stable. We already have a similar problem in
> > > having the TransportClient in ElasticSearch - they are prone to changing it
> > > in minor versions with the advent of their newer REST API, which is
> > > problematic for ensuring a stable installation.
> > >
> > > From my own perspective, our goal with NiFi, at least in part, should be
> > > the ability to deploy our core parsing infrastructure, i.e.
> > >
> > > - pre-built parsers
> > > - custom java parsers
> > > - Stellar transforms
> > > - custom stellar transforms
> > >
> > > And have the ability to configure it similarly to how we configure parsers
> > > within Storm. Consistent with our recent parser chaining and aggregation
> > > feature, users should be able to construct and deploy similar constructs in
> > > NiFi. The core architectural shift would be that parser code should be
> > > platform agnostic. We provide the plumbing in Storm, NiFi, and <Spark
> > > Streaming?, other> and platform architects and devops teams can choose how
> > > and where to deploy.
> > >
> > > Best,
> > > Mike
> > >
> > >
> > > On Wed, Aug 8, 2018 at 9:57 AM James Sirota <js...@apache.org>
> wrote:
> > >
> > > > Integration with NiFi would be useful for parsing low-volume telemetries
> > > > at the edge. This is a much more resource friendly way to do it than
> > > > setting up dedicated storm topologies. The integration would be that the
> > > > NiFi processor parses the data and pushes it straight into the enrichment
> > > > topic, saving us the resources of having multiple parsers in storm
> > > >
> > > > Thanks,
> > > > James
> > > >
> > > > 07.08.2018, 11:29, "Otto Fowler" <ot...@gmail.com>:
> > > > > Why don't we start over? We are going back and forth on implementation,
> > > > > and I don’t think we have the same goals or concerns.
> > > > >
> > > > > What would be the requirements or goals of metron integration with
> > > Nifi?
> > > > > How many levels or options for integration do we have?
> > > > > What are the approaches to choose from?
> > > > > Who are the target users?
> > > > >
> > > > > On August 7, 2018 at 12:24:56, Justin Leet (justinjleet@gmail.com)
> > > > wrote:
> > > > >
> > > > > So how does the MetronRecordReader roll into everything? It seems like it'd
> > > > > be more useful on the reader per format approach, but otherwise it doesn't
> > > > > really seem like we gain much, and it requires getting everything linked up
> > > > > properly to be used. Assuming we looked at doing it that way, is the idea
> > > > > that we'd setup a ControllerService with the MetronRecordReader and a
> > > > > MetronRecordWriter and then have the StellarTransformRecord processor
> > > > > configured with those ControllerServices? How do we manage the
> > > > > configurations of everything that way? How does the ControllerService
> > > > > get configured with whatever parser(s) are needed in the flow? Basically,
> > > > > what's your vision for how everything would tie together?
> > > > >
> > > > > I also forgot to mention this in the original writeup, but there's another
> > > > > reason to avoid the RecordReader: It's not considered stable. See
> > > > > https://github.com/apache/nifi/blob/master/nifi-commons/nifi-record/src/main/java/org/apache/nifi/serialization/RecordReader.java#L34
> > > > > That alone makes me super hesitant to use it, if it can shift out from
> > > > > under us even in an incremental version.
> > > > >
> > > > > I'm also unclear on why StellarTransformRecord processor matters for either
> > > > > approach. With the Processor approach you could simply follow it up with
> > > > > the Stellar processor, the same way you would in the RecordReader
> > > > > approach. The Stellar processor should be a parallel improvement, not a
> > > > > conflicting one.
> > > > >
> > > > > On Tue, Aug 7, 2018 at 11:50 AM Otto Fowler <
> ottobackwards@gmail.com
> > >
> > > > wrote:
> > > > >
> > > > >> A Metron Processor itself isn’t really necessary. A MetronRecordReader (
> > > > >> either the megalithic or a reader per format ) would be a good approach.
> > > > >> Then have StellarTransformRecord processor that can do Stellar on _any_
> > > > >> record, regardless of source.
> > > > >>
> > > > >> On August 7, 2018 at 11:06:22, Justin Leet (justinjleet@gmail.com
> )
> > > > wrote:
> > > > >>
> > > > >> Thanks for the comments, Otto, this is definitely great feedback. I'd
> > > > >> love to respond inline, but the email's already starting to lose its
> > > > >> formatting, so I'll go with the classic "wall of text". Let me know if I
> > > > >> didn't address everything.
> > > > >>
> > > > >> Loading modules (or jars or whatever) outside of our Processor gives us
> > > > >> the benefit of making it incredibly easy for users to create their own
> > > > >> parsers. I would definitely expect our own bundled parsers to be included
> > > > >> in our base NAR, but loading modules enables users to only have to learn
> > > > >> how Metron wants our stuff lined up and just plug it in. Having said that,
> > > > >> I could see having a wrapper for our bundled parsers that makes it really
> > > > >> easy to just say you want a MetronAsaParser or MetronBroParser, etc. That
> > > > >> would give us the best of both worlds, where it's easy to set up our
> > > > >> bundled parsers and also trivial to pull in non-bundled parsers. What
> > > > >> doing this gives us is an easy way to support (hopefully) every parser that
> > > > >> gets made, right out of the box, without us needing to build a specialized
> > > > >> version of everything until we decide to and without users having to jump
> > > > >> through hoops.
> > > > >>
> > > > >> None of this prevents anyone from creating specialized parsers
> (for
> > > > perf
> > > > >> reasons, or to use the schema registries, or anything else). It's
> > > > probably
> > > > >> worthwhile to package up some of built-in parsers and customize
> them
> > > > to use
> > > > >> more specialized feature appropriately as we see things get used
> in
> > > the
> > > > >> wild. Like you said, we could likely provide Avro schemas for
some
> > of
> > > > this
> > > > >> and give users a more robust experience on what we choose to
> support
> > > > and
> > > > >> provide guidance for other things. I'm also worried that building
> > > > >> specialized schemas becomes problematic for things like parser
> > > chaining
> > > > >> (where our routers wrap the underlying messages and add on their
> own
> > > > info).
> > > > >> Going down that road potentially requires anything wrapped to
> have a
> > > > >> specialized schema for the wrapped version in addition to a
> vanilla
> > > > version
> > > > >> (although please correct me if I'm missing something there, I'll
> > > openly
> > > > >> admit to some shakiness on how that would be handled).
> > > > >>
> > > > >> I also disagree that this is un-Nifi-like, although I'm
admittedly
> > > not
> > > > as
> > > > >> skilled there. The basis for doing this is directly inspired by
> the
> > > > >> JoltTransformer, which is extremely similar to the proposed setup
> > for
> > > > our
> > > > >> parsers: Simply take a spec (in this case the configs, including
> the
> > > > >> fieldTransformations), and delegate a mapping from bytes[] to
> JSON.
> > > The
> > > > >> Jolt library even has an Expression Language (check out
> > > > >>
> > > >
> > >
> > https://community.hortonworks.com/articles/105965/
> expression-language-with-jolt-in-apache-nifi.html
> > > > ),
> > > > >> so it's not a foreign concept. I believe Simon Ball has already
> done
> > > > some
> > > > >> experimenting around with getting Stellar running in NiFi, and
I'd
> > > > love to
> > > > >> see Stellar more readily available in NiFi in general.
> > > > >>
> > > > >> Re: the ControllerService, I see this as a way to maintain
> Metron's
> > > > use of
> > > > >> ZK as the source of config truth. Users could definitely be using
> > > NiFi
> > > > and
> > > > >> Storm in tandem (parse in NiFi + enrich and index from Storm, for
> > > > >> example). Using the ControllerService gives us a ZK instance as
> the
> > > > single
> > > > >> source of truth. That way we aren't forcing users to go to two
> > > > different
> > > > >> places to manage configs. This also lets us leverage our existing
> > > > scripts
> > > > >> and our existing infrastructure around configs and their
> management
> > > and
> > > > >> validation very easily. It also gives users a way to port from
> NiFi
> > > to
> > > > >> Storm or vice-versa without having to migrate configs as well. We
> > > could
> > > > >> also provide the option to configure the Processor itself with
the
> > > data
> > > > >> (just don't set up a controller service and provide the json or
> > > > whatever as
> > > > >> one of our properties).
> > > > >>
> > > > >> On Tue, Aug 7, 2018 at 10:12 AM Otto Fowler <
> > ottobackwards@gmail.com
> > > >
> > > > >> wrote:
> > > > >>
> > > > >>> I think this is a good idea. As I mentioned in the other thread
> > I’ve
> > > > >>> been doing a lot of work on Nifi recently.
> > > > >>> I think the important thing is that what is done should be done
> the
> > > > NiFi
> > > > >>> way, not bolting the Metron composition
> > > > >>> onto Nifi. Think of it like the Tao of Unix, the parsers and
> > > > components
> > > > >>> should be single purpose and simple, allowing
> > > > >>> exceptional flexibility in composition.
> > > > >>>
> > > > >>> Comments inline.
> > > > >>>
> > > > >>> On August 7, 2018 at 09:27:01, Justin Leet (
> justinjleet@gmail.com)
> > > > wrote:
> > > > >>>
> > > > >>> Hi all,
> > > > >>>
> > > > >>> There's interest in being able to run Metron parsers in NiFi,
> > rather
> > > > than
> > > > >>>
> > > > >>> inside Storm. I dug into this a bit, and have some thoughts on
> how
> > > we
> > > > >>> could
> > > > >>> go about this. I'd love feedback on this, along with anything
> we'd
> > > > >>> consider must haves as well as future enhancements.
> > > > >>>
> > > > >>> 1. Separate metron-parsers into metron-parsers-common and
> > > metron-storm
> > > > >>> and create metron-parsers-nifi. For this code to be reusable
> across
> > > > >>> platforms (NiFi, Storm, and anything else in the future), we'll
> > need
> > > > to
> > > > >>> decouple our parsers and Storm.
> > > > >>>
> > > > >>> +1. The “parsing code” should be a library that implements an
> > > > interface
> > > > >>> ( another library ).
> > > > >>>
> > > > >>> The Processors and the Storm things can share them.
> > > > >>>
> > > > >>> - There's also some nice fringe benefits around refactoring our
> > code
> > > > >>> to be substantially more clear and understandable; something
> > > > >>> which came up
> > > > >>> while allowing for parser aggregation.
> > > > >>> 2. Create a MetronProcessor that can run our parsers.
> > > > >>> - I took a look at how RecordReader could be leveraged (e.g.
> > > > >>> CSVRecordReader), but this is pretty tightly tied into schemas
> > > > >>> and is meant
> > > > >>> to be used by ControllerServices, which are then used by
> > Processors.
> > > > >>> There's friction involved there in terms of schemas, but also in
> > > > terms of
> > > > >>>
> > > > >>> access to ZK configs and things like parser chaining. We might
> > > > >>> be able to
> > > > >>> leverage it, but it seems like it'd be fairly shoehorned in
> > > > >>> without getting
> > > > >>> the schema and other benefits.
> > > > >>>
> > > > >>> We won’t have to provide our ‘no schema processors’ ( grok, csv,
> > > json
> > > > ).
> > > > >>>
> > > > >>> All the remaining processors DO have schemas that we know about.
> We
> > > > can
> > > > >>> just provide the avro schemas the same way we provide the ES
> > > schemas.
> > > > >>>
> > > > >>> The “parsing” should not be conflated with the transform/stellar
> in
> > > > >>> NiFi. We should make that separate. Running Stellar over Records
> > > > would be
> > > > >>> the best thing.
> > > > >>>
> > > > >>> - This Processor would work similarly to Storm: bytes[] in ->
> JSON
> > > > >>> out.
> > > > >>> - There is a Processor
> > > > >>> <
> > > > >>>
> > > >
> > >
> > https://github.com/apache/nifi/blob/master/nifi-nar-
> bundles/nifi-standard-bundle/nifi-standard-processors/src/
> main/java/org/apache/nifi/processors/standard/JoltTransformJSON.java
> > > > >>> >
> > > > >>> that
> > > > >>> handles loading other JARs that we can model a
> > > > >>> MetronParserProcessor off of
> > > > >>> that handles classpath/classloader issues (basically just sets
> up a
> > > > >>> classloader specific to what's being loaded and swaps out the
> > > Thread's
> > > > >>> loader when it calls to outside resources).
> > > > >>>
> > > > >>> There should be no reason to load modules outside the NAR. Why
do
> > > you
> > > > >>> expect to? If each Metron Processor equiv of a Metron Storm
> Parser
> > > is
> > > > just
> > > > >>> parsing to json it shouldn’t need much.And we could package them
> in
> > > > the
> > > > >>> NAR. I would suggest we have a Processor per Parser to allow for
> > > > >>> specialization. It should all be in the nar.
> > > > >>>
> > > > >>> The Stellar Processor, if you would support the works would
> > possibly
> > > > need
> > > > >>> this.
> > > > >>>
> > > > >>> 3. Create a MetronZkControllerService to supply our configs to
> our
> > > > >>> processors.
> > > > >>> - This is a pretty established NiFi pattern for being able to
> > > provide
> > > > >>> access to other services needed by a Processor (e.g. databases
or
> > > > large
> > > > >>> configurations files).
> > > > >>> - The same controller service can be used by all Processors to
> > > manage
> > > > >>> configs in a consistent manner.
> > > > >>>
> > > > >>> I think controller services would make sense where needed, I’m
> just
> > > > not
> > > > >>> sure what you imagine them being needed for?
> > > > >>>
> > > > >>> If the user has NiFi, and a Registry etc, are you saying you
> > imagine
> > > > them
> > > > >>> using Metron + ZK to manage configurations? Or to be using BOTH
> > > storm
> > > > >>> processors and Nifi Processors?
> > > > >>>
> > > > >>> At that point, we can just NAR our controller service and parser
> > > > processor
> > > > >>>
> > > > >>> up as needed, deploy them to NiFi, and let the user provide a
> > config
> > > > for
> > > > >>> where their custom parsers can be provided (i.e. their parser
> jar).
> > > > This
> > > > >>> would be 3 nars (processor, controller-service, and
> > > > controller-service-api
> > > > >>>
> > > > >>> in order to bind the other two together).
> > > > >>>
> > > > >>> Once deployed, our ability to use parsers should fit well into
> the
> > > > >>> standard
> > > > >>> NiFi workflow:
> > > > >>>
> > > > >>> 1. Create a MetronZkControllerService.
> > > > >>> 2. Configure the service to point at zookeeper.
> > > > >>> 3. Create a MetronParser.
> > > > >>> 4. Configure it to use the controller service + parser jar
> location
> > > +
> > > > >>> any other needed configs.
> > > > >>> 5. Use the outputs as needed downstream (either writing out to
> > Kafka
> > > > or
> > > > >>> feeding into more MetronParsers, etc.)
> > > > >>>
> > > > >>> Chaining parsers should ideally become a matter of chaining
> > > > MetronParsers
> > > > >>>
> > > > >>> (and making sure the enveloping configs carry through properly).
> > For
> > > > >>> parser
> > > > >>> aggregation, I'd just avoid it entirely until we know it's
needed
> > in
> > > > NiFi.
> > > > >>>
> > > > >>> Justin
> > > >
> > > > -------------------
> > > > Thank you,
> > > >
> > > > James Sirota
> > > > PMC- Apache Metron
> > > > jsirota AT apache DOT org
> > > >
> > > >
> > >
> >
> >
>



--
--
simon elliston ball
@sireb

Re: [DISCUSS] Metron Parsers in Nifi

Posted by Otto Fowler <ot...@gmail.com>.

If we can do the record readers ourselves ( with the parsers inside them )
we can handle the returns.
I’ll be doing the net flow 5 readers once the net flow 5 processor PR ( not
mine ) is in.

I don’t think having a generic class loading parsers foo and having to
manage all that is preferable to having
an archetype and explicit parsers.

Nifi processors and readers are self documenting, and this approach will
make that not possible, as another consideration.



On August 13, 2018 at 06:50:09, Simon Elliston Ball (
simon@simonellistonball.com) wrote:

Maybe the edge use case will clarify the config issue a little. The reason
I would want to be able to push Metron parsers into NiFi would be so I can
pre-parse and filter on the edge to save bandwidth from remote locations. I
would expect to be able to parse at the edge and use NiFi to prioritise or
filter on the Metron ready data, then push through to a 'NoOp' parser in
Metron. For this to happen, we would absolutely not want to connect to
Zookeeper, so I'm +1 on Otto's suggestion that the config be embeddable in
NiFi properties. We cannot assume ZK connectivity from NiFi.

I can also see a scenario where NiFi might make it easier to chain parsers,
which is where it overlaps more with Metron. This is more about the fact
that NiFi makes it a lot easier to configure and manage complex multi-step
flows than Metron, and is way more user intuitive from a design and
monitoring perspective. My main concern around using NiFi in this way is
about the load on the content repository. We are looking at a lot of
content level transformation here. You could argue that the same load is
taken off Kafka in the chaining scenario, but there is still a chance for a
user to accidentally create a lot of disk access if they go over the top
with NiFi.

I see this as potentially a chance to make the Metron Parser interface
compatible with NiFi Record Readers. Then both communities could benefit
from sharing each other's parsers.

In terms of the NAR approach, I would say we have a base bundle of the NiFi
bits (https://github.com/simonellistonball/metron/tree/nifi already has
this for stellar, enrichments and an opinionated publisher, it also has a
readme with some discussion around this
https://github.com/simonellistonball/metron/tree/nifi/nifi-metron-bundle).
We can then use other nar dependencies to side load parser classes into the
record reader. We would then need to do some fancy property validation in
NiFi to ensure the classes were available.

Also, Record Readers are much much faster. The only problem I've found with
them is that they error on blank output, which was a problem for me writing
a netflow 9 reader (template only records need to live in NiFi cache, but
not be emitted).

In terms of the schema objection, I'm not sure why schema focus is a
problem. Our parsers have implicit schema and the output schema formats
used in NiFi are very flexible and could be "just a map". That said, we
could also take the opportunity to introduce a method to the parser
interface to emit traits to contribute the bits of schema that a parser
produces. This would ultimately lead to us being able to generate output
schemas (ES, Solr, Hive, whatever), which would take a lot of the pain out of
setup for sensors.

Simon

On 9 August 2018 at 16:42, Otto Fowler <ot...@gmail.com> wrote:

> I would say that
>
> - For each configuration parameter we want to pull in, it should be
> explicitly configured through a property as well as through a controller
> service that accesses the metron zk
> - Transformations should not be conflated with parsing in those processors
> or readers
>
> There is no on the fly configuration change in nifi ( You can’t change
> properties once started ).
>
> Wouldn’t the simplest minimal start be to say that we expect either nifi or
> metron and simplify things? Let nifi nifi, let metron metron.
>
>
> On August 9, 2018 at 10:53:24, Justin Leet (justinjleet@gmail.com) wrote:
>
> That's definitely good info, thanks for reaching out to them about it.
>
> In terms of exposing/sharing, I don't think we have to couple them tightly
> (in fact, I think we should loosen the coupling as much as possible without
> forcing reimplementation of things). I think there's definitely a way to do
> that in terms of the general purpose processor I proposed (or in terms of
> RecordReader or another implementation).
>
> It would definitely be easy enough to configure it to either pull from ZK
> or to use a parser config json extract as a parameter (to maintain the same
> formatting and make migration easy). And we can still build specific
> NiFi-oriented parsers as needed (that manage things like Schema via the
> registry and other Nifi mechanisms). This keeps parsers entirely decoupled
> from a metron installation.
>
> Alternatively, we extract our config handling to a module and scripts we
> can package up and easily deploy configs against ZK (or maybe Nifi's
> StateController's or whatever). We definitely shouldn't need absolutely
> everything installed to be able to run just parsers on Nifi.
>
> Having said that, right now the easiest way we have to maintain on the fly
> updatable configs (and updatable is important!) is via ZK. Params in Nifi
> aren't quite that flexible, to the best of my knowledge (i.e. you have to
> stop, update config and restart). We might be able to exploit the
> StateController to manage this for us, but I'm honestly not familiar enough
> with it and for deployments split between NiFi and Storm, it means
> configuration gets managed in a couple different ways (which may be fine
> with users since there is a fairly bright-line delineation which makes it
> easier to accept). There are some complicated configs like fieldTransforms,
> which is part of why I would like things to be configured in the same
> format (if not the same mechanism).
>
> Ideally, in my mind, the parsers shared between both NiFi and Storm just
> implement the very general MessageParser interface (which is pretty
> minimal, a couple setup methods, validation, and the actual parse). This
> is pretty lightweight and the split of metron-parsers into
> metron-parsers-common et al. would loosen the coupling between parsers and
> the rest of metron into that core needed to support that.
>
> IMO, at that point, we'd have a pretty minimal NAR (or NARs depending on
> config management) that lets us run our set of parsers, lets users build
> new parsers (and doesn't block specialized NiFi implementations that exploit
> NiFi's feature set), and lets us get things configured in a relatively
> consistent manner, without losing features, and hopefully requiring a
> pretty minimal slice of Metron to be useful.
>
> On Thu, Aug 9, 2018 at 10:06 AM Otto Fowler <ot...@gmail.com>
> wrote:
>
> > I think the benefits are clear. What is unclear is if the goal is to
> > expose or share or re-use Metron capabilities ( stellar, parsing ) in
> nifi
> > in a way that is native to nifi ( configured and managed in nifi ),
where
> > you may not even need metron ( say you just want to parse asa ) or if
the
> > goal is to have a hybrid approach coupling the processors/readers to
the
> > metron installation.
> >
> >
> > On August 9, 2018 at 09:14:58, Justin Leet (justinjleet@gmail.com)
> wrote:
> >
> > I'll add onto Mike's discussion with the original set of requirements I
> had
> > in mind (and apply feedback on these as necessary!). This is largely
> > overlap with what Mike said, but I want to make sure it's clear where
my
> > proposal was coming from, so we can improve on it as needed. James and
> > Mike are also right, I think I skipped over the benefits of NiFi in
> general
> > a bit, so thanks for chiming in there.
> >
> > - Deploy our bundled parsers without needing custom wrapping on all of
> > them.
> > - Don't prevent ourselves from building custom wrapping as needed.
> > - Custom Java parsers with an easy way to hook in, similar to what we
> > already do in Storm.
> > - One stop (or at least one format) configuration, for the case when
> we're
> > doing some thing in NiFi (parsers) and some elsewhere (enrichment and
> > indexing). I don't think it'll always be "start in NiFi, end in Storm",
> > especially as we build out Stellar capability, but I also don't want
> users
> > learning a different set of configs and config tools for every platform
> we
> > run on.
> > - Ability to build out parsers and other systems fairly easily, e.g.
> Spark.
> > - Support our current use cases (in particular parser chaining as a
more
> > advanced use case).
> >
> > It really boils down to providing a relatively simple user path to be
> able
> > to migrate to NiFi as needed or desired as simply as possible in a very
> > general way, while not preventing parser by parser enhancements.
> >
> > On Wed, Aug 8, 2018 at 7:14 PM Michael Miklavcic <
> > michael.miklavcic@gmail.com> wrote:
> >
> > > I think it also provides customers greater control over their
> > architecture
> > > by giving them the flexibility to choose where/how to host their
> parsers.
> > >
> > > To Justin's point about the API, my biggest concern about the
> > RecordReader
> > > approach is that it is not stable. We already have a similar problem
in
> > > having the TransportClient in ElasticSearch - they are prone to
> changing
> > it
> > > in minor versions with the advent of their newer REST API, which is
> > > problematic for ensuring a stable installation.
> > >
> > > From my own perspective, our goal with NiFi, at least in part, should
> be
> > > the ability to deploy our core parsing infrastructure, i.e.
> > >
> > > - pre-built parsers
> > > - custom java parsers
> > > - Stellar transforms
> > > - custom stellar transforms
> > >
> > > And have the ability to configure it similarly to how we configure
> > parsers
> > > within Storm. Consistent with our recent parser chaining and
> aggregation
> > > feature, users should be able to construct and deploy similar
> constructs
> > in
> > > NiFi. The core architectural shift would be that parser code should
be
> > > platform agnostic. We provide the plumbing in Storm, NiFi, and <Spark
> > > Streaming?, other> and platform architects and devops teams can
choose
> > how
> > > and where to deploy.
> > >
> > > Best,
> > > Mike
> > >
> > >
> > > On Wed, Aug 8, 2018 at 9:57 AM James Sirota <js...@apache.org>
> wrote:
> > >
> > > > Integration with NiFi would be useful for parsing low-volume
> > telemetries
> > > > at the edge. This is a much more resource friendly way to do it
than
> > > > setting up dedicated storm topologies. The integration would be
that
> > the
> > > > NiFi processor parses the data and pushes it straight into the
> > enrichment
> > > > topic, saving us the resources of having multiple parsers in storm
> > > >
> > > > Thanks,
> > > > James
> > > >
> > > > 07.08.2018, 11:29, "Otto Fowler" <ot...@gmail.com>:
> > > > > Why do we start over. We are going back and forth on
> implementation,
> > > and
> > > > I
> > > > > don’t think we have the same goals or concerns.
> > > > >
> > > > > What would be the requirements or goals of metron integration
with
> > > Nifi?
> > > > > How many levels or options for integration do we have?
> > > > > What are the approaches to choose from?
> > > > > Who are the target users?
> > > > >
> > > > > On August 7, 2018 at 12:24:56, Justin Leet (justinjleet@gmail.com)

> > > > wrote:
> > > > >
> > > > > So how does the MetronRecordReader roll into everything? It seems
> > like
> > > > it'd
> > > > > be more useful on the reader per format approach, but otherwise
it
> > > > doesn't
> > > > > really seem like we gain much, and it requires getting everything
> > > linked
> > > > up
> > > > > properly to be used. Assuming we looked at doing it that way, is
> the
> > > idea
> > > > > that we'd setup a ControllerService with the MetronRecordReader
> and a
> > > > > MetronRecordWriter and then have the StellarTransformRecord
> processor
> > > > > configured with those ControllerServices? How do we manage the
> > > > > configurations of the everything that way? How does the
> > > ControllerService
> > > > > get configured with whatever parser(s) are needed in the flow?
> > > Basically,
> > > > > what's your vision for how everything would tie together?
> > > > >
> > > > > I also forgot to mention this in the original writeup, but
there's
> > > > another
> > > > > reason to avoid the RecordReader: It's not considered stable. See
> > > > >
> > > >
> > >
> > https://github.com/apache/nifi/blob/master/nifi-commons/
> nifi-record/src/main/java/org/apache/nifi/serialization/
> RecordReader.java#L34
> > > > .
> > > > > That alone makes me super hesitant to use it, if it can shift out
> > from
> > > > > under us in even in incremental version.
> > > > >
> > > > > I'm also unclear on why StellarTransformRecord processor matters
> for
> > > > either
> > > > > approach. With the Processor approach you could simply follow it
up
> > > with
> > > > > the Stellar processor, the same way you'd would in the
RecordReader
> > > > > approach. The Stellar processor should be a parallel improvement,
> > not a
> > > > > conflicting one.
> > > > >
> > > > > On Tue, Aug 7, 2018 at 11:50 AM Otto Fowler <
> ottobackwards@gmail.com
> > >
> > > > wrote:
> > > > >
> > > > >> A Metron Processor itself isn’t really necessary. A
> > > MetronRecordReader
> > > > (
> > > > >> either the megalithic or a reader per format ) would be a good
> > > > approach.
> > > > >> Then have StellarTransformRecord processor that can do Stellar
on
> > > _any_
> > > > >> record, regardless of source.
> > > > >>
> > > > >> On August 7, 2018 at 11:06:22, Justin Leet (justinjleet@gmail.com
> )
> > > > wrote:
> > > > >>
> > > > >> Thanks for the comments, Otto, this is definitely great
feedback.
> > I'd
> > > > >> love to respond inline, but the email's already starting to lose
> > it's
> > > > >> formatting, so I'll go with the classic "wall of text". Let me
> know
> > > if
> > > > I
> > > > >> didn't address everything.
> > > > >>
> > > > >> Loading modules (or jars or whatever) outside of our Processor
> gives
> > > us
> > > > >> the benefit of making it incredibly easy for a users to create
> their
> > > > own
> > > > >> parsers. I would definitely expect our own bundled parsers to be
> > > > included
> > > > >> in our base NAR, but loading modules enables users to only have
to
> > > > learn
> > > > >> how Metron wants our stuff lined up and just plug it in. Having
> said
> > > > that,
> > > > >> I could see having a wrapper for our bundled parsers that makes
it
> > > > really
> > > > >> easy to just say you want an MetronAsaParser or MetronBroParser,
> > etc.
> > > > That
> > > > >> would give us the best of both worlds, where it's easy to get
> setup
> > > our
> > > > >> bundled parsers and also trivial to pull in non-bundled parsers.
> > What
> > > > >> doing this gives us is an easy way to support (hopefully) every
> > > parser
> > > > that
> > > > >> gets made, right out of the box, without us needing to build a
> > > > specialized
> > > > >> version of everything until we decide to and without users
having
> to
> > > > jump
> > > > >> through hoops.
> > > > >>
> > > > >> None of this prevents anyone from creating specialized parsers
> (for
> > > > perf
> > > > >> reasons, or to use the schema registries, or anything else).
It's
> > > > probably
> > > > >> worthwhile to package up some of built-in parsers and customize
> them
> > > > to use
> > > > >> more specialized feature appropriately as we see things get used
> in
> > > the
> > > > >> wild. Like you said, we could likely provide Avro schemas for
some
> > of
> > > > this
> > > > >> and give users a more robust experience on what we choose to
> support
> > > > and
> > > > >> provide guidance for other things. I'm also worried that
building
> > > > >> specialized schemas becomes problematic for things like parser
> > > chaining
> > > > >> (where our routers wrap the underlying messages and add on their
> own
> > > > info).
> > > > >> Going down that road potentially requires anything wrapped to
> have a
> > > > >> specialized schema for the wrapped version in addition to a
> vanilla
> > > > version
> > > > >> (although please correct me if I'm missing something there, I'll
> > > openly
> > > > >> admit to some shakiness on how that would be handled).
> > > > >>
> > > > >> I also disagree that this is un-Nifi-like, although I'm
admittedly
> > > not
> > > > as
> > > > >> skilled there. The basis for doing this is directly inspired by
> the
> > > > >> JoltTransformer, which is extremely similar to the proposed
setup
> > for
> > > > our
> > > > >> parsers: Simply take a spec (in this case the configs, including
> the
> > > > >> fieldTransformations), and delegate a mapping from bytes[] to
> JSON.
> > > The
> > > > >> Jolt library even has an Expression Language (check out
> > > > >>
> > > >
> > >
> > https://community.hortonworks.com/articles/105965/
> expression-language-with-jolt-in-apache-nifi.html
> > > > ),
> > > > >> so it's not a foreign concept. I believe Simon Ball has already
> done
> > > > some
> > > > >> experimenting around with getting Stellar running in NiFi, and
I'd
> > > > love to
> > > > >> see Stellar more readily available in NiFi in general.
> > > > >>
> > > > >> Re: the ControllerService, I see this as a way to maintain
> Metron's
> > > > use of
> > > > >> ZK as the source of config truth. Users could definitely be
using
> > > NiFi
> > > > and
> > > > >> Storm in tandem (parse in NiFi + enrich and index from Storm,
for
> > > > >> example). Using the ControllerService gives us a ZK instance as
> the
> > > > single
> > > > >> source of truth. That way we aren't forcing users to go to two
> > > > different
> > > > >> places to manage configs. This also lets us leverage our
existing
> > > > scripts
> > > > >> and our existing infrastructure around configs and their
> management
> > > and
> > > > >> validation very easily. It also gives users a way to port from
> NiFi
> > > to
> > > > >> Storm or vice-versa without having to migrate configs as well.
We
> > > could
> > > > >> also provide the option to configure the Processor itself with
the
> > > data
> > > > >> (just don't set up a controller service and provide the json or
> > > > whatever as
> > > > >> one of our properties).
> > > > >>
> > > > >> On Tue, Aug 7, 2018 at 10:12 AM Otto Fowler <
> > ottobackwards@gmail.com
> > > >
> > > > >> wrote:
> > > > >>
> > > > >>> I think this is a good idea. As I mentioned in the other thread
> > I’ve
> > > > >>> been doing a lot of work on Nifi recently.
> > > > >>> I think the important thing is that what is done should be done
> the
> > > > NiFi
> > > > >>> way, not bolting the Metron composition
> > > > >>> onto Nifi. Think of it like the Tao of Unix, the parsers and
> > > > components
> > > > >>> should be single purpose and simple, allowing
> > > > >>> exceptional flexibility in composition.
> > > > >>>
> > > > >>> Comments inline.
> > > > >>>
> > > > >>> On August 7, 2018 at 09:27:01, Justin Leet (
> justinjleet@gmail.com)
> > > > wrote:
> > > > >>>
> > > > >>> Hi all,
> > > > >>>
> > > > >>> There's interest in being able to run Metron parsers in NiFi,
> > rather
> > > > than
> > > > >>>
> > > > >>> inside Storm. I dug into this a bit, and have some thoughts on
> how
> > > we
> > > > >>> could
> > > > >>> go about this. I'd love feedback on this, along with anything
> we'd
> > > > >>> consider must haves as well as future enhancements.
> > > > >>>
> > > > >>> 1. Separate metron-parsers into metron-parsers-common and
> > > metron-storm
> > > > >>> and create metron-parsers-nifi. For this code to be reusable
> across
> > > > >>> platforms (NiFi, Storm, and anything else in the future), we'll
> > need
> > > > to
> > > > >>> decouple our parsers and Storm.
> > > > >>>
> > > > >>> +1. The “parsing code” should be a library that implements an
> > > > interface
> > > > >>> ( another library ).
> > > > >>>
> > > > >>> The Processors and the Storm things can share them.
> > > > >>>
> > > > >>> - There's also some nice fringe benefits around refactoring our
> > code
> > > > >>> to be substantially more clear and understandable; something
> > > > >>> which came up
> > > > >>> while allowing for parser aggregation.
> > > > >>> 2. Create a MetronProcessor that can run our parsers.
> > > > >>> - I took a look at how RecordReader could be leveraged (e.g.
> > > > >>> CSVRecordReader), but this is pretty tightly tied into schemas
> > > > >>> and is meant
> > > > >>> to be used by ControllerServices, which are then used by
> > Processors.
> > > > >>> There's friction involved there in terms of schemas, but also
in
> > > > terms of
> > > > >>>
> > > > >>> access to ZK configs and things like parser chaining. We might
> > > > >>> be able to
> > > > >>> leverage it, but it seems like it'd be fairly shoehorned in
> > > > >>> without getting
> > > > >>> the schema and other benefits.
> > > > >>>
> > > > >>> We won’t have to provide our ‘no schema processors’ ( grok,
csv,
> > > json
> > > > ).
> > > > >>>
> > > > >>> All the remaining processors DO have schemas that we know
about.
> We
> > > > can
> > > > >>> just provide the avro schemas the same way we provide the ES
> > > schemas.
> > > > >>>
> > > > >>> The “parsing” should not be conflated with the
transform/stellar
> in
> > > > >>> NiFi. We should make that separate. Running Stellar over
Records
> > > > would be
> > > > >>> the best thing.
> > > > >>>
> > > > >>> - This Processor would work similarly to Storm: bytes[] in ->
> JSON
> > > > >>> out.
> > > > >>> - There is a Processor
> > > > >>> <
> > > > >>>
> > > >
> > >
> > https://github.com/apache/nifi/blob/master/nifi-nar-
> bundles/nifi-standard-bundle/nifi-standard-processors/src/
> main/java/org/apache/nifi/processors/standard/JoltTransformJSON.java
> > > > >>> >
> > > > >>> that
> > > > >>> handles loading other JARs that we can model a
> > > > >>> MetronParserProcessor off of
> > > > >>> that handles classpath/classloader issues (basically just sets
> up a
> > > > >>> classloader specific to what's being loaded and swaps out the
> > > Thread's
> > > > >>> loader when it calls to outside resources).
> > > > >>>
> > > > >>> There should be no reason to load modules outside the NAR. Why
do
> > > you
> > > > >>> expect to? If each Metron Processor equiv of a Metron Storm
> Parser
> > > is
> > > > just
> > > > >>> parsing to json it shouldn’t need much.And we could package
them
> in
> > > > the
> > > > >>> NAR. I would suggest we have a Processor per Parser to allow
for
> > > > >>> specialization. It should all be in the nar.
> > > > >>>
> > > > >>> The Stellar Processor, if you would support the works would
> > possibly
> > > > need
> > > > >>> this.
> > > > >>>
> > > > >>> 3. Create a MetronZkControllerService to supply our configs to
> our
> > > > >>> processors.
> > > > >>> - This is a pretty established NiFi pattern for being able to
> > > provide
> > > > >>> access to other services needed by a Processor (e.g. databases
or
> > > > large
> > > > >>> configurations files).
> > > > >>> - The same controller service can be used by all Processors to
> > > manage
> > > > >>> configs in a consistent manner.
> > > > >>>
> > > > >>> I think controller services would make sense where needed, I’m
> just
> > > > not
> > > > >>> sure what you imagine them being needed for?
> > > > >>>
> > > > >>> If the user has NiFi, and a Registry etc, are you saying you
> > imagine
> > > > them
> > > > >>> using Metron + ZK to manage configurations? Or to be using BOTH
> > > storm
> > > > >>> processors and Nifi Processors?
> > > > >>>
> > > > >>> At that point, we can just NAR our controller service and
parser
> > > > processor
> > > > >>>
> > > > >>> up as needed, deploy them to NiFi, and let the user provide a
> > config
> > > > for
> > > > >>> where their custom parsers can be provided (i.e. their parser
> jar).
> > > > This
> > > > >>> would be 3 nars (processor, controller-service, and
> > > > controller-service-api
> > > > >>>
> > > > >>> in order to bind the other two together).
> > > > >>>
> > > > >>> Once deployed, our ability to use parsers should fit well into
> the
> > > > >>> standard
> > > > >>> NiFi workflow:
> > > > >>>
> > > > >>> 1. Create a MetronZkControllerService.
> > > > >>> 2. Configure the service to point at zookeeper.
> > > > >>> 3. Create a MetronParser.
> > > > >>> 4. Configure it to use the controller service + parser jar
> location
> > > +
> > > > >>> any other needed configs.
> > > > >>> 5. Use the outputs as needed downstream (either writing out to
> > Kafka
> > > > or
> > > > >>> feeding into more MetronParsers, etc.)
> > > > >>>
> > > > >>> Chaining parsers should ideally become a matter of chaining
> > > > MetronParsers
> > > > >>>
> > > > >>> (and making sure the enveloping configs carry through
properly).
> > For
> > > > >>> parser
> > > > >>> aggregation, I'd just avoid it entirely until we know it's
needed
> > in
> > > > NiFi.
> > > > >>>
> > > > >>> Justin
> > > >
> > > > -------------------
> > > > Thank you,
> > > >
> > > > James Sirota
> > > > PMC- Apache Metron
> > > > jsirota AT apache DOT org
> > > >
> > > >
> > >
> >
> >
>



-- 
-- 
simon elliston ball
@sireb

Re: [DISCUSS] Metron Parsers in Nifi

Posted by Simon Elliston Ball <si...@simonellistonball.com>.

Maybe the edge use case will clarify the config issue a little. The reason
I would want to be able to push Metron parsers into NiFi would be so I can
pre-parse and filter on the edge to save bandwidth from remote locations. I
would expect to be able to parse at the edge and use NiFi to prioritise or
filter on the Metron ready data, then push through to a 'NoOp' parser in
Metron. For this to happen, we would absolutely not want to connect to
Zookeeper, so I'm +1 on Otto's suggestion that the config be embeddable in
NiFi properties. We cannot assume ZK connectivity from NiFi.
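
To make the embeddable-config idea concrete, here is a minimal sketch of the
selection logic, assuming the processor exposes the parser config as a plain
JSON string property. ParserConfigResolver and its methods are hypothetical
names for illustration only (nothing here comes from the Metron or NiFi
codebases), and the remote lookup is just a stand-in for whatever a
ZK-backed controller service would provide.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper, not part of Metron or NiFi: chooses where a parser's
// JSON config comes from.
final class ParserConfigResolver {
    // Stand-in for a ZooKeeper-backed lookup keyed by sensor type (assumption).
    private final Function<String, Optional<String>> remoteLookup;

    ParserConfigResolver(Function<String, Optional<String>> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    // Edge deployments: never consult a remote store.
    static ParserConfigResolver offline() {
        return new ParserConfigResolver(sensorType -> Optional.empty());
    }

    // Prefer the inline config; use the remote lookup only if inline is blank.
    String resolve(String sensorType, String inlineJson) {
        if (inlineJson != null && !inlineJson.trim().isEmpty()) {
            return inlineJson;
        }
        return remoteLookup.apply(sensorType).orElseThrow(() ->
            new IllegalStateException("No parser config for sensor '" + sensorType + "'"));
    }

    public static void main(String[] args) {
        // Made-up config content, used only to exercise the inline path.
        String inline = "{\"parserClassName\":\"example.HypotheticalParser\"}";
        System.out.println(offline().resolve("asa", inline));
    }
}
```

With something like this, an edge flow would use the offline resolver and
fail fast when the property is empty, while a flow sitting next to a Metron
install could pass in a lookup that reads from ZK.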

I can also see a scenario where NiFi might make it easier to chain parsers,
which is where it overlaps more with Metron. This is more about the fact
that NiFi makes it a lot easier to configure and manage complex multi-step
flows than Metron, and is way more user intuitive from a design and
monitoring perspective. My main concern around using NiFi in this way is
about the load on the content repository. We are looking at a lot of
content level transformation here. You could argue that the same load is
taken off Kafka in the chaining scenario, but there is still a chance for a
user to accidentally create a lot of disk access if they go over the top
with NiFi.

I see this as potentially a chance to make the Metron Parser interface
compatible with NiFi Record Readers. Then both communities could benefit
from sharing each other's parsers.

In terms of the NAR approach, I would say we have a base bundle of the NiFi
bits (https://github.com/simonellistonball/metron/tree/nifi already has
this for stellar, enrichments and an opinionated publisher, it also has a
readme with some discussion around this
https://github.com/simonellistonball/metron/tree/nifi/nifi-metron-bundle).
We can then use other nar dependencies to side load parser classes into the
record reader. We would then need to do some fancy property validation in
NiFi to ensure the classes were available.

Also, Record Readers are much much faster. The only problem I've found with
them is that they error on blank output, which was a problem for me writing
a netflow 9 reader (template only records need to live in NiFi cache, but
not be emitted).

In terms of the schema objection, I'm not sure why schema focus is a
problem. Our parsers have implicit schema and the output schema formats
used in NiFi are very flexible and could be "just a map". That said, we
could also take the opportunity to introduce a method to the parser
interface to emit traits to contribute the bits of schema that a parser
produces. This would ultimately lead to us being able to generate output
schemas (ES, Solr, Hive, whatever), which would take a lot of the pain out of
setup for sensors.
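
A rough sketch of what such a schema-contributing method could look like,
and how an output schema might be generated from it. Everything named here
(SchemaContributor, FieldKind, EsMappingSketch) is hypothetical and is not
the existing parser interface; the Elasticsearch type names are the only
part taken from a real system, and the field names are only examples.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical logical field kinds, each mapped to an Elasticsearch field type.
enum FieldKind {
    TEXT("keyword"), NUMBER("long"), ADDRESS("ip");

    final String esType;

    FieldKind(String esType) {
        this.esType = esType;
    }
}

// Hypothetical addition to a parser: declare the fields it emits.
interface SchemaContributor {
    Map<String, FieldKind> contributedFields();
}

final class EsMappingSketch {
    // Renders the contributed fields as an Elasticsearch "properties" fragment.
    // Field names are assumed not to need JSON escaping in this sketch.
    static String propertiesFor(SchemaContributor parser) {
        StringBuilder sb = new StringBuilder("{\"properties\":{");
        boolean first = true;
        for (Map.Entry<String, FieldKind> e : parser.contributedFields().entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append('"').append(e.getKey()).append("\":{\"type\":\"")
              .append(e.getValue().esType).append("\"}");
            first = false;
        }
        return sb.append("}}").toString();
    }

    public static void main(String[] args) {
        // Example fields only; a real parser would declare its own.
        Map<String, FieldKind> fields = new LinkedHashMap<>();
        fields.put("ip_src_addr", FieldKind.ADDRESS);
        fields.put("timestamp", FieldKind.NUMBER);
        System.out.println(propertiesFor(() -> fields));
    }
}
```

The same contributed map could feed Solr or Hive generators; only the
per-target type mapping would differ.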

Simon

On 9 August 2018 at 16:42, Otto Fowler <ot...@gmail.com> wrote:

> I would say that
>
> - For each configuration parameter we want to pull in, it should be
> explicitly configured through a property as well as through a controller
> service that accesses the metron zk
> - Transformations should not be conflated with parsing in those processors
> or readers
>
> There is no on the fly configuration change in nifi ( You can’t change
> properties once started ).
>
> Wouldn’t the simplest minimal start be to say that we expect either nifi or
> metron and simplify things?  Let nifi nifi, let metron metron.
>
>
> On August 9, 2018 at 10:53:24, Justin Leet (justinjleet@gmail.com) wrote:
>
> That's definitely good info, thanks for reaching out to them about it.
>
> In terms of exposing/sharing, I don't think we have to couple them tightly
> (in fact, I think we should loosen the coupling as much as possible without
> forcing reimplementation of things). I think there's definitely a way to do
> that in terms of the general purpose processor I proposed (or in terms of
> RecordReader or another implementation).
>
> It would definitely be easy enough to configure it to either pull from ZK
> or to use a parser config json extract as a parameter (to maintain the same
> formatting and make migration easy).  And we can still build specific
> NiFi-oriented parsers as needed (that manage things like Schema via the
> registry and other Nifi mechanisms).  This keeps parsers entirely decoupled
> from a metron installation.
>
> Alternatively, we extract our config handling to a module and scripts we
> can package up and easily deploy configs against ZK (or maybe NiFi's
> StateController or whatever).  We definitely shouldn't need absolutely
> everything installed to be able to run just parsers on Nifi.
>
> Having said that, right now the easiest way we have to maintain on the fly
> updatable configs (and updatable is important!) is via ZK.  Params in Nifi
> aren't quite that flexible, to the best of my knowledge (i.e. you have to
> stop, update config and restart). We might be able to exploit the
> StateController to manage this for us, but I'm honestly not familiar enough
> with it and for deployments split between NiFi and Storm, it means
> configuration gets managed in a couple different ways (which may be fine with
> users, since there is a fairly bright-line delineation that makes it easier to
> accept).  There are some complicated configs like fieldTransforms, which is
> part of why I would like things to be configured in the same format (if not
> the same mechanism).
>
> Ideally, in my mind, the parsers shared between both NiFi and Storm just
> implement the very general MessageParser interface (which is pretty
> minimal, a couple setup methods, validation, and the actual parse).  This
> is pretty lightweight and the split of metron-parsers into
> metron-parsers-common et al. would loosen the coupling between parsers and
> the rest of metron into that core needed to support that.
>
> IMO, at that point, we'd have a pretty minimal NAR (or NARs depending on
> config management) that lets us run our set of parsers, lets users build
> new parsers (and doesn't block specialized NiFi implementations that exploit
> NiFi's feature set), and lets us get things configured in a relatively
> consistent manner, without losing features, and hopefully requiring a
> pretty minimal slice of Metron to be useful.
>
> On Thu, Aug 9, 2018 at 10:06 AM Otto Fowler <ot...@gmail.com>
> wrote:
>
> > I think the benefits are clear.  What is unclear is if the goal is to
> > expose or share or re-use Metron capabilities ( stellar, parsing ) in
> nifi
> > in a way that is native to nifi ( configured and managed in nifi ), where
> > you may not even need metron ( say you just want to parse asa ) or if the
> > goal is to have a hybrid approach coupling the processors/readers to the
> > metron installation.
> >
> >
> > On August 9, 2018 at 09:14:58, Justin Leet (justinjleet@gmail.com)
> wrote:
> >
> > I'll add onto Mike's discussion with the original set of requirements I
> had
> > in mind (and apply feedback on these as necessary!). This is largely
> > overlap with what Mike said, but I want to make sure it's clear where my
> > proposal was coming from, so we can improve on it as needed. James and
> > Mike are also right, I think I skipped over the benefits of NiFi in
> general
> > a bit, so thanks for chiming in there.
> >
> > - Deploy our bundled parsers without needing custom wrapping on all of
> > them.
> > - Don't prevent ourselves from building custom wrapping as needed.
> > - Custom Java parsers with an easy way to hook in, similar to what we
> > already do in Storm.
> > - One stop (or at least one format) configuration, for the case when
> we're
> > doing some thing in NiFi (parsers) and some elsewhere (enrichment and
> > indexing). I don't think it'll always be "start in NiFi, end in Storm",
> > especially as we build out Stellar capability, but I also don't want
> users
> > learning a different set of configs and config tools for every platform
> we
> > run on.
> > - Ability to build out parsers and other systems fairly easily, e.g.
> Spark.
> > - Support our current use cases (in particular parser chaining as a more
> > advanced use case).
> >
> > It really boils down to providing a relatively simple user path to be
> able
> > to migrate to NiFi as needed or desired as simply as possible in a very
> > general way, while not preventing parser by parser enhancements.
> >
> > On Wed, Aug 8, 2018 at 7:14 PM Michael Miklavcic <
> > michael.miklavcic@gmail.com> wrote:
> >
> > > I think it also provides customers greater control over their
> > architecture
> > > by giving them the flexibility to choose where/how to host their
> parsers.
> > >
> > > To Justin's point about the API, my biggest concern about the
> > RecordReader
> > > approach is that it is not stable. We already have a similar problem in
> > > having the TransportClient in ElasticSearch - they are prone to
> changing
> > it
> > > in minor versions with the advent of their newer REST API, which is
> > > problematic for ensuring a stable installation.
> > >
> > > From my own perspective, our goal with NiFi, at least in part, should
> be
> > > the ability to deploy our core parsing infrastructure, i.e.
> > >
> > > - pre-built parsers
> > > - custom java parsers
> > > - Stellar transforms
> > > - custom stellar transforms
> > >
> > > And have the ability to configure it similarly to how we configure
> > parsers
> > > within Storm. Consistent with our recent parser chaining and
> aggregation
> > > feature, users should be able to construct and deploy similar
> constructs
> > in
> > > NiFi. The core architectural shift would be that parser code should be
> > > platform agnostic. We provide the plumbing in Storm, NiFi, and <Spark
> > > Streaming?, other> and platform architects and devops teams can choose
> > how
> > > and where to deploy.
> > >
> > > Best,
> > > Mike
> > >
> > >
> > > On Wed, Aug 8, 2018 at 9:57 AM James Sirota <js...@apache.org>
> wrote:
> > >
> > > > Integration with NiFi would be useful for parsing low-volume
> > telemetries
> > > > at the edge. This is a much more resource friendly way to do it than
> > > > setting up dedicated storm topologies. The integration would be that
> > the
> > > > NiFi processor parses the data and pushes it straight into the
> > enrichment
> > > > topic, saving us the resources of having multiple parsers in storm
> > > >
> > > > Thanks,
> > > > James
> > > >
> > > > 07.08.2018, 11:29, "Otto Fowler" <ot...@gmail.com>:
> > > > > Why do we start over. We are going back and forth on
> implementation,
> > > and
> > > > I
> > > > > don’t think we have the same goals or concerns.
> > > > >
> > > > > What would be the requirements or goals of metron integration with
> > > Nifi?
> > > > > How many levels or options for integration do we have?
> > > > > What are the approaches to choose from?
> > > > > Who are the target users?
> > > > >
> > > > > On August 7, 2018 at 12:24:56, Justin Leet (justinjleet@gmail.com)
> > > > wrote:
> > > > >
> > > > > So how does the MetronRecordReader roll into everything? It seems
> > like
> > > > it'd
> > > > > be more useful on the reader per format approach, but otherwise it
> > > > doesn't
> > > > > really seem like we gain much, and it requires getting everything
> > > linked
> > > > up
> > > > > properly to be used. Assuming we looked at doing it that way, is
> the
> > > idea
> > > > > that we'd setup a ControllerService with the MetronRecordReader
> and a
> > > > > MetronRecordWriter and then have the StellarTransformRecord
> processor
> > > > > configured with those ControllerServices? How do we manage the
> > > > > configurations of the everything that way? How does the
> > > ControllerService
> > > > > get configured with whatever parser(s) are needed in the flow?
> > > Basically,
> > > > > what's your vision for how everything would tie together?
> > > > >
> > > > > I also forgot to mention this in the original writeup, but there's
> > > > another
> > > > > reason to avoid the RecordReader: It's not considered stable. See
> > > > >
> > > >
> > >
> > https://github.com/apache/nifi/blob/master/nifi-commons/
> nifi-record/src/main/java/org/apache/nifi/serialization/
> RecordReader.java#L34
> > > > .
> > > > > That alone makes me super hesitant to use it, if it can shift out
> > from
> > > > > under us in even in incremental version.
> > > > >
> > > > > I'm also unclear on why StellarTransformRecord processor matters
> for
> > > > either
> > > > > approach. With the Processor approach you could simply follow it up
> > > with
> > > > > the Stellar processor, the same way you'd would in the RecordReader
> > > > > approach. The Stellar processor should be a parallel improvement,
> > not a
> > > > > conflicting one.
> > > > >
> > > > > On Tue, Aug 7, 2018 at 11:50 AM Otto Fowler <
> ottobackwards@gmail.com
> > >
> > > > wrote:
> > > > >
> > > > >> A Metron Processor itself isn’t really necessary. A
> > > MetronRecordReader
> > > > (
> > > > >> either the megalithic or a reader per format ) would be a good
> > > > approach.
> > > > >> Then have StellarTransformRecord processor that can do Stellar on
> > > _any_
> > > > >> record, regardless of source.
> > > > >>
> > > > >> On August 7, 2018 at 11:06:22, Justin Leet (justinjleet@gmail.com
> )
> > > > wrote:
> > > > >>
> > > > >> Thanks for the comments, Otto, this is definitely great feedback.
> > I'd
> > > > >> love to respond inline, but the email's already starting to lose
> > it's
> > > > >> formatting, so I'll go with the classic "wall of text". Let me
> know
> > > if
> > > > I
> > > > >> didn't address everything.
> > > > >>
> > > > >> Loading modules (or jars or whatever) outside of our Processor
> gives
> > > us
> > > > >> the benefit of making it incredibly easy for a users to create
> their
> > > > own
> > > > >> parsers. I would definitely expect our own bundled parsers to be
> > > > included
> > > > >> in our base NAR, but loading modules enables users to only have to
> > > > learn
> > > > >> how Metron wants our stuff lined up and just plug it in. Having
> said
> > > > that,
> > > > >> I could see having a wrapper for our bundled parsers that makes it
> > > > really
> > > > >> easy to just say you want an MetronAsaParser or MetronBroParser,
> > etc.
> > > > That
> > > > >> would give us the best of both worlds, where it's easy to get
> setup
> > > our
> > > > >> bundled parsers and also trivial to pull in non-bundled parsers.
> > What
> > > > >> doing this gives us is an easy way to support (hopefully) every
> > > parser
> > > > that
> > > > >> gets made, right out of the box, without us needing to build a
> > > > specialized
> > > > >> version of everything until we decide to and without users having
> to
> > > > jump
> > > > >> through hoops.
> > > > >>
> > > > >> None of this prevents anyone from creating specialized parsers
> (for
> > > > perf
> > > > >> reasons, or to use the schema registries, or anything else). It's
> > > > probably
> > > > >> worthwhile to package up some of built-in parsers and customize
> them
> > > > to use
> > > > >> more specialized feature appropriately as we see things get used
> in
> > > the
> > > > >> wild. Like you said, we could likely provide Avro schemas for some
> > of
> > > > this
> > > > >> and give users a more robust experience on what we choose to
> support
> > > > and
> > > > >> provide guidance for other things. I'm also worried that building
> > > > >> specialized schemas becomes problematic for things like parser
> > > chaining
> > > > >> (where our routers wrap the underlying messages and add on their
> own
> > > > info).
> > > > >> Going down that road potentially requires anything wrapped to
> have a
> > > > >> specialized schema for the wrapped version in addition to a
> vanilla
> > > > version
> > > > >> (although please correct me if I'm missing something there, I'll
> > > openly
> > > > >> admit to some shakiness on how that would be handled).
> > > > >>
> > > > >> I also disagree that this is un-Nifi-like, although I'm admittedly
> > > not
> > > > as
> > > > >> skilled there. The basis for doing this is directly inspired by
> the
> > > > >> JoltTransformer, which is extremely similar to the proposed setup
> > for
> > > > our
> > > > >> parsers: Simply take a spec (in this case the configs, including
> the
> > > > >> fieldTransformations), and delegate a mapping from bytes[] to
> JSON.
> > > The
> > > > >> Jolt library even has an Expression Language (check out
> > > > >>
> > > >
> > >
> > https://community.hortonworks.com/articles/105965/
> expression-language-with-jolt-in-apache-nifi.html
> > > > ),
> > > > >> so it's not a foreign concept. I believe Simon Ball has already
> done
> > > > some
> > > > >> experimenting around with getting Stellar running in NiFi, and I'd
> > > > love to
> > > > >> see Stellar more readily available in NiFi in general.
> > > > >>
> > > > >> Re: the ControllerService, I see this as a way to maintain
> Metron's
> > > > use of
> > > > >> ZK as the source of config truth. Users could definitely be using
> > > NiFi
> > > > and
> > > > >> Storm in tandem (parse in NiFi + enrich and index from Storm, for
> > > > >> example). Using the ControllerService gives us a ZK instance as
> the
> > > > single
> > > > >> source of truth. That way we aren't forcing users to go to two
> > > > different
> > > > >> places to manage configs. This also lets us leverage our existing
> > > > scripts
> > > > >> and our existing infrastructure around configs and their
> management
> > > and
> > > > >> validation very easily. It also gives users a way to port from
> NiFi
> > > to
> > > > >> Storm or vice-versa without having to migrate configs as well. We
> > > could
> > > > >> also provide the option to configure the Processor itself with the
> > > data
> > > > >> (just don't set up a controller service and provide the json or
> > > > whatever as
> > > > >> one of our properties).
> > > > >>
> > > > >> On Tue, Aug 7, 2018 at 10:12 AM Otto Fowler <
> > ottobackwards@gmail.com
> > > >
> > > > >> wrote:
> > > > >>
> > > > >>> I think this is a good idea. As I mentioned in the other thread
> > I’ve
> > > > >>> been doing a lot of work on Nifi recently.
> > > > >>> I think the important thing is that what is done should be done
> the
> > > > NiFi
> > > > >>> way, not bolting the Metron composition
> > > > >>> onto Nifi. Think of it like the Tao of Unix, the parsers and
> > > > components
> > > > >>> should be single purpose and simple, allowing
> > > > >>> exceptional flexibility in composition.
> > > > >>>
> > > > >>> Comments inline.
> > > > >>>
> > > > >>> On August 7, 2018 at 09:27:01, Justin Leet (
> justinjleet@gmail.com)
> > > > wrote:
> > > > >>>
> > > > >>> Hi all,
> > > > >>>
> > > > >>> There's interest in being able to run Metron parsers in NiFi,
> > rather
> > > > than
> > > > >>>
> > > > >>> inside Storm. I dug into this a bit, and have some thoughts on
> how
> > > we
> > > > >>> could
> > > > >>> go about this. I'd love feedback on this, along with anything
> we'd
> > > > >>> consider must haves as well as future enhancements.
> > > > >>>
> > > > >>> 1. Separate metron-parsers into metron-parsers-common and
> > > metron-storm
> > > > >>> and create metron-parsers-nifi. For this code to be reusable
> across
> > > > >>> platforms (NiFi, Storm, and anything else in the future), we'll
> > need
> > > > to
> > > > >>> decouple our parsers and Storm.
> > > > >>>
> > > > >>> +1. The “parsing code” should be a library that implements an
> > > > interface
> > > > >>> ( another library ).
> > > > >>>
> > > > >>> The Processors and the Storm things can share them.
> > > > >>>
> > > > >>> - There's also some nice fringe benefits around refactoring our
> > code
> > > > >>> to be substantially more clear and understandable; something
> > > > >>> which came up
> > > > >>> while allowing for parser aggregation.
> > > > >>> 2. Create a MetronProcessor that can run our parsers.
> > > > >>> - I took a look at how RecordReader could be leveraged (e.g.
> > > > >>> CSVRecordReader), but this is pretty tightly tied into schemas
> > > > >>> and is meant
> > > > >>> to be used by ControllerServices, which are then used by
> > Processors.
> > > > >>> There's friction involved there in terms of schemas, but also in
> > > > terms of
> > > > >>>
> > > > >>> access to ZK configs and things like parser chaining. We might
> > > > >>> be able to
> > > > >>> leverage it, but it seems like it'd be fairly shoehorned in
> > > > >>> without getting
> > > > >>> the schema and other benefits.
> > > > >>>
> > > > >>> We won’t have to provide our ‘no schema processors’ ( grok, csv,
> > > json
> > > > ).
> > > > >>>
> > > > >>> All the remaining processors DO have schemas that we know about.
> We
> > > > can
> > > > >>> just provide the avro schemas the same way we provide the ES
> > > schemas.
> > > > >>>
> > > > >>> The “parsing” should not be conflated with the transform/stellar
> in
> > > > >>> NiFi. We should make that separate. Running Stellar over Records
> > > > would be
> > > > >>> the best thing.
> > > > >>>
> > > > >>> - This Processor would work similarly to Storm: bytes[] in ->
> JSON
> > > > >>> out.
> > > > >>> - There is a Processor
> > > > >>> <
> > > > >>>
> > > >
> > >
> > https://github.com/apache/nifi/blob/master/nifi-nar-
> bundles/nifi-standard-bundle/nifi-standard-processors/src/
> main/java/org/apache/nifi/processors/standard/JoltTransformJSON.java
> > > > >>> >
> > > > >>> that
> > > > >>> handles loading other JARs that we can model a
> > > > >>> MetronParserProcessor off of
> > > > >>> that handles classpath/classloader issues (basically just sets
> up a
> > > > >>> classloader specific to what's being loaded and swaps out the
> > > Thread's
> > > > >>> loader when it calls to outside resources).
> > > > >>>
> > > > >>> There should be no reason to load modules outside the NAR. Why do
> > > you
> > > > >>> expect to? If each Metron Processor equiv of a Metron Storm
> Parser
> > > is
> > > > just
> > > > >>> parsing to json it shouldn’t need much.And we could package them
> in
> > > > the
> > > > >>> NAR. I would suggest we have a Processor per Parser to allow for
> > > > >>> specialization. It should all be in the nar.
> > > > >>>
> > > > >>> The Stellar Processor, if you would support the works would
> > possibly
> > > > need
> > > > >>> this.
> > > > >>>
> > > > >>> 3. Create a MetronZkControllerService to supply our configs to
> our
> > > > >>> processors.
> > > > >>> - This is a pretty established NiFi pattern for being able to
> > > provide
> > > > >>> access to other services needed by a Processor (e.g. databases or
> > > > large
> > > > >>> configurations files).
> > > > >>> - The same controller service can be used by all Processors to
> > > manage
> > > > >>> configs in a consistent manner.
> > > > >>>
> > > > >>> I think controller services would make sense where needed, I’m
> just
> > > > not
> > > > >>> sure what you imagine them being needed for?
> > > > >>>
> > > > >>> If the user has NiFi, and a Registry etc, are you saying you
> > imagine
> > > > them
> > > > >>> using Metron + ZK to manage configurations? Or to be using BOTH
> > > storm
> > > > >>> processors and Nifi Processors?
> > > > >>>
> > > > >>> At that point, we can just NAR our controller service and parser
> > > > processor
> > > > >>>
> > > > >>> up as needed, deploy them to NiFi, and let the user provide a
> > config
> > > > for
> > > > >>> where their custom parsers can be provided (i.e. their parser
> jar).
> > > > This
> > > > >>> would be 3 nars (processor, controller-service, and
> > > > controller-service-api
> > > > >>>
> > > > >>> in order to bind the other two together).
> > > > >>>
> > > > >>> Once deployed, our ability to use parsers should fit well into
> the
> > > > >>> standard
> > > > >>> NiFi workflow:
> > > > >>>
> > > > >>> 1. Create a MetronZkControllerService.
> > > > >>> 2. Configure the service to point at zookeeper.
> > > > >>> 3. Create a MetronParser.
> > > > >>> 4. Configure it to use the controller service + parser jar
> location
> > > +
> > > > >>> any other needed configs.
> > > > >>> 5. Use the outputs as needed downstream (either writing out to
> > Kafka
> > > > or
> > > > >>> feeding into more MetronParsers, etc.)
> > > > >>>
> > > > >>> Chaining parsers should ideally become a matter of chaining
> > > > MetronParsers
> > > > >>>
> > > > >>> (and making sure the enveloping configs carry through properly).
> > For
> > > > >>> parser
> > > > >>> aggregation, I'd just avoid it entirely until we know it's needed
> > in
> > > > NiFi.
> > > > >>>
> > > > >>> Justin
> > > >
> > > > -------------------
> > > > Thank you,
> > > >
> > > > James Sirota
> > > > PMC- Apache Metron
> > > > jsirota AT apache DOT org
> > > >
> > > >
> > >
> >
> >
>



-- 
--
simon elliston ball
@sireb

Re: [DISCUSS] Metron Parsers in Nifi

Posted by Otto Fowler <ot...@gmail.com>.

I would say that

- For each configuration parameter we want to pull in, it should be
explicitly configured through a property as well as through a controller
service that accesses the metron zk
- Transformations should not be conflated with parsing in those processors
or readers

There is no on the fly configuration change in nifi ( You can’t change
properties once started ).

Wouldn’t the simplest minimal start be to say that we expect either nifi or
metron and simplify things?  Let nifi nifi, let metron metron.


On August 9, 2018 at 10:53:24, Justin Leet (justinjleet@gmail.com) wrote:

That's definitely good info, thanks for reaching out to them about it.

In terms of exposing/sharing, I don't think we have to couple them tightly
(in fact, I think we should loosen the coupling as much as possible without
forcing reimplementation of things). I think there's definitely a way to do
that in terms of the general purpose processor I proposed (or in terms of
RecordReader or another implementation).

It would definitely be easy enough to configure it to either pull from ZK
or to use a parser config json extract as a parameter (to maintain the same
formatting and make migration easy).  And we can still build specific
NiFi-oriented parsers as needed (that manage things like Schema via the
registry and other Nifi mechanisms).  This keeps parsers entirely decoupled
from a metron installation.
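
As a rough illustration of the "either ZK or a JSON parameter" option, here
is a minimal sketch assuming a hypothetical ParserConfigResolver (not an
existing Metron or NiFi class). The ZK read is stood in for by a Supplier,
so nothing here touches a real ZooKeeper client:

```java
import java.util.function.Supplier;

class ParserConfigResolver {
    /**
     * Returns the inline JSON when the processor property is set and
     * non-blank; otherwise falls back to the ZooKeeper-backed lookup.
     * Both sources are assumed to use the same sensor config format.
     */
    static String resolve(String inlineJson, Supplier<String> zkLookup) {
        if (inlineJson != null && !inlineJson.trim().isEmpty()) {
            return inlineJson;
        }
        return zkLookup.get();
    }
}
```

Because both paths return the same JSON, everything downstream of the
resolver is identical whether or not a Metron install is present.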

Alternatively, we extract our config handling to a module and scripts we
can package up and easily deploy configs against ZK (or maybe NiFi's
StateController or whatever).  We definitely shouldn't need absolutely
everything installed to be able to run just parsers on Nifi.

Having said that, right now the easiest way we have to maintain on the fly
updatable configs (and updatable is important!) is via ZK.  Params in Nifi
aren't quite that flexible, to the best of my knowledge (i.e. you have to
stop, update config and restart). We might be able to exploit the
StateController to manage this for us, but I'm honestly not familiar enough
with it and for deployments split between NiFi and Storm, it means
configuration gets managed in a couple different ways (which may be fine with
users, since there is a fairly bright-line delineation that makes it easier to
accept).  There are some complicated configs like fieldTransforms, which is
part of why I would like things to be configured in the same format (if not
the same mechanism).

Ideally, in my mind, the parsers shared between both NiFi and Storm just
implement the very general MessageParser interface (which is pretty
minimal, a couple setup methods, validation, and the actual parse).  This
is pretty lightweight and the split of metron-parsers into
metron-parsers-common et al. would loosen the coupling between parsers and
the rest of metron into that core needed to support that.
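
To show roughly what that shape looks like, here is a minimal sketch. The
PlatformAgnosticParser interface below is a hypothetical stand-in that
mirrors "a couple setup methods, validation, and the actual parse"; it is
not Metron's actual MessageParser signature, and KeyValueParser and
ParserRunner are made up for the example:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical platform-neutral contract: setup, parse, validate. */
interface PlatformAgnosticParser {
    void configure(Map<String, Object> config);
    void init();
    List<Map<String, Object>> parse(byte[] rawMessage);
    boolean validate(Map<String, Object> message);
}

/** Toy parser: turns "k1=v1 k2=v2" into a flat map. */
class KeyValueParser implements PlatformAgnosticParser {
    private String delimiter = " ";

    @Override
    public void configure(Map<String, Object> config) {
        Object configured = config.get("delimiter");
        if (configured != null) {
            delimiter = configured.toString();
        }
    }

    @Override
    public void init() {
        // Nothing to set up for this toy parser.
    }

    @Override
    public List<Map<String, Object>> parse(byte[] rawMessage) {
        String line = new String(rawMessage, StandardCharsets.UTF_8);
        Map<String, Object> message = new LinkedHashMap<>();
        message.put("original_string", line);
        for (String token : line.split(Pattern.quote(delimiter))) {
            int eq = token.indexOf('=');
            if (eq > 0) {
                message.put(token.substring(0, eq), token.substring(eq + 1));
            }
        }
        return Collections.singletonList(message);
    }

    @Override
    public boolean validate(Map<String, Object> message) {
        // Valid only if at least one key=value pair was extracted.
        return message.containsKey("original_string") && message.size() > 1;
    }
}

/** The only piece a Storm bolt or NiFi processor would need to call. */
class ParserRunner {
    static List<Map<String, Object>> run(PlatformAgnosticParser parser,
                                         Map<String, Object> config,
                                         byte[] raw) {
        parser.configure(config);
        parser.init();
        List<Map<String, Object>> valid = new ArrayList<>();
        for (Map<String, Object> message : parser.parse(raw)) {
            if (parser.validate(message)) {
                valid.add(message);
            }
        }
        return valid;
    }
}
```

Nothing above imports Storm or NiFi, which is the point of the split: each
platform module only has to hand ParserRunner some bytes and a config map.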

IMO, at that point, we'd have a pretty minimal NAR (or NARs depending on
config management) that lets us run our set of parsers, lets users build
new parsers (and doesn't block specialized NiFi implementations that exploit
NiFi's feature set), and lets us get things configured in a relatively
consistent manner, without losing features, and hopefully requiring a
pretty minimal slice of Metron to be useful.

On Thu, Aug 9, 2018 at 10:06 AM Otto Fowler <ot...@gmail.com> wrote:

> I think the benefits are clear.  What is unclear is if the goal is to
> expose or share or re-use Metron capabilities ( stellar, parsing ) in nifi
> in a way that is native to nifi ( configured and managed in nifi ), where
> you may not even need metron ( say you just want to parse asa ) or if the
> goal is to have a hybrid approach coupling the processors/readers to the
> metron installation.
>
>
> On August 9, 2018 at 09:14:58, Justin Leet (justinjleet@gmail.com) wrote:
>
> I'll add onto Mike's discussion with the original set of requirements I had
> in mind (and apply feedback on these as necessary!). This largely
> overlaps with what Mike said, but I want to make sure it's clear where my
> proposal was coming from, so we can improve on it as needed. James and
> Mike are also right, I think I skipped over the benefits of NiFi in general
> a bit, so thanks for chiming in there.
>
> - Deploy our bundled parsers without needing custom wrapping on all of
> them.
> - Don't prevent ourselves from building custom wrapping as needed.
> - Custom Java parsers with an easy way to hook in, similar to what we
> already do in Storm.
> - One stop (or at least one format) configuration, for the case when we're
> doing some things in NiFi (parsers) and some elsewhere (enrichment and
> indexing). I don't think it'll always be "start in NiFi, end in Storm",
> especially as we build out Stellar capability, but I also don't want users
> learning a different set of configs and config tools for every platform we
> run on.
> - Ability to build out parsers on other systems fairly easily, e.g. Spark.
> - Support our current use cases (in particular parser chaining as a more
> advanced use case).
>
> It really boils down to providing a relatively simple user path to be able
> to migrate to NiFi as needed or desired as simply as possible in a very
> general way, while not preventing parser by parser enhancements.
>
> On Wed, Aug 8, 2018 at 7:14 PM Michael Miklavcic <
> michael.miklavcic@gmail.com> wrote:
>
> > I think it also provides customers greater control over their architecture
> > by giving them the flexibility to choose where/how to host their parsers.
> >
> > To Justin's point about the API, my biggest concern about the RecordReader
> > approach is that it is not stable. We already have a similar problem in
> > having the TransportClient in ElasticSearch - they are prone to changing it
> > in minor versions with the advent of their newer REST API, which is
> > problematic for ensuring a stable installation.
> >
> > From my own perspective, our goal with NiFi, at least in part, should be
> > the ability to deploy our core parsing infrastructure, i.e.
> >
> > - pre-built parsers
> > - custom java parsers
> > - Stellar transforms
> > - custom stellar transforms
> >
> > And have the ability to configure it similarly to how we configure parsers
> > within Storm. Consistent with our recent parser chaining and aggregation
> > feature, users should be able to construct and deploy similar constructs in
> > NiFi. The core architectural shift would be that parser code should be
> > platform agnostic. We provide the plumbing in Storm, NiFi, and <Spark
> > Streaming?, other> and platform architects and devops teams can choose how
> > and where to deploy.
> >
> > Best,
> > Mike
> >
> >
> > On Wed, Aug 8, 2018 at 9:57 AM James Sirota <js...@apache.org> wrote:
> >
> > > Integration with NiFi would be useful for parsing low-volume telemetries
> > > at the edge. This is a much more resource friendly way to do it than
> > > setting up dedicated storm topologies. The integration would be that the
> > > NiFi processor parses the data and pushes it straight into the enrichment
> > > topic, saving us the resources of having multiple parsers in storm.
> > >
> > > Thanks,
> > > James
> > >
> > > 07.08.2018, 11:29, "Otto Fowler" <ot...@gmail.com>:
> > > > Why don't we start over? We are going back and forth on implementation,
> > > > and I don’t think we have the same goals or concerns.
> > > >
> > > > What would be the requirements or goals of metron integration with Nifi?
> > > > How many levels or options for integration do we have?
> > > > What are the approaches to choose from?
> > > > Who are the target users?
> > > >
> > > > On August 7, 2018 at 12:24:56, Justin Leet (justinjleet@gmail.com) wrote:
> > > >
> > > > So how does the MetronRecordReader roll into everything? It seems like it'd
> > > > be more useful on the reader per format approach, but otherwise it doesn't
> > > > really seem like we gain much, and it requires getting everything linked up
> > > > properly to be used. Assuming we looked at doing it that way, is the idea
> > > > that we'd set up a ControllerService with the MetronRecordReader and a
> > > > MetronRecordWriter and then have the StellarTransformRecord processor
> > > > configured with those ControllerServices? How do we manage the
> > > > configurations of everything that way? How does the ControllerService
> > > > get configured with whatever parser(s) are needed in the flow? Basically,
> > > > what's your vision for how everything would tie together?
> > > >
> > > > I also forgot to mention this in the original writeup, but there's another
> > > > reason to avoid the RecordReader: It's not considered stable. See
> > > > https://github.com/apache/nifi/blob/master/nifi-commons/nifi-record/src/main/java/org/apache/nifi/serialization/RecordReader.java#L34
> > > > That alone makes me super hesitant to use it, if it can shift out from
> > > > under us even in an incremental version.
> > > >
> > > > I'm also unclear on why StellarTransformRecord processor matters for either
> > > > approach. With the Processor approach you could simply follow it up with
> > > > the Stellar processor, the same way you would in the RecordReader
> > > > approach. The Stellar processor should be a parallel improvement, not a
> > > > conflicting one.
> > > >
> > > > On Tue, Aug 7, 2018 at 11:50 AM Otto Fowler <ottobackwards@gmail.com>
> > > > wrote:
> > > >
> > > >> A Metron Processor itself isn’t really necessary. A MetronRecordReader
> > > >> ( either the megalithic or a reader per format ) would be a good approach.
> > > >> Then have StellarTransformRecord processor that can do Stellar on _any_
> > > >> record, regardless of source.
> > > >>
> > > >> On August 7, 2018 at 11:06:22, Justin Leet (justinjleet@gmail.com) wrote:
> > > >>
> > > >> Thanks for the comments, Otto, this is definitely great feedback. I'd
> > > >> love to respond inline, but the email's already starting to lose its
> > > >> formatting, so I'll go with the classic "wall of text". Let me know if I
> > > >> didn't address everything.
> > > >>
> > > >> Loading modules (or jars or whatever) outside of our Processor gives us
> > > >> the benefit of making it incredibly easy for users to create their own
> > > >> parsers. I would definitely expect our own bundled parsers to be included
> > > >> in our base NAR, but loading modules enables users to only have to learn
> > > >> how Metron wants our stuff lined up and just plug it in. Having said that,
> > > >> I could see having a wrapper for our bundled parsers that makes it really
> > > >> easy to just say you want a MetronAsaParser or MetronBroParser, etc. That
> > > >> would give us the best of both worlds, where it's easy to set up our
> > > >> bundled parsers and also trivial to pull in non-bundled parsers. What
> > > >> doing this gives us is an easy way to support (hopefully) every parser that
> > > >> gets made, right out of the box, without us needing to build a specialized
> > > >> version of everything until we decide to and without users having to jump
> > > >> through hoops.
> > > >>
> > > >> None of this prevents anyone from creating specialized parsers (for perf
> > > >> reasons, or to use the schema registries, or anything else). It's probably
> > > >> worthwhile to package up some of the built-in parsers and customize them to
> > > >> use more specialized features appropriately as we see things get used in the
> > > >> wild. Like you said, we could likely provide Avro schemas for some of this
> > > >> and give users a more robust experience on what we choose to support and
> > > >> provide guidance for other things. I'm also worried that building
> > > >> specialized schemas becomes problematic for things like parser chaining
> > > >> (where our routers wrap the underlying messages and add on their own info).
> > > >> Going down that road potentially requires anything wrapped to have a
> > > >> specialized schema for the wrapped version in addition to a vanilla version
> > > >> (although please correct me if I'm missing something there, I'll openly
> > > >> admit to some shakiness on how that would be handled).
> > > >>
> > > >> I also disagree that this is un-Nifi-like, although I'm admittedly
> > not
> > > as
> > > >> skilled there. The basis for doing this is directly inspired by the
> > > >> JoltTransformer, which is extremely similar to the proposed setup
> for
> > > our
> > > >> parsers: Simply take a spec (in this case the configs, including the
> > > >> fieldTransformations), and delegate a mapping from bytes[] to JSON.
> > The
> > > >> Jolt library even has an Expression Language (check out
> > > >>
> > >
> >
> https://community.hortonworks.com/articles/105965/expression-language-with-jolt-in-apache-nifi.html
> > > ),
> > > >> so it's not a foreign concept. I believe Simon Ball has already done
> > > some
> > > >> experimenting around with getting Stellar running in NiFi, and I'd
> > > love to
> > > >> see Stellar more readily available in NiFi in general.
> > > >>
> > > >> Re: the ControllerService, I see this as a way to maintain Metron's
> > > use of
> > > >> ZK as the source of config truth. Users could definitely be using
> > NiFi
> > > and
> > > >> Storm in tandem (parse in NiFi + enrich and index from Storm, for
> > > >> example). Using the ControllerService gives us a ZK instance as the
> > > single
> > > >> source of truth. That way we aren't forcing users to go to two
> > > different
> > > >> places to manage configs. This also lets us leverage our existing
> > > scripts
> > > >> and our existing infrastructure around configs and their management
> > and
> > > >> validation very easily. It also gives users a way to port from NiFi
> > to
> > > >> Storm or vice-versa without having to migrate configs as well. We
> > could
> > > >> also provide the option to configure the Processor itself with the
> > data
> > > >> (just don't set up a controller service and provide the json or
> > > whatever as
> > > >> one of our properties).
> > > >>
> > > >> On Tue, Aug 7, 2018 at 10:12 AM Otto Fowler <
> ottobackwards@gmail.com
> > >
> > > >> wrote:
> > > >>
> > > >>> I think this is a good idea. As I mentioned in the other thread
> I’ve
> > > >>> been doing a lot of work on Nifi recently.
> > > >>> I think the important thing is that what is done should be done the
> > > NiFi
> > > >>> way, not bolting the Metron composition
> > > >>> onto Nifi. Think of it like the Tao of Unix, the parsers and
> > > components
> > > >>> should be single purpose and simple, allowing
> > > >>> exceptional flexibility in composition.
> > > >>>
> > > >>> Comments inline.
> > > >>>
> > > >>> On August 7, 2018 at 09:27:01, Justin Leet (justinjleet@gmail.com)
> > > wrote:
> > > >>>
> > > >>> Hi all,
> > > >>>
> > > >>> There's interest in being able to run Metron parsers in NiFi,
> rather
> > > than
> > > >>>
> > > >>> inside Storm. I dug into this a bit, and have some thoughts on how
> > we
> > > >>> could
> > > >>> go about this. I'd love feedback on this, along with anything we'd
> > > >>> consider must haves as well as future enhancements.
> > > >>>
> > > >>> 1. Separate metron-parsers into metron-parsers-common and
> > metron-storm
> > > >>> and create metron-parsers-nifi. For this code to be reusable across
> > > >>> platforms (NiFi, Storm, and anything else in the future), we'll
> need
> > > to
> > > >>> decouple our parsers and Storm.
> > > >>>
> > > >>> +1. The “parsing code” should be a library that implements an
> > > interface
> > > >>> ( another library ).
> > > >>>
> > > >>> The Processors and the Storm things can share them.
> > > >>>
> > > >>> - There's also some nice fringe benefits around refactoring our
> code
> > > >>> to be substantially more clear and understandable; something
> > > >>> which came up
> > > >>> while allowing for parser aggregation.
> > > >>> 2. Create a MetronProcessor that can run our parsers.
> > > >>> - I took a look at how RecordReader could be leveraged (e.g.
> > > >>> CSVRecordReader), but this is pretty tightly tied into schemas
> > > >>> and is meant
> > > >>> to be used by ControllerServices, which are then used by
> Processors.
> > > >>> There's friction involved there in terms of schemas, but also in
> > > terms of
> > > >>>
> > > >>> access to ZK configs and things like parser chaining. We might
> > > >>> be able to
> > > >>> leverage it, but it seems like it'd be fairly shoehorned in
> > > >>> without getting
> > > >>> the schema and other benefits.
> > > >>>
> > > >>> We won’t have to provide our ‘no schema processors’ ( grok, csv,
> > json
> > > ).
> > > >>>
> > > >>> All the remaining processors DO have schemas that we know about. We
> > > can
> > > >>> just provide the avro schemas the same way we provide the ES
> > schemas.
> > > >>>
> > > >>> The “parsing” should not be conflated with the transform/stellar in
> > > >>> NiFi. We should make that separate. Running Stellar over Records
> > > would be
> > > >>> the best thing.
> > > >>>
> > > >>> - This Processor would work similarly to Storm: bytes[] in -> JSON
> > > >>> out.
> > > >>> - There is a Processor
> > > >>> <
> > > >>>
> > >
> >
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/JoltTransformJSON.java
> > > >>> >
> > > >>> that
> > > >>> handles loading other JARs that we can model a
> > > >>> MetronParserProcessor off of
> > > >>> that handles classpath/classloader issues (basically just sets up a
> > > >>> classloader specific to what's being loaded and swaps out the
> > Thread's
> > > >>> loader when it calls to outside resources).
> > > >>>
> > > >>> There should be no reason to load modules outside the NAR. Why do
> > you
> > > >>> expect to? If each Metron Processor equiv of a Metron Storm Parser
> > is
> > > just
> > > >>> parsing to json it shouldn’t need much.And we could package them in
> > > the
> > > >>> NAR. I would suggest we have a Processor per Parser to allow for
> > > >>> specialization. It should all be in the nar.
> > > >>>
> > > >>> The Stellar Processor, if you would support the works would
> possibly
> > > need
> > > >>> this.
> > > >>>
> > > >>> 3. Create a MetronZkControllerService to supply our configs to our
> > > >>> processors.
> > > >>> - This is a pretty established NiFi pattern for being able to
> > provide
> > > >>> access to other services needed by a Processor (e.g. databases or
> > > large
> > > >>> configurations files).
> > > >>> - The same controller service can be used by all Processors to
> > manage
> > > >>> configs in a consistent manner.
> > > >>>
> > > >>> I think controller services would make sense where needed, I’m just
> > > not
> > > >>> sure what you imagine them being needed for?
> > > >>>
> > > >>> If the user has NiFi, and a Registry etc, are you saying you
> imagine
> > > them
> > > >>> using Metron + ZK to manage configurations? Or to be using BOTH
> > storm
> > > >>> processors and Nifi Processors?
> > > >>>
> > > >>> At that point, we can just NAR our controller service and parser
> > > processor
> > > >>>
> > > >>> up as needed, deploy them to NiFi, and let the user provide a
> config
> > > for
> > > >>> where their custom parsers can be provided (i.e. their parser jar).
> > > This
> > > >>> would be 3 nars (processor, controller-service, and
> > > controller-service-api
> > > >>>
> > > >>> in order to bind the other two together).
> > > >>>
> > > >>> Once deployed, our ability to use parsers should fit well into the
> > > >>> standard
> > > >>> NiFi workflow:
> > > >>>
> > > >>> 1. Create a MetronZkControllerService.
> > > >>> 2. Configure the service to point at zookeeper.
> > > >>> 3. Create a MetronParser.
> > > >>> 4. Configure it to use the controller service + parser jar location
> > +
> > > >>> any other needed configs.
> > > >>> 5. Use the outputs as needed downstream (either writing out to
> Kafka
> > > or
> > > >>> feeding into more MetronParsers, etc.)
> > > >>>
> > > >>> Chaining parsers should ideally become a matter of chaining
> > > MetronParsers
> > > >>>
> > > >>> (and making sure the enveloping configs carry through properly).
> For
> > > >>> parser
> > > >>> aggregation, I'd just avoid it entirely until we know it's needed
> in
> > > NiFi.
> > > >>>
> > > >>> Justin
> > >
> > > -------------------
> > > Thank you,
> > >
> > > James Sirota
> > > PMC- Apache Metron
> > > jsirota AT apache DOT org
> > >
> > >
> >
>
>

Re: [DISCUSS] Metron Parsers in Nifi

Posted by Justin Leet <ju...@gmail.com>.

That's definitely good info, thanks for reaching out to them about it.

In terms of exposing/sharing, I don't think we have to couple them tightly
(in fact, I think we should loosen the coupling as much as possible without
forcing reimplementation of things). I think there's definitely a way to do
that in terms of the general purpose processor I proposed (or in terms of
RecordReader or another implementation).

It would definitely be easy enough to configure it to either pull from ZK
or to take a parser config JSON extract as a parameter (to maintain the same
formatting and make migration easy).  And we can still build specific
NiFi-oriented parsers as needed (that manage things like Schema via the
registry and other NiFi mechanisms).  This keeps parsers entirely decoupled
from a Metron installation.
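
As a rough sketch of that "ZK or inline JSON" choice (not an existing Metron
or NiFi API): ParserConfigResolver, its zkLookup function, and the sample JSON
are all made-up names for illustration, and the ZK read is stubbed with a
plain function so the example stays self-contained.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical helper that decides where a parser's JSON config comes from. */
final class ParserConfigResolver {
    // Stand-in for a ZooKeeper-backed lookup (sensor type -> config JSON).
    // This is an assumption for the sketch, not a real Metron or NiFi call.
    private final Function<String, Optional<String>> zkLookup;

    ParserConfigResolver(Function<String, Optional<String>> zkLookup) {
        this.zkLookup = zkLookup;
    }

    /**
     * Returns the inline JSON when the processor property is set and non-blank;
     * otherwise falls back to the ZK-style lookup for the sensor type.
     */
    String resolve(String sensorType, String inlineJson) {
        if (inlineJson != null && !inlineJson.trim().isEmpty()) {
            return inlineJson;
        }
        return zkLookup.apply(sensorType)
                .orElseThrow(() -> new IllegalStateException(
                        "No parser config found for sensor: " + sensorType));
    }
}

final class ParserConfigResolverDemo {
    public static void main(String[] args) {
        // Fake "ZK" contents; the JSON body is illustrative only.
        Map<String, String> fakeZk =
                Collections.singletonMap("bro", "{\"sensorTopic\":\"bro\"}");
        ParserConfigResolver resolver =
                new ParserConfigResolver(s -> Optional.ofNullable(fakeZk.get(s)));

        // No inline property set: config comes from the lookup.
        System.out.println(resolver.resolve("bro", null));
        // Inline property set: it wins over the lookup.
        System.out.println(resolver.resolve("bro", "{\"sensorTopic\":\"custom\"}"));
    }
}
```

Either way the parser sees the same JSON format, which is the part that keeps
migration between the two setups easy.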

Alternatively, we extract our config handling into a module and scripts that
we can package up to easily deploy configs against ZK (or maybe NiFi's
StateController or whatever).  We definitely shouldn't need absolutely
everything installed to be able to run just parsers on NiFi.

Having said that, right now the easiest way we have to maintain on-the-fly
updatable configs (and updatable is important!) is via ZK.  Params in NiFi
aren't quite that flexible, to the best of my knowledge (i.e. you have to
stop, update config and restart). We might be able to exploit the
StateController to manage this for us, but I'm honestly not familiar enough
with it, and for deployments split between NiFi and Storm, it means
configuration gets managed in a couple different ways (which may be fine with
users, since there is a fairly bright-line delineation which makes it easier
to accept).  There are some complicated configs like fieldTransforms, which is
part of why I would like things to be configured in the same format (if not
the same mechanism).

Ideally, in my mind, the parsers shared between both NiFi and Storm just
implement the very general MessageParser interface (which is pretty
minimal: a couple setup methods, validation, and the actual parse).  This
is pretty lightweight, and the split of metron-parsers into
metron-parsers-common et al. would reduce the coupling between parsers and
the rest of Metron to just the core needed to support that interface.
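
To make the shape concrete, here's a minimal sketch of what such a
platform-neutral contract could look like. It is modeled on the description
above rather than copied from Metron: SketchMessageParser, KeyValueParser,
ParserRunner, and the "delimiter" config key are all hypothetical names, and
the "original_string" field is used purely as an example of something to
validate on.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical platform-neutral parser contract (setup, parse, validate). */
interface SketchMessageParser {
    void configure(Map<String, Object> config);      // setup: receive parser config
    void init();                                      // setup: one-time initialization
    List<Map<String, Object>> parse(byte[] raw);      // raw bytes in, messages out
    boolean validate(Map<String, Object> message);    // reject incomplete messages
}

/** Toy parser for "key=value" pairs separated by a configurable delimiter. */
final class KeyValueParser implements SketchMessageParser {
    private String delimiter = ",";

    @Override
    public void configure(Map<String, Object> config) {
        Object d = config.get("delimiter");
        if (d != null) {
            delimiter = d.toString();
        }
    }

    @Override
    public void init() {
        // Nothing to set up for this toy parser.
    }

    @Override
    public List<Map<String, Object>> parse(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        Map<String, Object> message = new HashMap<>();
        for (String pair : text.split(Pattern.quote(delimiter))) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                message.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        message.put("original_string", text);
        return Collections.singletonList(message);
    }

    @Override
    public boolean validate(Map<String, Object> message) {
        // Require the raw text plus at least one parsed field.
        return message.containsKey("original_string") && message.size() > 1;
    }
}

/** The only part a Storm bolt or NiFi processor wrapper would need to call. */
final class ParserRunner {
    static List<Map<String, Object>> run(SketchMessageParser parser,
                                         Map<String, Object> config,
                                         byte[] raw) {
        parser.configure(config);
        parser.init();
        List<Map<String, Object>> valid = new ArrayList<>();
        for (Map<String, Object> message : parser.parse(raw)) {
            if (parser.validate(message)) {
                valid.add(message);
            }
        }
        return valid;
    }
}
```

Nothing in the parser or the runner references Storm or NiFi, so each
platform's wrapper would only have to supply the bytes and the config.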

IMO, at that point, we'd have a pretty minimal NAR (or NARs depending on
config management) that lets us run our set of parsers, lets users build
new parsers (and doesn't block specialized NiFi implementations that exploit
NiFi's feature set), and lets us get things configured in a relatively
consistent manner, without losing features, and hopefully requiring a
pretty minimal slice of Metron to be useful.

On Thu, Aug 9, 2018 at 10:06 AM Otto Fowler <ot...@gmail.com> wrote:

> I think the benefits are clear.  What is unclear is if the goal is to
> expose or share or re-use Metron capabilities ( stellar, parsing ) in nifi
> in a way that is native to nifi ( configured and managed in nifi ), where
> you may not even need metron ( say you just want to parse asa ) or if the
> goal is to have a hybrid approach coupling the processors/readers to the
> metron installation.
>
>
> On August 9, 2018 at 09:14:58, Justin Leet (justinjleet@gmail.com) wrote:
>
> I'll add onto Mike's discussion with the original set of requirements I had
> in mind (and apply feedback on these as necessary!). This is largely
> overlap with what Mike said, but I want to make sure it's clear where my
> proposal was coming from, so we can improve on it as needed. James and
> Mike are also right, I think I skipped over the benefits of NiFi in general
> a bit, so thanks for chiming in there.
>
> - Deploy our bundled parsers without needing custom wrapping on all of
> them.
> - Don't prevent ourselves from building custom wrapping as needed.
> - Custom Java parsers with an easy way to hook in, similar to what we
> already do in Storm.
> - One stop (or at least one format) configuration, for the case when we're
> doing some thing in NiFi (parsers) and some elsewhere (enrichment and
> indexing). I don't think it'll always be "start in NiFi, end in Storm",
> especially as we build out Stellar capability, but I also don't want users
> learning a different set of configs and config tools for every platform we
> run on.
> - Ability to build out parsers and other systems fairly easily, e.g. Spark.
> - Support our current use cases (in particular parser chaining as a more
> advanced use case).
>
> It really boils down to providing a relatively simple user path to be able
> to migrate to NiFi as needed or desired as simply as possible in a very
> general way, while not preventing parser by parser enhancements.
>
> On Wed, Aug 8, 2018 at 7:14 PM Michael Miklavcic <
> michael.miklavcic@gmail.com> wrote:
>
> > I think it also provides customers greater control over their architecture
> > by giving them the flexibility to choose where/how to host their parsers.
> > To Justin's point about the API, my biggest concern about the RecordReader
> > approach is that it is not stable. We already have a similar problem in
> > having the TransportClient in ElasticSearch - they are prone to changing it
> > in minor versions with the advent of their newer REST API, which is
> > problematic for ensuring a stable installation.
> >
> > From my own perspective, our goal with NiFi, at least in part, should be
> > the ability to deploy our core parsing infrastructure, i.e.
> >
> > - pre-built parsers
> > - custom java parsers
> > - Stellar transforms
> > - custom stellar transforms
> >
> > And have the ability to configure it similarly to how we configure parsers
> > within Storm. Consistent with our recent parser chaining and aggregation
> > feature, users should be able to construct and deploy similar constructs in
> > NiFi. The core architectural shift would be that parser code should be
> > platform agnostic. We provide the plumbing in Storm, NiFi, and <Spark
> > Streaming?, other> and platform architects and devops teams can choose how
> > and where to deploy.
> >
> > Best,
> > Mike
> >
> >
> > On Wed, Aug 8, 2018 at 9:57 AM James Sirota <js...@apache.org> wrote:
> >
> > > Integration with NiFi would be useful for parsing low-volume telemetries
> > > at the edge. This is a much more resource friendly way to do it than
> > > setting up dedicated storm topologies. The integration would be that the
> > > NiFi processor parses the data and pushes it straight into the enrichment
> > > topic, saving us the resources of having multiple parsers in storm
> > >
> > > Thanks,
> > > James
> > >
> > > 07.08.2018, 11:29, "Otto Fowler" <ot...@gmail.com>:
> > > > Why do we start over. We are going back and forth on implementation,
> > and
> > > I
> > > > don’t think we have the same goals or concerns.
> > > >
> > > > What would be the requirements or goals of metron integration with
> > Nifi?
> > > > How many levels or options for integration do we have?
> > > > What are the approaches to choose from?
> > > > Who are the target users?
> > > >
> > > > On August 7, 2018 at 12:24:56, Justin Leet (justinjleet@gmail.com)
> > > wrote:
> > > >
> > > > So how does the MetronRecordReader roll into everything? It seems
> like
> > > it'd
> > > > be more useful on the reader per format approach, but otherwise it
> > > doesn't
> > > > really seem like we gain much, and it requires getting everything
> > linked
> > > up
> > > > properly to be used. Assuming we looked at doing it that way, is the
> > idea
> > > > that we'd setup a ControllerService with the MetronRecordReader and
> a
> > > > MetronRecordWriter and then have the StellarTransformRecord
> processor
> > > > configured with those ControllerServices? How do we manage the
> > > > configurations of the everything that way? How does the
> > ControllerService
> > > > get configured with whatever parser(s) are needed in the flow?
> > Basically,
> > > > what's your vision for how everything would tie together?
> > > >
> > > > I also forgot to mention this in the original writeup, but there's
> > > another
> > > > reason to avoid the RecordReader: It's not considered stable. See
> > > >
> > >
> >
> https://github.com/apache/nifi/blob/master/nifi-commons/nifi-record/src/main/java/org/apache/nifi/serialization/RecordReader.java#L34
> > > .
> > > > That alone makes me super hesitant to use it, if it can shift out
> from
> > > > under us in even in incremental version.
> > > >
> > > > I'm also unclear on why StellarTransformRecord processor matters for
> > > either
> > > > approach. With the Processor approach you could simply follow it up
> > with
> > > > the Stellar processor, the same way you'd would in the RecordReader
> > > > approach. The Stellar processor should be a parallel improvement,
> not a
> > > > conflicting one.
> > > >
> > > > On Tue, Aug 7, 2018 at 11:50 AM Otto Fowler <ot...@gmail.com>
>
> > > wrote:
> > > >
> > > >> A Metron Processor itself isn’t really necessary. A
> > MetronRecordReader
> > > (
> > > >> either the megalithic or a reader per format ) would be a good
> > > approach.
> > > >> Then have StellarTransformRecord processor that can do Stellar on
> > _any_
> > > >> record, regardless of source.
> > > >>
> > > >> On August 7, 2018 at 11:06:22, Justin Leet (justinjleet@gmail.com)
> > > wrote:
> > > >>
> > > >> Thanks for the comments, Otto, this is definitely great feedback.
> I'd
> > > >> love to respond inline, but the email's already starting to lose
> it's
> > > >> formatting, so I'll go with the classic "wall of text". Let me know
> > if
> > > I
> > > >> didn't address everything.
> > > >>
> > > >> Loading modules (or jars or whatever) outside of our Processor
> gives
> > us
> > > >> the benefit of making it incredibly easy for a users to create
> their
> > > own
> > > >> parsers. I would definitely expect our own bundled parsers to be
> > > included
> > > >> in our base NAR, but loading modules enables users to only have to
> > > learn
> > > >> how Metron wants our stuff lined up and just plug it in. Having
> said
> > > that,
> > > >> I could see having a wrapper for our bundled parsers that makes it
> > > really
> > > >> easy to just say you want an MetronAsaParser or MetronBroParser,
> etc.
> > > That
> > > >> would give us the best of both worlds, where it's easy to get setup
> > our
> > > >> bundled parsers and also trivial to pull in non-bundled parsers.
> What
> > > >> doing this gives us is an easy way to support (hopefully) every
> > parser
> > > that
> > > >> gets made, right out of the box, without us needing to build a
> > > specialized
> > > >> version of everything until we decide to and without users having
> to
> > > jump
> > > >> through hoops.
> > > >>
> > > >> None of this prevents anyone from creating specialized parsers (for
> > > perf
> > > >> reasons, or to use the schema registries, or anything else). It's
> > > probably
> > > >> worthwhile to package up some of built-in parsers and customize
> them
> > > to use
> > > >> more specialized feature appropriately as we see things get used in
> > the
> > > >> wild. Like you said, we could likely provide Avro schemas for some
> of
> > > this
> > > >> and give users a more robust experience on what we choose to
> support
> > > and
> > > >> provide guidance for other things. I'm also worried that building
> > > >> specialized schemas becomes problematic for things like parser
> > chaining
> > > >> (where our routers wrap the underlying messages and add on their
> own
> > > info).
> > > >> Going down that road potentially requires anything wrapped to have
> a
> > > >> specialized schema for the wrapped version in addition to a vanilla
> > > version
> > > >> (although please correct me if I'm missing something there, I'll
> > openly
> > > >> admit to some shakiness on how that would be handled).
> > > >>
> > > >> I also disagree that this is un-Nifi-like, although I'm admittedly
> > not
> > > as
> > > >> skilled there. The basis for doing this is directly inspired by the
> > > >> JoltTransformer, which is extremely similar to the proposed setup
> for
> > > our
> > > >> parsers: Simply take a spec (in this case the configs, including
> the
> > > >> fieldTransformations), and delegate a mapping from bytes[] to JSON.
> > The
> > > >> Jolt library even has an Expression Language (check out
> > > >>
> > >
> >
> https://community.hortonworks.com/articles/105965/expression-language-with-jolt-in-apache-nifi.html
> > > ),
> > > >> so it's not a foreign concept. I believe Simon Ball has already
> done
> > > some
> > > >> experimenting around with getting Stellar running in NiFi, and I'd
> > > love to
> > > >> see Stellar more readily available in NiFi in general.
> > > >>
> > > >> Re: the ControllerService, I see this as a way to maintain Metron's
> > > use of
> > > >> ZK as the source of config truth. Users could definitely be using
> > NiFi
> > > and
> > > >> Storm in tandem (parse in NiFi + enrich and index from Storm, for
> > > >> example). Using the ControllerService gives us a ZK instance as the
> > > single
> > > >> source of truth. That way we aren't forcing users to go to two
> > > different
> > > >> places to manage configs. This also lets us leverage our existing
> > > scripts
> > > >> and our existing infrastructure around configs and their management
> > and
> > > >> validation very easily. It also gives users a way to port from NiFi
> > to
> > > >> Storm or vice-versa without having to migrate configs as well. We
> > could
> > > >> also provide the option to configure the Processor itself with the
> > data
> > > >> (just don't set up a controller service and provide the json or
> > > whatever as
> > > >> one of our properties).
> > > >>
> > > >> On Tue, Aug 7, 2018 at 10:12 AM Otto Fowler <
> ottobackwards@gmail.com
> > >
> > > >> wrote:
> > > >>
> > > >>> I think this is a good idea. As I mentioned in the other thread
> I’ve
> > > >>> been doing a lot of work on Nifi recently.
> > > >>> I think the important thing is that what is done should be done
> the
> > > NiFi
> > > >>> way, not bolting the Metron composition
> > > >>> onto Nifi. Think of it like the Tao of Unix, the parsers and
> > > components
> > > >>> should be single purpose and simple, allowing
> > > >>> exceptional flexibility in composition.
> > > >>>
> > > >>> Comments inline.
> > > >>>
> > > >>> On August 7, 2018 at 09:27:01, Justin Leet (justinjleet@gmail.com)
>
> > > wrote:
> > > >>>
> > > >>> Hi all,
> > > >>>
> > > >>> There's interest in being able to run Metron parsers in NiFi,
> rather
> > > than
> > > >>>
> > > >>> inside Storm. I dug into this a bit, and have some thoughts on how
> > we
> > > >>> could
> > > >>> go about this. I'd love feedback on this, along with anything we'd
> > > >>> consider must haves as well as future enhancements.
> > > >>>
> > > >>> 1. Separate metron-parsers into metron-parsers-common and
> > metron-storm
> > > >>> and create metron-parsers-nifi. For this code to be reusable
> across
> > > >>> platforms (NiFi, Storm, and anything else in the future), we'll
> need
> > > to
> > > >>> decouple our parsers and Storm.
> > > >>>
> > > >>> +1. The “parsing code” should be a library that implements an
> > > interface
> > > >>> ( another library ).
> > > >>>
> > > >>> The Processors and the Storm things can share them.
> > > >>>
> > > >>> - There's also some nice fringe benefits around refactoring our
> code
> > > >>> to be substantially more clear and understandable; something
> > > >>> which came up
> > > >>> while allowing for parser aggregation.
> > > >>> 2. Create a MetronProcessor that can run our parsers.
> > > >>> - I took a look at how RecordReader could be leveraged (e.g.
> > > >>> CSVRecordReader), but this is pretty tightly tied into schemas
> > > >>> and is meant
> > > >>> to be used by ControllerServices, which are then used by
> Processors.
> > > >>> There's friction involved there in terms of schemas, but also in
> > > terms of
> > > >>>
> > > >>> access to ZK configs and things like parser chaining. We might
> > > >>> be able to
> > > >>> leverage it, but it seems like it'd be fairly shoehorned in
> > > >>> without getting
> > > >>> the schema and other benefits.
> > > >>>
> > > >>> We won’t have to provide our ‘no schema processors’ ( grok, csv,
> > json
> > > ).
> > > >>>
> > > >>> All the remaining processors DO have schemas that we know about.
> We
> > > can
> > > >>> just provide the avro schemas the same way we provide the ES
> > schemas.
> > > >>>
> > > >>> The “parsing” should not be conflated with the transform/stellar
> in
> > > >>> NiFi. We should make that separate. Running Stellar over Records
> > > would be
> > > >>> the best thing.
> > > >>>
> > > >>> - This Processor would work similarly to Storm: bytes[] in -> JSON
> > > >>> out.
> > > >>> - There is a Processor
> > > >>> <
> > > >>>
> > >
> >
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/JoltTransformJSON.java
> > > >>> >
> > > >>> that
> > > >>> handles loading other JARs that we can model a
> > > >>> MetronParserProcessor off of
> > > >>> that handles classpath/classloader issues (basically just sets up
> a
> > > >>> classloader specific to what's being loaded and swaps out the
> > Thread's
> > > >>> loader when it calls to outside resources).
> > > >>>
> > > >>> There should be no reason to load modules outside the NAR. Why do
> > you
> > > >>> expect to? If each Metron Processor equiv of a Metron Storm Parser
> > is
> > > just
> > > >>> parsing to json it shouldn’t need much.And we could package them
> in
> > > the
> > > >>> NAR. I would suggest we have a Processor per Parser to allow for
> > > >>> specialization. It should all be in the nar.
> > > >>>
> > > >>> The Stellar Processor, if you would support the works would
> possibly
> > > need
> > > >>> this.
> > > >>>
> > > >>> 3. Create a MetronZkControllerService to supply our configs to our
> > > >>> processors.
> > > >>> - This is a pretty established NiFi pattern for being able to
> > provide
> > > >>> access to other services needed by a Processor (e.g. databases or
> > > large
> > > >>> configurations files).
> > > >>> - The same controller service can be used by all Processors to
> > manage
> > > >>> configs in a consistent manner.
> > > >>>
> > > >>> I think controller services would make sense where needed, I’m
> just
> > > not
> > > >>> sure what you imagine them being needed for?
> > > >>>
> > > >>> If the user has NiFi, and a Registry etc, are you saying you
> imagine
> > > them
> > > >>> using Metron + ZK to manage configurations? Or to be using BOTH
> > storm
> > > >>> processors and Nifi Processors?
> > > >>>
> > > >>> At that point, we can just NAR our controller service and parser
> > > processor
> > > >>>
> > > >>> up as needed, deploy them to NiFi, and let the user provide a
> config
> > > for
> > > >>> where their custom parsers can be provided (i.e. their parser
> jar).
> > > This
> > > >>> would be 3 nars (processor, controller-service, and
> > > controller-service-api
> > > >>>
> > > >>> in order to bind the other two together).
> > > >>>
> > > >>> Once deployed, our ability to use parsers should fit well into the
> > > >>> standard
> > > >>> NiFi workflow:
> > > >>>
> > > >>> 1. Create a MetronZkControllerService.
> > > >>> 2. Configure the service to point at zookeeper.
> > > >>> 3. Create a MetronParser.
> > > >>> 4. Configure it to use the controller service + parser jar
> location
> > +
> > > >>> any other needed configs.
> > > >>> 5. Use the outputs as needed downstream (either writing out to
> Kafka
> > > or
> > > >>> feeding into more MetronParsers, etc.)
> > > >>>
> > > >>> Chaining parsers should ideally become a matter of chaining
> > > MetronParsers
> > > >>>
> > > >>> (and making sure the enveloping configs carry through properly).
> For
> > > >>> parser
> > > >>> aggregation, I'd just avoid it entirely until we know it's needed
> in
> > > NiFi.
> > > >>>
> > > >>> Justin
> > >
> > > -------------------
> > > Thank you,
> > >
> > > James Sirota
> > > PMC- Apache Metron
> > > jsirota AT apache DOT org
> > >
> > >
> >
>
>

Re: [DISCUSS] Metron Parsers in Nifi

Posted by Otto Fowler <ot...@gmail.com>.

I think the benefits are clear.  What is unclear is if the goal is to
expose or share or re-use Metron capabilities ( stellar, parsing ) in nifi
in a way that is native to nifi ( configured and managed in nifi ), where
you may not even need metron ( say you just want to parse asa ) or if the
goal is to have a hybrid approach coupling the processors/readers to the
metron installation.


On August 9, 2018 at 09:14:58, Justin Leet (justinjleet@gmail.com) wrote:

I'll add onto Mike's discussion with the original set of requirements I had
in mind (and apply feedback on these as necessary!). This is largely
overlap with what Mike said, but I want to make sure it's clear where my
proposal was coming from, so we can improve on it as needed. James and
Mike are also right, I think I skipped over the benefits of NiFi in general
a bit, so thanks for chiming in there.

- Deploy our bundled parsers without needing custom wrapping on all of
them.
- Don't prevent ourselves from building custom wrapping as needed.
- Custom Java parsers with an easy way to hook in, similar to what we
already do in Storm.
- One stop (or at least one format) configuration, for the case when we're
doing some thing in NiFi (parsers) and some elsewhere (enrichment and
indexing). I don't think it'll always be "start in NiFi, end in Storm",
especially as we build out Stellar capability, but I also don't want users
learning a different set of configs and config tools for every platform we
run on.
- Ability to build out parsers and other systems fairly easily, e.g. Spark.
- Support our current use cases (in particular parser chaining as a more
advanced use case).

It really boils down to providing a relatively simple user path to be able
to migrate to NiFi as needed or desired as simply as possible in a very
general way, while not preventing parser by parser enhancements.

On Wed, Aug 8, 2018 at 7:14 PM Michael Miklavcic <
michael.miklavcic@gmail.com> wrote:

> I think it also provides customers greater control over their architecture
> by giving them the flexibility to choose where/how to host their parsers.
>
> To Justin's point about the API, my biggest concern about the RecordReader
> approach is that it is not stable. We already have a similar problem in
> having the TransportClient in ElasticSearch - they are prone to changing it
> in minor versions with the advent of their newer REST API, which is
> problematic for ensuring a stable installation.
>
> From my own perspective, our goal with NiFi, at least in part, should be
> the ability to deploy our core parsing infrastructure, i.e.
>
> - pre-built parsers
> - custom java parsers
> - Stellar transforms
> - custom stellar transforms
>
> And have the ability to configure it similarly to how we configure
parsers
> within Storm. Consistent with our recent parser chaining and aggregation
> feature, users should be able to construct and deploy similar constructs
in
> NiFi. The core architectural shift would be that parser code should be
> platform agnostic. We provide the plumbing in Storm, NiFi, and <Spark
> Streaming?, other> and platform architects and devops teams can choose
how
> and where to deploy.
>
> Best,
> Mike
>
>
> On Wed, Aug 8, 2018 at 9:57 AM James Sirota <js...@apache.org> wrote:
>
> > Integration with NiFi would be useful for parsing low-volume telemetries
> > at the edge. This is a much more resource friendly way to do it than
> > setting up dedicated storm topologies. The integration would be that the
> > NiFi processor parses the data and pushes it straight into the enrichment
> > topic, saving us the resources of having multiple parsers in storm
> >
> > Thanks,
> > James
> >
> > 07.08.2018, 11:29, "Otto Fowler" <ot...@gmail.com>:
> > > Why do we start over. We are going back and forth on implementation, and I
> > > don’t think we have the same goals or concerns.
> > >
> > > What would be the requirements or goals of metron integration with Nifi?
> > > How many levels or options for integration do we have?
> > > What are the approaches to choose from?
> > > Who are the target users?
> > >
> > > On August 7, 2018 at 12:24:56, Justin Leet (justinjleet@gmail.com) wrote:
> > >
> > > So how does the MetronRecordReader roll into everything? It seems like it'd
> > > be more useful on the reader per format approach, but otherwise it doesn't
> > > really seem like we gain much, and it requires getting everything linked up
> > > properly to be used. Assuming we looked at doing it that way, is the idea
> > > that we'd set up a ControllerService with the MetronRecordReader and a
> > > MetronRecordWriter and then have the StellarTransformRecord processor
> > > configured with those ControllerServices? How do we manage the
> > > configurations of everything that way? How does the ControllerService
> > > get configured with whatever parser(s) are needed in the flow? Basically,
> > > what's your vision for how everything would tie together?
> > >
> > > I also forgot to mention this in the original writeup, but there's another
> > > reason to avoid the RecordReader: It's not considered stable. See:
> > > https://github.com/apache/nifi/blob/master/nifi-commons/nifi-record/src/main/java/org/apache/nifi/serialization/RecordReader.java#L34
> > > That alone makes me super hesitant to use it, if it can shift out from
> > > under us in even an incremental version.
> > >
> > > I'm also unclear on why the StellarTransformRecord processor matters for
> > > either approach. With the Processor approach you could simply follow it up
> > > with the Stellar processor, the same way you would in the RecordReader
> > > approach. The Stellar processor should be a parallel improvement, not a
> > > conflicting one.
> > >
> > > On Tue, Aug 7, 2018 at 11:50 AM Otto Fowler <ot...@gmail.com> wrote:
> > >
> > >> A Metron Processor itself isn’t really necessary. A MetronRecordReader
> > >> ( either the megalithic or a reader per format ) would be a good approach.
> > >> Then have StellarTransformRecord processor that can do Stellar on _any_
> > >> record, regardless of source.
> > >>
> > >> On August 7, 2018 at 11:06:22, Justin Leet (justinjleet@gmail.com) wrote:
> > >>
> > >> Thanks for the comments, Otto, this is definitely great feedback. I'd
> > >> love to respond inline, but the email's already starting to lose its
> > >> formatting, so I'll go with the classic "wall of text". Let me know if I
> > >> didn't address everything.
> > >>
> > >> Loading modules (or jars or whatever) outside of our Processor gives us
> > >> the benefit of making it incredibly easy for users to create their own
> > >> parsers. I would definitely expect our own bundled parsers to be included
> > >> in our base NAR, but loading modules enables users to only have to learn
> > >> how Metron wants our stuff lined up and just plug it in. Having said that,
> > >> I could see having a wrapper for our bundled parsers that makes it really
> > >> easy to just say you want a MetronAsaParser or MetronBroParser, etc. That
> > >> would give us the best of both worlds, where it's easy to set up our
> > >> bundled parsers and also trivial to pull in non-bundled parsers. What
> > >> doing this gives us is an easy way to support (hopefully) every parser
> > >> that gets made, right out of the box, without us needing to build a
> > >> specialized version of everything until we decide to and without users
> > >> having to jump through hoops.
> > >>
> > >> None of this prevents anyone from creating specialized parsers (for perf
> > >> reasons, or to use the schema registries, or anything else). It's probably
> > >> worthwhile to package up some of the built-in parsers and customize them
> > >> to use more specialized features appropriately as we see things get used
> > >> in the wild. Like you said, we could likely provide Avro schemas for some
> > >> of this and give users a more robust experience on what we choose to
> > >> support and provide guidance for other things. I'm also worried that
> > >> building specialized schemas becomes problematic for things like parser
> > >> chaining (where our routers wrap the underlying messages and add on their
> > >> own info). Going down that road potentially requires anything wrapped to
> > >> have a specialized schema for the wrapped version in addition to a vanilla
> > >> version (although please correct me if I'm missing something there, I'll
> > >> openly admit to some shakiness on how that would be handled).
> > >>
> > >> I also disagree that this is un-Nifi-like, although I'm admittedly not as
> > >> skilled there. The basis for doing this is directly inspired by the
> > >> JoltTransformer, which is extremely similar to the proposed setup for our
> > >> parsers: Simply take a spec (in this case the configs, including the
> > >> fieldTransformations), and delegate a mapping from bytes[] to JSON. The
> > >> Jolt library even has an Expression Language (check out
> > >> https://community.hortonworks.com/articles/105965/expression-language-with-jolt-in-apache-nifi.html
> > >> ), so it's not a foreign concept. I believe Simon Ball has already done
> > >> some experimenting around with getting Stellar running in NiFi, and I'd
> > >> love to see Stellar more readily available in NiFi in general.
> > >>
> > >> Re: the ControllerService, I see this as a way to maintain Metron's use of
> > >> ZK as the source of config truth. Users could definitely be using NiFi and
> > >> Storm in tandem (parse in NiFi + enrich and index from Storm, for
> > >> example). Using the ControllerService gives us a ZK instance as the single
> > >> source of truth. That way we aren't forcing users to go to two different
> > >> places to manage configs. This also lets us leverage our existing scripts
> > >> and our existing infrastructure around configs and their management and
> > >> validation very easily. It also gives users a way to port from NiFi to
> > >> Storm or vice-versa without having to migrate configs as well. We could
> > >> also provide the option to configure the Processor itself with the data
> > >> (just don't set up a controller service and provide the json or whatever
> > >> as one of our properties).
> > >>
> > >> On Tue, Aug 7, 2018 at 10:12 AM Otto Fowler <ottobackwards@gmail.com> wrote:
> > >>
> > >>> I think this is a good idea. As I mentioned in the other thread I’ve
> > >>> been doing a lot of work on Nifi recently.
> > >>> I think the important thing is that what is done should be done the NiFi
> > >>> way, not bolting the Metron composition onto Nifi. Think of it like the
> > >>> Tao of Unix, the parsers and components should be single purpose and
> > >>> simple, allowing exceptional flexibility in composition.
> > >>>
> > >>> Comments inline.
> > >>>
> > >>> On August 7, 2018 at 09:27:01, Justin Leet (justinjleet@gmail.com) wrote:
> > >>>
> > >>> Hi all,
> > >>>
> > >>> There's interest in being able to run Metron parsers in NiFi, rather than
> > >>> inside Storm. I dug into this a bit, and have some thoughts on how we
> > >>> could go about this. I'd love feedback on this, along with anything we'd
> > >>> consider must haves as well as future enhancements.
> > >>>
> > >>> 1. Separate metron-parsers into metron-parsers-common and metron-storm
> > >>> and create metron-parsers-nifi. For this code to be reusable across
> > >>> platforms (NiFi, Storm, and anything else in the future), we'll need to
> > >>> decouple our parsers and Storm.
> > >>>
> > >>> +1. The “parsing code” should be a library that implements an interface
> > >>> ( another library ).
> > >>>
> > >>> The Processors and the Storm things can share them.
> > >>>
> > >>> - There's also some nice fringe benefits around refactoring our code
> > >>> to be substantially more clear and understandable; something
> > >>> which came up while allowing for parser aggregation.
> > >>> 2. Create a MetronProcessor that can run our parsers.
> > >>> - I took a look at how RecordReader could be leveraged (e.g.
> > >>> CSVRecordReader), but this is pretty tightly tied into schemas and is
> > >>> meant to be used by ControllerServices, which are then used by
> > >>> Processors. There's friction involved there in terms of schemas, but also
> > >>> in terms of access to ZK configs and things like parser chaining. We
> > >>> might be able to leverage it, but it seems like it'd be fairly shoehorned
> > >>> in without getting the schema and other benefits.
> > >>>
> > >>> We won’t have to provide our ‘no schema processors’ ( grok, csv, json ).
> > >>>
> > >>> All the remaining processors DO have schemas that we know about. We can
> > >>> just provide the avro schemas the same way we provide the ES schemas.
> > >>>
> > >>> The “parsing” should not be conflated with the transform/stellar in
> > >>> NiFi. We should make that separate. Running Stellar over Records would be
> > >>> the best thing.
> > >>>
> > >>> - This Processor would work similarly to Storm: bytes[] in -> JSON out.
> > >>> - There is a Processor
> > >>> <https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/JoltTransformJSON.java>
> > >>> that handles loading other JARs that we can model a
> > >>> MetronParserProcessor off of that handles classpath/classloader issues
> > >>> (basically just sets up a classloader specific to what's being loaded
> > >>> and swaps out the Thread's loader when it calls to outside resources).
> > >>>
> > >>> There should be no reason to load modules outside the NAR. Why do you
> > >>> expect to? If each Metron Processor equiv of a Metron Storm Parser is
> > >>> just parsing to json it shouldn’t need much. And we could package them in
> > >>> the NAR. I would suggest we have a Processor per Parser to allow for
> > >>> specialization. It should all be in the nar.
> > >>>
> > >>> The Stellar Processor, if you would support the works would possibly need
> > >>> this.
> > >>>
> > >>> 3. Create a MetronZkControllerService to supply our configs to our
> > >>> processors.
> > >>> - This is a pretty established NiFi pattern for being able to provide
> > >>> access to other services needed by a Processor (e.g. databases or large
> > >>> configurations files).
> > >>> - The same controller service can be used by all Processors to manage
> > >>> configs in a consistent manner.
> > >>>
> > >>> I think controller services would make sense where needed, I’m just not
> > >>> sure what you imagine them being needed for?
> > >>>
> > >>> If the user has NiFi, and a Registry etc, are you saying you imagine them
> > >>> using Metron + ZK to manage configurations? Or to be using BOTH storm
> > >>> processors and Nifi Processors?
> > >>>
> > >>> At that point, we can just NAR our controller service and parser
> > >>> processor up as needed, deploy them to NiFi, and let the user provide a
> > >>> config for where their custom parsers can be provided (i.e. their parser
> > >>> jar). This would be 3 nars (processor, controller-service, and
> > >>> controller-service-api in order to bind the other two together).
> > >>>
> > >>> Once deployed, our ability to use parsers should fit well into the
> > >>> standard NiFi workflow:
> > >>>
> > >>> 1. Create a MetronZkControllerService.
> > >>> 2. Configure the service to point at zookeeper.
> > >>> 3. Create a MetronParser.
> > >>> 4. Configure it to use the controller service + parser jar location +
> > >>> any other needed configs.
> > >>> 5. Use the outputs as needed downstream (either writing out to Kafka or
> > >>> feeding into more MetronParsers, etc.)
> > >>>
> > >>> Chaining parsers should ideally become a matter of chaining MetronParsers
> > >>> (and making sure the enveloping configs carry through properly). For
> > >>> parser aggregation, I'd just avoid it entirely until we know it's needed
> > >>> in NiFi.
> > >>>
> > >>> Justin
> >
> > -------------------
> > Thank you,
> >
> > James Sirota
> > PMC- Apache Metron
> > jsirota AT apache DOT org
> >
> >
>

Re: [DISCUSS] Metron Parsers in Nifi

Posted by Otto Fowler <ot...@gmail.com>.

I reached out to the NiFi list about the Record API ‘stability’.

On August 9, 2018 at 09:54:22, Bryan Bende (bbende@gmail.com) wrote:

I don't think there are any stability issues with the record API, it
is definitely recommended to use the record approach where it makes
sense.

That comment was probably put there on the first release and never
removed, and it has now been 4-5 releases since.

As a general comment to APIs, the record stuff is part of a controller
service API, and not part of the framework API, so I do think there is
more freedom to change the API on minor releases if needed, however I
don't see any major changes to the record stuff happening.

On Thu, Aug 9, 2018 at 5:58 AM, Mike Thomsen <mi...@gmail.com> wrote:
> I think that comment is no longer valid. Heck PutHBaseRecord started as
> part of a project at my company in early 2017 and we found it perfectly
> stable back then.
>
> On Wed, Aug 8, 2018 at 11:46 PM Otto Fowler <ot...@gmail.com> wrote:
>
>> I’m seeing
>> https://github.com/apache/nifi/blob/master/nifi-commons/nifi-record/src/main/java/org/apache/nifi/serialization/RecordReader.java#L34
>> being quoted as a reason to NOT build Record based processors but instead
>> stick with the original Processor api.
>>
>> Yet, on list and on hipchat and in pr’s I’ve seen the Record approach being
>> promoted heavily.
>>
>> Is this comment still correct? Is the API not considered stable?
>> Would the NiFi project recommend building externally hosted NiFi components
>> using the Record API?
>>
>> ottO
>>




Re: [DISCUSS] Metron Parsers in Nifi

Posted by Justin Leet <ju...@gmail.com>.

I'll add onto Mike's discussion with the original set of requirements I had
in mind (and apply feedback on these as necessary!). This largely overlaps
with what Mike said, but I want to make sure it's clear where my proposal
was coming from, so we can improve on it as needed. James and Mike are also
right: I think I skipped over the benefits of NiFi in general a bit, so
thanks for chiming in there.

- Deploy our bundled parsers without needing custom wrapping on all of them.
- Don't prevent ourselves from building custom wrapping as needed.
- Custom Java parsers with an easy way to hook in, similar to what we
already do in Storm.
- One stop (or at least one format) configuration, for the case when we're
doing some things in NiFi (parsers) and some elsewhere (enrichment and
indexing). I don't think it'll always be "start in NiFi, end in Storm",
especially as we build out Stellar capability, but I also don't want users
learning a different set of configs and config tools for every platform we
run on.
- Ability to build out parsers and other systems fairly easily, e.g. Spark.
- Support our current use cases (in particular parser chaining as a more
advanced use case).

It really boils down to providing a relatively simple, general path for
users to migrate to NiFi as needed or desired, while not preventing
parser-by-parser enhancements.
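
To make the "custom Java parsers with an easy way to hook in" and
"bytes[] in -> JSON out" requirements a bit more concrete, here is a minimal
sketch of what a platform-neutral core could look like. Everything in it is
an assumption for illustration: SketchParser, KeyValueParser and
SketchParserRunner are hypothetical names, not existing Metron or NiFi
classes, and the hand-rolled JSON writer only stands in for a real JSON
library.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical platform-neutral contract: raw bytes in, a field map out.
interface SketchParser {
    Map<String, Object> parse(byte[] rawMessage);
}

// Hypothetical custom parser for "key=value,key=value" lines.
class KeyValueParser implements SketchParser {
    @Override
    public Map<String, Object> parse(byte[] rawMessage) {
        String line = new String(rawMessage, StandardCharsets.UTF_8).trim();
        Map<String, Object> fields = new LinkedHashMap<>();
        for (String pair : line.split(",")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2) {
                fields.put(kv[0].trim(), kv[1].trim());
            }
        }
        fields.put("original_string", line);
        return fields;
    }
}

// Core that knows nothing about Storm or NiFi. A bolt or a Processor would
// only need to hand it bytes and forward the returned JSON string.
class SketchParserRunner {
    private final SketchParser parser;

    SketchParserRunner(SketchParser parser) {
        this.parser = parser;
    }

    // Load a parser by class name, the way a configured custom parser might be.
    static SketchParserRunner forClassName(String className) {
        try {
            Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
            return new SketchParserRunner((SketchParser) instance);
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load parser " + className, e);
        }
    }

    // Parse the message and render every field as a JSON string value.
    String run(byte[] rawMessage) {
        StringBuilder json = new StringBuilder("{");
        for (Map.Entry<String, Object> e : parser.parse(rawMessage).entrySet()) {
            if (json.length() > 1) {
                json.append(',');
            }
            json.append('"').append(escape(e.getKey())).append("\":\"")
                .append(escape(String.valueOf(e.getValue()))).append('"');
        }
        return json.append('}').toString();
    }

    // Escapes only backslashes and double quotes; enough for this sketch.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}

class ParserSketchDemo {
    public static void main(String[] args) {
        SketchParserRunner runner =
            SketchParserRunner.forClassName(KeyValueParser.class.getName());
        byte[] raw = "src=10.0.0.1,dst=10.0.0.2".getBytes(StandardCharsets.UTF_8);
        System.out.println(runner.run(raw));
    }
}
```

The point of the sketch is only that the parser and the runner have no
platform imports, so the Storm and NiFi pieces can stay thin wrappers.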

On Wed, Aug 8, 2018 at 7:14 PM Michael Miklavcic <
michael.miklavcic@gmail.com> wrote:

> I think it also provides customers greater control over their architecture
> by giving them the flexibility to choose where/how to host their parsers.
>
> To Justin's point about the API, my biggest concern about the RecordReader
> approach is that it is not stable. We already have a similar problem in
> having the TransportClient in ElasticSearch - they are prone to changing it
> in minor versions with the advent of their newer REST API, which is
> problematic for ensuring a stable installation.
>
> From my own perspective, our goal with NiFi, at least in part, should be
> the ability to deploy our core parsing infrastructure, i.e.
>
>    - pre-built parsers
>    - custom java parsers
>    - Stellar transforms
>    - custom stellar transforms
>
> And have the ability to configure it similarly to how we configure parsers
> within Storm. Consistent with our recent parser chaining and aggregation
> feature, users should be able to construct and deploy similar constructs in
> NiFi. The core architectural shift would be that parser code should be
> platform agnostic. We provide the plumbing in Storm, NiFi, and <Spark
> Streaming?, other> and platform architects and devops teams can choose how
> and where to deploy.
>
> Best,
> Mike
>
>
> On Wed, Aug 8, 2018 at 9:57 AM James Sirota <js...@apache.org> wrote:
>
> > Integration with NiFi would be useful for parsing low-volume telemetries
> > at the edge.  This is a much more resource friendly way to do it than
> > setting up dedicated storm topologies.  The integration would be that the
> > NiFi processor parses the data and pushes it straight into the enrichment
> > topic, saving us the resources of having multiple parsers in storm
> >
> > Thanks,
> > James
> >
> > 07.08.2018, 11:29, "Otto Fowler" <ot...@gmail.com>:
> > > Why do we start over. We are going back and forth on implementation,
> and
> > I
> > > don’t think we have the same goals or concerns.
> > >
> > > What would be the requirements or goals of metron integration with
> Nifi?
> > > How many levels or options for integration do we have?
> > > What are the approaches to choose from?
> > > Who are the target users?
> > >
> > > On August 7, 2018 at 12:24:56, Justin Leet (justinjleet@gmail.com)
> > wrote:
> > >
> > > So how does the MetronRecordReader roll into everything? It seems like
> > it'd
> > > be more useful on the reader per format approach, but otherwise it
> > doesn't
> > > really seem like we gain much, and it requires getting everything
> linked
> > up
> > > properly to be used. Assuming we looked at doing it that way, is the
> idea
> > > that we'd setup a ControllerService with the MetronRecordReader and a
> > > MetronRecordWriter and then have the StellarTransformRecord processor
> > > configured with those ControllerServices? How do we manage the
> > > configurations of the everything that way? How does the
> ControllerService
> > > get configured with whatever parser(s) are needed in the flow?
> Basically,
> > > what's your vision for how everything would tie together?
> > >
> > > I also forgot to mention this in the original writeup, but there's
> > another
> > > reason to avoid the RecordReader: It's not considered stable. See
> > >
> >
> https://github.com/apache/nifi/blob/master/nifi-commons/nifi-record/src/main/java/org/apache/nifi/serialization/RecordReader.java#L34
> > .
> > > That alone makes me super hesitant to use it, if it can shift out from
> > > under us in even in incremental version.
> > >
> > > I'm also unclear on why StellarTransformRecord processor matters for
> > either
> > > approach. With the Processor approach you could simply follow it up
> with
> > > the Stellar processor, the same way you'd would in the RecordReader
> > > approach. The Stellar processor should be a parallel improvement, not a
> > > conflicting one.
> > >
> > > On Tue, Aug 7, 2018 at 11:50 AM Otto Fowler <ot...@gmail.com>
> > wrote:
> > >
> > >>  A Metron Processor itself isn’t really necessary. A
> MetronRecordReader
> > (
> > >>  either the megalithic or a reader per format ) would be a good
> > approach.
> > >>  Then have StellarTransformRecord processor that can do Stellar on
> _any_
> > >>  record, regardless of source.
> > >>
> > >>  On August 7, 2018 at 11:06:22, Justin Leet (justinjleet@gmail.com)
> > wrote:
> > >>
> > >>  Thanks for the comments, Otto, this is definitely great feedback. I'd
> > >>  love to respond inline, but the email's already starting to lose it's
> > >>  formatting, so I'll go with the classic "wall of text". Let me know
> if
> > I
> > >>  didn't address everything.
> > >>
> > >>  Loading modules (or jars or whatever) outside of our Processor gives
> us
> > >>  the benefit of making it incredibly easy for a users to create their
> > own
> > >>  parsers. I would definitely expect our own bundled parsers to be
> > included
> > >>  in our base NAR, but loading modules enables users to only have to
> > learn
> > >>  how Metron wants our stuff lined up and just plug it in. Having said
> > that,
> > >>  I could see having a wrapper for our bundled parsers that makes it
> > really
> > >>  easy to just say you want an MetronAsaParser or MetronBroParser, etc.
> > That
> > >>  would give us the best of both worlds, where it's easy to get setup
> our
> > >>  bundled parsers and also trivial to pull in non-bundled parsers. What
> > >>  doing this gives us is an easy way to support (hopefully) every
> parser
> > that
> > >>  gets made, right out of the box, without us needing to build a
> > specialized
> > >>  version of everything until we decide to and without users having to
> > jump
> > >>  through hoops.
> > >>
> > >>  None of this prevents anyone from creating specialized parsers (for
> > perf
> > >>  reasons, or to use the schema registries, or anything else). It's
> > probably
> > >>  worthwhile to package up some of built-in parsers and customize them
> > to use
> > >>  more specialized feature appropriately as we see things get used in
> the
> > >>  wild. Like you said, we could likely provide Avro schemas for some of
> > this
> > >>  and give users a more robust experience on what we choose to support
> > and
> > >>  provide guidance for other things. I'm also worried that building
> > >>  specialized schemas becomes problematic for things like parser
> chaining
> > >>  (where our routers wrap the underlying messages and add on their own
> > info).
> > >>  Going down that road potentially requires anything wrapped to have a
> > >>  specialized schema for the wrapped version in addition to a vanilla
> > version
> > >>  (although please correct me if I'm missing something there, I'll
> openly
> > >>  admit to some shakiness on how that would be handled).
> > >>
> > >>  I also disagree that this is un-Nifi-like, although I'm admittedly
> not
> > as
> > >>  skilled there. The basis for doing this is directly inspired by the
> > >>  JoltTransformer, which is extremely similar to the proposed setup for
> > our
> > >>  parsers: Simply take a spec (in this case the configs, including the
> > >>  fieldTransformations), and delegate a mapping from bytes[] to JSON.
> The
> > >>  Jolt library even has an Expression Language (check out
> > >>
> >
> https://community.hortonworks.com/articles/105965/expression-language-with-jolt-in-apache-nifi.html
> > ),
> > >>  so it's not a foreign concept. I believe Simon Ball has already done
> > some
> > >>  experimenting around with getting Stellar running in NiFi, and I'd
> > love to
> > >>  see Stellar more readily available in NiFi in general.
> > >>
> > >>  Re: the ControllerService, I see this as a way to maintain Metron's
> > use of
> > >>  ZK as the source of config truth. Users could definitely be using
> NiFi
> > and
> > >>  Storm in tandem (parse in NiFi + enrich and index from Storm, for
> > >>  example). Using the ControllerService gives us a ZK instance as the
> > single
> > >>  source of truth. That way we aren't forcing users to go to two
> > different
> > >>  places to manage configs. This also lets us leverage our existing
> > scripts
> > >>  and our existing infrastructure around configs and their management
> and
> > >>  validation very easily. It also gives users a way to port from NiFi
> to
> > >>  Storm or vice-versa without having to migrate configs as well. We
> could
> > >>  also provide the option to configure the Processor itself with the
> data
> > >>  (just don't set up a controller service and provide the json or
> > whatever as
> > >>  one of our properties).
> > >>
> > >>  On Tue, Aug 7, 2018 at 10:12 AM Otto Fowler <ottobackwards@gmail.com
> >
> > >>  wrote:
> > >>
> > >>>  I think this is a good idea. As I mentioned in the other thread I’ve
> > >>>  been doing a lot of work on Nifi recently.
> > >>>  I think the important thing is that what is done should be done the
> > NiFi
> > >>>  way, not bolting the Metron composition
> > >>>  onto Nifi. Think of it like the Tao of Unix, the parsers and
> > components
> > >>>  should be single purpose and simple, allowing
> > >>>  exceptional flexibility in composition.
> > >>>
> > >>>  Comments inline.
> > >>>
> > >>>  On August 7, 2018 at 09:27:01, Justin Leet (justinjleet@gmail.com)
> > wrote:
> > >>>
> > >>>  Hi all,
> > >>>
> > >>>  There's interest in being able to run Metron parsers in NiFi, rather
> > than
> > >>>
> > >>>  inside Storm. I dug into this a bit, and have some thoughts on how
> we
> > >>>  could
> > >>>  go about this. I'd love feedback on this, along with anything we'd
> > >>>  consider must haves as well as future enhancements.
> > >>>
> > >>>  1. Separate metron-parsers into metron-parsers-common and
> metron-storm
> > >>>  and create metron-parsers-nifi. For this code to be reusable across
> > >>>  platforms (NiFi, Storm, and anything else in the future), we'll need
> > to
> > >>>  decouple our parsers and Storm.
> > >>>
> > >>>  +1. The “parsing code” should be a library that implements an
> > interface
> > >>>  ( another library ).
> > >>>
> > >>>  The Processors and the Storm things can share them.
> > >>>
> > >>>  - There's also some nice fringe benefits around refactoring our code
> > >>>  to be substantially more clear and understandable; something
> > >>>  which came up
> > >>>  while allowing for parser aggregation.
> > >>>  2. Create a MetronProcessor that can run our parsers.
> > >>>  - I took a look at how RecordReader could be leveraged (e.g.
> > >>>  CSVRecordReader), but this is pretty tightly tied into schemas
> > >>>  and is meant
> > >>>  to be used by ControllerServices, which are then used by Processors.
> > >>>  There's friction involved there in terms of schemas, but also in
> > terms of
> > >>>
> > >>>  access to ZK configs and things like parser chaining. We might
> > >>>  be able to
> > >>>  leverage it, but it seems like it'd be fairly shoehorned in
> > >>>  without getting
> > >>>  the schema and other benefits.
> > >>>
> > >>>  We won’t have to provide our ‘no schema processors’ ( grok, csv,
> json
> > ).
> > >>>
> > >>>  All the remaining processors DO have schemas that we know about. We
> > can
> > >>>  just provide the avro schemas the same way we provide the ES
> schemas.
> > >>>
> > >>>  The “parsing” should not be conflated with the transform/stellar in
> > >>>  NiFi. We should make that separate. Running Stellar over Records
> > would be
> > >>>  the best thing.
> > >>>
> > >>>  - This Processor would work similarly to Storm: bytes[] in -> JSON
> > >>>  out.
> > >>>  - There is a Processor
> > >>>  <
> > >>>
> >
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/JoltTransformJSON.java
> > >>>  >
> > >>>  that
> > >>>  handles loading other JARs that we can model a
> > >>>  MetronParserProcessor off of
> > >>>  that handles classpath/classloader issues (basically just sets up a
> > >>>  classloader specific to what's being loaded and swaps out the
> Thread's
> > >>>  loader when it calls to outside resources).
> > >>>
> > >>>  There should be no reason to load modules outside the NAR. Why do
> you
> > >>>  expect to? If each Metron Processor equiv of a Metron Storm Parser
> is
> > just
> > >>>  parsing to json it shouldn’t need much.And we could package them in
> > the
> > >>>  NAR. I would suggest we have a Processor per Parser to allow for
> > >>>  specialization. It should all be in the nar.
> > >>>
> > >>>  The Stellar Processor, if you would support the works would possibly
> > need
> > >>>  this.
> > >>>
> > >>>  3. Create a MetronZkControllerService to supply our configs to our
> > >>>  processors.
> > >>>  - This is a pretty established NiFi pattern for being able to
> provide
> > >>>  access to other services needed by a Processor (e.g. databases or
> > large
> > >>>  configurations files).
> > >>>  - The same controller service can be used by all Processors to
> manage
> > >>>  configs in a consistent manner.
> > >>>
> > >>>  I think controller services would make sense where needed, I’m just
> > not
> > >>>  sure what you imagine them being needed for?
> > >>>
> > >>>  If the user has NiFi, and a Registry etc, are you saying you imagine
> > them
> > >>>  using Metron + ZK to manage configurations? Or to be using BOTH
> storm
> > >>>  processors and Nifi Processors?
> > >>>
> > >>>  At that point, we can just NAR our controller service and parser
> > processor
> > >>>
> > >>>  up as needed, deploy them to NiFi, and let the user provide a config
> > for
> > >>>  where their custom parsers can be provided (i.e. their parser jar).
> > This
> > >>>  would be 3 nars (processor, controller-service, and
> > controller-service-api
> > >>>
> > >>>  in order to bind the other two together).
> > >>>
> > >>>  Once deployed, our ability to use parsers should fit well into the
> > >>>  standard
> > >>>  NiFi workflow:
> > >>>
> > >>>  1. Create a MetronZkControllerService.
> > >>>  2. Configure the service to point at zookeeper.
> > >>>  3. Create a MetronParser.
> > >>>  4. Configure it to use the controller service + parser jar location
> +
> > >>>  any other needed configs.
> > >>>  5. Use the outputs as needed downstream (either writing out to Kafka
> > or
> > >>>  feeding into more MetronParsers, etc.)
> > >>>
> > >>>  Chaining parsers should ideally become a matter of chaining
> > MetronParsers
> > >>>
> > >>>  (and making sure the enveloping configs carry through properly). For
> > >>>  parser
> > >>>  aggregation, I'd just avoid it entirely until we know it's needed in
> > NiFi.
> > >>>
> > >>>  Justin
> >
> > -------------------
> > Thank you,
> >
> > James Sirota
> > PMC- Apache Metron
> > jsirota AT apache DOT org
> >
> >
>

Re: [DISCUSS] Metron Parsers in Nifi

Posted by Michael Miklavcic <mi...@gmail.com>.

I think it also provides customers greater control over their architecture
by giving them the flexibility to choose where/how to host their parsers.

To Justin's point about the API, my biggest concern about the RecordReader
approach is that it is not stable. We already have a similar problem with
the TransportClient in Elasticsearch: they are prone to changing it in minor
versions now that their newer REST API exists, which makes it hard to
ensure a stable installation.

From my own perspective, our goal with NiFi, at least in part, should be
the ability to deploy our core parsing infrastructure, i.e.

   - pre-built parsers
   - custom Java parsers
   - Stellar transforms
   - custom Stellar transforms

We should also be able to configure it similarly to how we configure parsers
within Storm. Consistent with our recent parser chaining and aggregation
feature, users should be able to build and deploy similar constructs in
NiFi. The core architectural shift would be that parser code should be
platform agnostic. We provide the plumbing in Storm, NiFi, and <Spark
Streaming?, other> and platform architects and devops teams can choose how
and where to deploy.
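
As a rough illustration of what chaining could look like once the parser
code is platform agnostic, here is a small sketch. It is not how Metron
implements chaining: the "<type>|<payload>" envelope format, the ChainSketch
class and the "demo" parser are all made up for the example. On NiFi each
stage would presumably be its own processor; here both stages are plain
functions so the routing logic is visible.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class ChainSketch {
    // Stage 1: unwrap a hypothetical "<type>|<payload>" envelope.
    static Map<String, Object> parseEnvelope(byte[] raw) {
        String s = new String(raw, StandardCharsets.UTF_8);
        int sep = s.indexOf('|');
        if (sep < 0) {
            throw new IllegalArgumentException("No envelope separator in: " + s);
        }
        Map<String, Object> envelope = new LinkedHashMap<>();
        envelope.put("sensor_type", s.substring(0, sep));
        envelope.put("payload", s.substring(sep + 1));
        return envelope;
    }

    // Stage 2: route the payload to the parser registered for its type and
    // carry the envelope metadata through to the final message.
    static Map<String, Object> parseChained(
            byte[] raw, Map<String, Function<byte[], Map<String, Object>>> parsersByType) {
        Map<String, Object> envelope = parseEnvelope(raw);
        String type = (String) envelope.get("sensor_type");
        Function<byte[], Map<String, Object>> next = parsersByType.get(type);
        if (next == null) {
            throw new IllegalArgumentException("No parser registered for type: " + type);
        }
        byte[] payload = ((String) envelope.get("payload")).getBytes(StandardCharsets.UTF_8);
        Map<String, Object> result = new LinkedHashMap<>(next.apply(payload));
        result.put("sensor_type", type);
        return result;
    }

    // A single made-up downstream parser that stores the payload as "message".
    static Map<String, Function<byte[], Map<String, Object>>> demoParsers() {
        Map<String, Function<byte[], Map<String, Object>>> parsers = new HashMap<>();
        parsers.put("demo", payload -> {
            Map<String, Object> fields = new LinkedHashMap<>();
            fields.put("message", new String(payload, StandardCharsets.UTF_8));
            return fields;
        });
        return parsers;
    }

    public static void main(String[] args) {
        byte[] raw = "demo|hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(parseChained(raw, demoParsers()));
    }
}
```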

Best,
Mike


On Wed, Aug 8, 2018 at 9:57 AM James Sirota <js...@apache.org> wrote:

> Integration with NiFi would be useful for parsing low-volume telemetries
> at the edge.  This is a much more resource friendly way to do it than
> setting up dedicated storm topologies.  The integration would be that the
> NiFi processor parses the data and pushes it straight into the enrichment
> topic, saving us the resources of having multiple parsers in storm
>
> Thanks,
> James
>
> 07.08.2018, 11:29, "Otto Fowler" <ot...@gmail.com>:
> > Why do we start over. We are going back and forth on implementation, and
> I
> > don’t think we have the same goals or concerns.
> >
> > What would be the requirements or goals of metron integration with Nifi?
> > How many levels or options for integration do we have?
> > What are the approaches to choose from?
> > Who are the target users?
> >
> > On August 7, 2018 at 12:24:56, Justin Leet (justinjleet@gmail.com)
> wrote:
> >
> > So how does the MetronRecordReader roll into everything? It seems like
> it'd
> > be more useful on the reader per format approach, but otherwise it
> doesn't
> > really seem like we gain much, and it requires getting everything linked
> up
> > properly to be used. Assuming we looked at doing it that way, is the idea
> > that we'd setup a ControllerService with the MetronRecordReader and a
> > MetronRecordWriter and then have the StellarTransformRecord processor
> > configured with those ControllerServices? How do we manage the
> > configurations of the everything that way? How does the ControllerService
> > get configured with whatever parser(s) are needed in the flow? Basically,
> > what's your vision for how everything would tie together?
> >
> > I also forgot to mention this in the original writeup, but there's
> another
> > reason to avoid the RecordReader: It's not considered stable. See
> >
> https://github.com/apache/nifi/blob/master/nifi-commons/nifi-record/src/main/java/org/apache/nifi/serialization/RecordReader.java#L34
> .
> > That alone makes me super hesitant to use it, if it can shift out from
> > under us in even in incremental version.
> >
> > I'm also unclear on why StellarTransformRecord processor matters for
> either
> > approach. With the Processor approach you could simply follow it up with
> > the Stellar processor, the same way you'd would in the RecordReader
> > approach. The Stellar processor should be a parallel improvement, not a
> > conflicting one.
> >
> > On Tue, Aug 7, 2018 at 11:50 AM Otto Fowler <ot...@gmail.com>
> wrote:
> >
> >>  A Metron Processor itself isn’t really necessary. A MetronRecordReader
> (
> >>  either the megalithic or a reader per format ) would be a good
> approach.
> >>  Then have StellarTransformRecord processor that can do Stellar on _any_
> >>  record, regardless of source.
> >>
> >>  On August 7, 2018 at 11:06:22, Justin Leet (justinjleet@gmail.com)
> wrote:
> >>
> >>  Thanks for the comments, Otto, this is definitely great feedback. I'd
> >>  love to respond inline, but the email's already starting to lose it's
> >>  formatting, so I'll go with the classic "wall of text". Let me know if
> I
> >>  didn't address everything.
> >>
> >>  Loading modules (or jars or whatever) outside of our Processor gives us
> >>  the benefit of making it incredibly easy for a users to create their
> own
> >>  parsers. I would definitely expect our own bundled parsers to be
> included
> >>  in our base NAR, but loading modules enables users to only have to
> learn
> >>  how Metron wants our stuff lined up and just plug it in. Having said
> that,
> >>  I could see having a wrapper for our bundled parsers that makes it
> really
> >>  easy to just say you want an MetronAsaParser or MetronBroParser, etc.
> That
> >>  would give us the best of both worlds, where it's easy to get setup our
> >>  bundled parsers and also trivial to pull in non-bundled parsers. What
> >>  doing this gives us is an easy way to support (hopefully) every parser
> that
> >>  gets made, right out of the box, without us needing to build a
> specialized
> >>  version of everything until we decide to and without users having to
> jump
> >>  through hoops.
> >>
> >>  None of this prevents anyone from creating specialized parsers (for
> perf
> >>  reasons, or to use the schema registries, or anything else). It's
> probably
> >>  worthwhile to package up some of built-in parsers and customize them
> to use
> >>  more specialized feature appropriately as we see things get used in the
> >>  wild. Like you said, we could likely provide Avro schemas for some of
> this
> >>  and give users a more robust experience on what we choose to support
> and
> >>  provide guidance for other things. I'm also worried that building
> >>  specialized schemas becomes problematic for things like parser chaining
> >>  (where our routers wrap the underlying messages and add on their own
> info).
> >>  Going down that road potentially requires anything wrapped to have a
> >>  specialized schema for the wrapped version in addition to a vanilla
> version
> >>  (although please correct me if I'm missing something there, I'll openly
> >>  admit to some shakiness on how that would be handled).
> >>
> >>  I also disagree that this is un-Nifi-like, although I'm admittedly not
> as
> >>  skilled there. The basis for doing this is directly inspired by the
> >>  JoltTransformer, which is extremely similar to the proposed setup for
> our
> >>  parsers: Simply take a spec (in this case the configs, including the
> >>  fieldTransformations), and delegate a mapping from bytes[] to JSON. The
> >>  Jolt library even has an Expression Language (check out
> >>
> https://community.hortonworks.com/articles/105965/expression-language-with-jolt-in-apache-nifi.html
> ),
> >>  so it's not a foreign concept. I believe Simon Ball has already done
> some
> >>  experimenting around with getting Stellar running in NiFi, and I'd
> love to
> >>  see Stellar more readily available in NiFi in general.
> >>
> >>  Re: the ControllerService, I see this as a way to maintain Metron's
> use of
> >>  ZK as the source of config truth. Users could definitely be using NiFi
> and
> >>  Storm in tandem (parse in NiFi + enrich and index from Storm, for
> >>  example). Using the ControllerService gives us a ZK instance as the
> single
> >>  source of truth. That way we aren't forcing users to go to two
> different
> >>  places to manage configs. This also lets us leverage our existing
> scripts
> >>  and our existing infrastructure around configs and their management and
> >>  validation very easily. It also gives users a way to port from NiFi to
> >>  Storm or vice-versa without having to migrate configs as well. We could
> >>  also provide the option to configure the Processor itself with the data
> >>  (just don't set up a controller service and provide the json or
> whatever as
> >>  one of our properties).
> >>
> >>  On Tue, Aug 7, 2018 at 10:12 AM Otto Fowler <ot...@gmail.com>
> >>  wrote:
> >>
> >>>  I think this is a good idea. As I mentioned in the other thread I’ve
> >>>  been doing a lot of work on Nifi recently.
> >>>  I think the important thing is that what is done should be done the
> NiFi
> >>>  way, not bolting the Metron composition
> >>>  onto Nifi. Think of it like the Tao of Unix, the parsers and
> components
> >>>  should be single purpose and simple, allowing
> >>>  exceptional flexibility in composition.
> >>>
> >>>  Comments inline.
> >>>
> >>>  On August 7, 2018 at 09:27:01, Justin Leet (justinjleet@gmail.com)
> wrote:
> >>>
> >>>  Hi all,
> >>>
> >>>  There's interest in being able to run Metron parsers in NiFi, rather
> than
> >>>
> >>>  inside Storm. I dug into this a bit, and have some thoughts on how we
> >>>  could
> >>>  go about this. I'd love feedback on this, along with anything we'd
> >>>  consider must haves as well as future enhancements.
> >>>
> >>>  1. Separate metron-parsers into metron-parsers-common and metron-storm
> >>>  and create metron-parsers-nifi. For this code to be reusable across
> >>>  platforms (NiFi, Storm, and anything else in the future), we'll need
> to
> >>>  decouple our parsers and Storm.
> >>>
> >>>  +1. The “parsing code” should be a library that implements an
> interface
> >>>  ( another library ).
> >>>
> >>>  The Processors and the Storm things can share them.
> >>>
> >>>  - There's also some nice fringe benefits around refactoring our code
> >>>  to be substantially more clear and understandable; something
> >>>  which came up
> >>>  while allowing for parser aggregation.
> >>>  2. Create a MetronProcessor that can run our parsers.
> >>>  - I took a look at how RecordReader could be leveraged (e.g.
> >>>  CSVRecordReader), but this is pretty tightly tied into schemas
> >>>  and is meant
> >>>  to be used by ControllerServices, which are then used by Processors.
> >>>  There's friction involved there in terms of schemas, but also in
> terms of
> >>>
> >>>  access to ZK configs and things like parser chaining. We might
> >>>  be able to
> >>>  leverage it, but it seems like it'd be fairly shoehorned in
> >>>  without getting
> >>>  the schema and other benefits.
> >>>
> >>>  We won’t have to provide our ‘no schema processors’ ( grok, csv, json
> ).
> >>>
> >>>  All the remaining processors DO have schemas that we know about. We
> can
> >>>  just provide the avro schemas the same way we provide the ES schemas.
> >>>
> >>>  The “parsing” should not be conflated with the transform/stellar in
> >>>  NiFi. We should make that separate. Running Stellar over Records
> would be
> >>>  the best thing.
> >>>
> >>>  - This Processor would work similarly to Storm: bytes[] in -> JSON
> >>>  out.
> >>>  - There is a Processor
> >>>  <
> >>>
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/JoltTransformJSON.java
> >>>  >
> >>>  that
> >>>  handles loading other JARs that we can model a
> >>>  MetronParserProcessor off of
> >>>  that handles classpath/classloader issues (basically just sets up a
> >>>  classloader specific to what's being loaded and swaps out the Thread's
> >>>  loader when it calls to outside resources).
> >>>
> >>>  There should be no reason to load modules outside the NAR. Why do you
> >>>  expect to? If each Metron Processor equiv of a Metron Storm Parser is
> just
> >>>  parsing to json it shouldn’t need much.And we could package them in
> the
> >>>  NAR. I would suggest we have a Processor per Parser to allow for
> >>>  specialization. It should all be in the nar.
> >>>
> >>>  The Stellar Processor, if you would support the works would possibly
> need
> >>>  this.
> >>>
> >>>  3. Create a MetronZkControllerService to supply our configs to our
> >>>  processors.
> >>>  - This is a pretty established NiFi pattern for being able to provide
> >>>  access to other services needed by a Processor (e.g. databases or
> large
> >>>  configurations files).
> >>>  - The same controller service can be used by all Processors to manage
> >>>  configs in a consistent manner.
> >>>
> >>>  I think controller services would make sense where needed, I’m just
> not
> >>>  sure what you imagine them being needed for?
> >>>
> >>>  If the user has NiFi, and a Registry etc, are you saying you imagine
> them
> >>>  using Metron + ZK to manage configurations? Or to be using BOTH storm
> >>>  processors and Nifi Processors?
> >>>
> >>>  At that point, we can just NAR our controller service and parser
> processor
> >>>
> >>>  up as needed, deploy them to NiFi, and let the user provide a config
> for
> >>>  where their custom parsers can be provided (i.e. their parser jar).
> This
> >>>  would be 3 nars (processor, controller-service, and
> controller-service-api
> >>>
> >>>  in order to bind the other two together).
> >>>
> >>>  Once deployed, our ability to use parsers should fit well into the
> >>>  standard
> >>>  NiFi workflow:
> >>>
> >>>  1. Create a MetronZkControllerService.
> >>>  2. Configure the service to point at zookeeper.
> >>>  3. Create a MetronParser.
> >>>  4. Configure it to use the controller service + parser jar location +
> >>>  any other needed configs.
> >>>  5. Use the outputs as needed downstream (either writing out to Kafka
> or
> >>>  feeding into more MetronParsers, etc.)
> >>>
> >>>  Chaining parsers should ideally become a matter of chaining
> MetronParsers
> >>>
> >>>  (and making sure the enveloping configs carry through properly). For
> >>>  parser
> >>>  aggregation, I'd just avoid it entirely until we know it's needed in
> NiFi.
> >>>
> >>>  Justin
>
> -------------------
> Thank you,
>
> James Sirota
> PMC- Apache Metron
> jsirota AT apache DOT org
>
>

Re: [DISCUSS] Metron Parsers in Nifi

Posted by James Sirota <js...@apache.org>.

Integration with NiFi would be useful for parsing low-volume telemetries at the edge.  This is a much more resource-friendly way to do it than setting up dedicated Storm topologies.  The integration would be that the NiFi processor parses the data and pushes it straight into the enrichment topic, saving us the resources of having multiple parsers in Storm.

Thanks,
James 


------------------- 
Thank you,

James Sirota
PMC- Apache Metron
jsirota AT apache DOT org

Re: [DISCUSS] Metron Parsers in Nifi

Posted by Otto Fowler <ot...@gmail.com>.

Why don't we start over?  We are going back and forth on implementation, and I
don’t think we have the same goals or concerns.

What would be the requirements or goals of Metron integration with NiFi?
How many levels or options for integration do we have?
What are the approaches to choose from?
Who are the target users?




Re: [DISCUSS] Metron Parsers in Nifi

Posted by Justin Leet <ju...@gmail.com>.

So how does the MetronRecordReader roll into everything? It seems like it'd
be more useful with the reader-per-format approach, but otherwise it doesn't
really seem like we gain much, and it requires getting everything linked up
properly to be used. Assuming we looked at doing it that way, is the idea
that we'd set up a ControllerService with the MetronRecordReader and a
MetronRecordWriter and then have the StellarTransformRecord processor
configured with those ControllerServices? How do we manage the
configurations of everything that way?  How does the ControllerService
get configured with whatever parser(s) are needed in the flow? Basically,
what's your vision for how everything would tie together?

I also forgot to mention this in the original writeup, but there's another
reason to avoid the RecordReader: It's not considered stable. See
https://github.com/apache/nifi/blob/master/nifi-commons/nifi-record/src/main/java/org/apache/nifi/serialization/RecordReader.java#L34.
That alone makes me super hesitant to use it, if it can shift out from
under us even in an incremental version.

I'm also unclear on why the StellarTransformRecord processor matters for
either approach.  With the Processor approach you could simply follow it up
with the Stellar processor, the same way you would in the RecordReader
approach.  The Stellar processor should be a parallel improvement, not a
conflicting one.

On Tue, Aug 7, 2018 at 11:50 AM Otto Fowler <ot...@gmail.com> wrote:

> A Metron Processor itself isn’t really necessary.  A MetronRecordReader (
> either the megalithic or a reader per format ) would be a good approach.
> Then have StellarTransformRecord processor that can do Stellar on _any_
> record, regardless of source.
>
>
>
>
> On August 7, 2018 at 11:06:22, Justin Leet (justinjleet@gmail.com) wrote:
>
> Thanks for the comments, Otto, this is definitely great feedback.  I'd
> love to respond inline, but the email's already starting to lose its
> formatting, so I'll go with the classic "wall of text".  Let me know if I
> didn't address everything.
>
> Loading modules (or jars or whatever) outside of our Processor gives us
> the benefit of making it incredibly easy for users to create their own
> parsers. I would definitely expect our own bundled parsers to be included
> in our base NAR, but loading modules enables users to only have to learn
> how Metron wants our stuff lined up and just plug it in. Having said that,
> I could see having a wrapper for our bundled parsers that makes it really
> easy to just say you want a MetronAsaParser or MetronBroParser, etc. That
> would give us the best of both worlds, where it's easy to set up our
> bundled parsers and also trivial to pull in non-bundled parsers. What
> doing this gives us is an easy way to support (hopefully) every parser that
> gets made, right out of the box, without us needing to build a specialized
> version of everything until we decide to and without users having to jump
> through hoops.
>
> None of this prevents anyone from creating specialized parsers (for perf
> reasons, or to use the schema registries, or anything else).  It's probably
> worthwhile to package up some of the built-in parsers and customize them to
> use more specialized features appropriately as we see things get used in the
> wild.  Like you said, we could likely provide Avro schemas for some of this
> and give users a more robust experience on what we choose to support and
> provide guidance for other things.  I'm also worried that building
> specialized schemas becomes problematic for things like parser chaining
> (where our routers wrap the underlying messages and add on their own info).
> Going down that road potentially requires anything wrapped to have a
> specialized schema for the wrapped version in addition to a vanilla version
> (although please correct me if I'm missing something there, I'll openly
> admit to some shakiness on how that would be handled).
>
> I also disagree that this is un-Nifi-like, although I'm admittedly not as
> skilled there.  The basis for doing this is directly inspired by the
> JoltTransformer, which is extremely similar to the proposed setup for our
> parsers: Simply take a spec (in this case the configs, including the
> fieldTransformations), and delegate a mapping from bytes[] to JSON.  The
> Jolt library even has an Expression Language (check out
> https://community.hortonworks.com/articles/105965/expression-language-with-jolt-in-apache-nifi.html),
> so it's not a foreign concept. I believe Simon Ball has already done some
> experimenting around with getting Stellar running in NiFi, and I'd love to
> see Stellar more readily available in NiFi in general.
>
> Re: the ControllerService, I see this as a way to maintain Metron's use of
> ZK as the source of config truth.  Users could definitely be using NiFi and
> Storm in tandem (parse in NiFi + enrich and index from Storm, for
> example).  Using the ControllerService gives us a ZK instance as the single
> source of truth.  That way we aren't forcing users to go to two different
> places to manage configs.  This also lets us leverage our existing scripts
> and our existing infrastructure around configs and their management and
> validation very easily.  It also gives users a way to port from NiFi to
> Storm or vice-versa without having to migrate configs as well. We could
> also provide the option to configure the Processor itself with the data
> (just don't set up a controller service and provide the json or whatever as
> one of our properties).
>
>
> On Tue, Aug 7, 2018 at 10:12 AM Otto Fowler <ot...@gmail.com>
> wrote:
>
>> I think this is a good idea.  As I mentioned in the other thread I’ve
>> been doing a lot of work on Nifi recently.
>> I think the important thing is that what is done should be done the NiFi
>> way, not bolting the Metron composition
>> onto Nifi.  Think of it like the Tao of Unix, the parsers and components
>> should be single purpose and simple, allowing
>> exceptional flexibility in composition.
>>
>> Comments inline.
>>
>> On August 7, 2018 at 09:27:01, Justin Leet (justinjleet@gmail.com) wrote:
>>
>> Hi all,
>>
>> There's interest in being able to run Metron parsers in NiFi, rather than
>> inside Storm. I dug into this a bit, and have some thoughts on how we
>> could go about this. I'd love feedback on this, along with anything we'd
>> consider must haves as well as future enhancements.
>>
>> 1. Separate metron-parsers into metron-parsers-common and metron-storm
>> and create metron-parsers-nifi. For this code to be reusable across
>> platforms (NiFi, Storm, and anything else in the future), we'll need to
>> decouple our parsers and Storm.
>>
>> +1.  The “parsing code” should be a library that implements an interface
>> ( another library ).
>>
>> The Processors and the Storm things can share them.
>>
>>
>> - There's also some nice fringe benefits around refactoring our code
>> to be substantially more clear and understandable; something which
>> came up while allowing for parser aggregation.
>> 2. Create a MetronProcessor that can run our parsers.
>> - I took a look at how RecordReader could be leveraged (e.g.
>> CSVRecordReader), but this is pretty tightly tied into schemas and is
>> meant to be used by ControllerServices, which are then used by Processors.
>> There's friction involved there in terms of schemas, but also in terms of
>> access to ZK configs and things like parser chaining. We might be able to
>> leverage it, but it seems like it'd be fairly shoehorned in without
>> getting the schema and other benefits.
>>
>> We won’t have to provide our ‘no schema processors’ ( grok, csv, json ).
>>
>> All the remaining processors DO have schemas that we know about.  We can
>> just provide the avro schemas the same way we provide the ES schemas.
>>
>> The “parsing” should not be conflated with the transform/stellar in
>> NiFi.  We should make that separate. Running Stellar over Records would be
>> the best thing.
>>
>>
>>
>> - This Processor would work similarly to Storm: bytes[] in -> JSON out.
>> - There is a Processor
>> <https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/JoltTransformJSON.java>
>> that handles loading other JARs that we can model a MetronParserProcessor
>> off of that handles classpath/classloader issues (basically just sets up a
>> classloader specific to what's being loaded and swaps out the Thread's
>> loader when it calls to outside resources).
>>
>> There should be no reason to load modules outside the NAR.  Why do you
>> expect to?  If each Metron Processor equiv of a Metron Storm Parser is just
>> parsing to json it shouldn’t need much. And we could package them in the
>> NAR.  I would suggest we have a Processor per Parser to allow for
>> specialization.  It should all be in the nar.
>>
>> The Stellar Processor, if you would support the works would possibly need
>> this.
>>
>>
>> 3. Create a MetronZkControllerService to supply our configs to our
>> processors.
>> - This is a pretty established NiFi pattern for being able to provide
>> access to other services needed by a Processor (e.g. databases or large
>> configuration files).
>> - The same controller service can be used by all Processors to manage
>> configs in a consistent manner.
>>
>> I think controller services would make sense where needed, I’m just not
>> sure what you imagine them being needed for?
>>
>> If the user has NiFi, and a Registry etc, are you saying you imagine them
>> using Metron + ZK to manage configurations?  Or to be using BOTH storm
>> processors and Nifi Processors?
>>
>>
>>
>> At that point, we can just NAR our controller service and parser processor
>> up as needed, deploy them to NiFi, and let the user provide a config for
>> where their custom parsers can be provided (i.e. their parser jar). This
>> would be 3 nars (processor, controller-service, and controller-service-api
>> in order to bind the other two together).
>>
>> Once deployed, our ability to use parsers should fit well into the standard
>> NiFi workflow:
>>
>> 1. Create a MetronZkControllerService.
>> 2. Configure the service to point at zookeeper.
>> 3. Create a MetronParser.
>> 4. Configure it to use the controller service + parser jar location +
>> any other needed configs.
>> 5. Use the outputs as needed downstream (either writing out to Kafka or
>> feeding into more MetronParsers, etc.)
>>
>> Chaining parsers should ideally become a matter of chaining MetronParsers
>>
>> (and making sure the enveloping configs carry through properly). For
>> parser
>> aggregation, I'd just avoid it entirely until we know it's needed in NiFi.
>>
>>
>> Justin
>>
>>

Re: [DISCUSS] Metron Parsers in Nifi

Posted by Otto Fowler <ot...@gmail.com>.

A Metron Processor itself isn’t really necessary.  A MetronRecordReader (
either the megalithic or a reader per format ) would be a good approach.
Then have StellarTransformRecord processor that can do Stellar on _any_
record, regardless of source.

On August 7, 2018 at 11:06:22, Justin Leet (justinjleet@gmail.com) wrote:

Thanks for the comments, Otto, this is definitely great feedback.  I'd love
to respond inline, but the email's already starting to lose its
formatting, so I'll go with the classic "wall of text".  Let me know if I
didn't address everything.

Loading modules (or jars or whatever) outside of our Processor gives us the
benefit of making it incredibly easy for users to create their own
parsers. I would definitely expect our own bundled parsers to be included
in our base NAR, but loading modules means users only have to learn
how Metron wants things lined up and can just plug their parser in. Having
said that, I could see having a wrapper for our bundled parsers that makes
it really easy to just say you want a MetronAsaParser or MetronBroParser,
etc. That would give us the best of both worlds, where it's easy to set up
our bundled parsers and also trivial to pull in non-bundled parsers. This
gives us an easy way to support (hopefully) every parser that gets
made, right out of the box, without us needing to build a specialized
version of everything until we decide to and without users having to jump
through hoops.

None of this prevents anyone from creating specialized parsers (for perf
reasons, or to use the schema registries, or anything else).  It's probably
worthwhile to package up some of the built-in parsers and customize them to use
more specialized features appropriately as we see things get used in the
wild.  Like you said, we could likely provide Avro schemas for some of this
and give users a more robust experience on what we choose to support and
provide guidance for other things.  I'm also worried that building
specialized schemas becomes problematic for things like parser chaining
(where our routers wrap the underlying messages and add on their own info).
Going down that road potentially requires anything wrapped to have a
specialized schema for the wrapped version in addition to a vanilla version
(although please correct me if I'm missing something there, I'll openly
admit to some shakiness on how that would be handled).

I also disagree that this is un-Nifi-like, although I'm admittedly not as
skilled there.  The basis for doing this is directly inspired by the
JoltTransformer, which is extremely similar to the proposed setup for our
parsers: Simply take a spec (in this case the configs, including the
fieldTransformations), and delegate a mapping from bytes[] to JSON.  The
Jolt library even has an Expression Language (check out
https://community.hortonworks.com/articles/105965/expression-language-with-jolt-in-apache-nifi.html),
so it's not a foreign concept. I believe Simon Ball has already done some
experimenting around with getting Stellar running in NiFi, and I'd love to
see Stellar more readily available in NiFi in general.

Re: the ControllerService, I see this as a way to maintain Metron's use of
ZK as the source of config truth.  Users could definitely be using NiFi and
Storm in tandem (parse in NiFi + enrich and index from Storm, for
example).  Using the ControllerService gives us a ZK instance as the single
source of truth.  That way we aren't forcing users to go to two different
places to manage configs.  This also lets us leverage our existing scripts
and our existing infrastructure around configs and their management and
validation very easily.  It also gives users a way to port from NiFi to
Storm or vice-versa without having to migrate configs as well. We could
also provide the option to configure the Processor itself with the data
(just don't set up a controller service and provide the json or whatever as
one of our properties).

On Tue, Aug 7, 2018 at 10:12 AM Otto Fowler <ot...@gmail.com> wrote:

> I think this is a good idea.  As I mentioned in the other thread I’ve been
> doing a lot of work on Nifi recently.
> I think the important thing is that what is done should be done the NiFi
> way, not bolting the Metron composition
> onto Nifi.  Think of it like the Tao of Unix, the parsers and components
> should be single purpose and simple, allowing
> exceptional flexibility in composition.
>
> Comments inline.
>
> On August 7, 2018 at 09:27:01, Justin Leet (justinjleet@gmail.com) wrote:
>
> Hi all,
>
> There's interest in being able to run Metron parsers in NiFi, rather than
> inside Storm. I dug into this a bit, and have some thoughts on how we could
>
> go about this. I'd love feedback on this, along with anything we'd
> consider must haves as well as future enhancements.
>
> 1. Separate metron-parsers into metron-parsers-common and metron-storm
> and create metron-parsers-nifi. For this code to be reusable across
> platforms (NiFi, Storm, and anything else in the future), we'll need to
> decouple our parsers and Storm.
>
> +1.  The “parsing code” should be a library that implements an interface (
> another library ).
>
> The Processors and the Storm things can share them.
>
>
> - There's also some nice fringe benefits around refactoring our code
> to be substantially more clear and understandable; something
> which came up
> while allowing for parser aggregation.
> 2. Create a MetronProcessor that can run our parsers.
> - I took a look at how RecordReader could be leveraged (e.g.
> CSVRecordReader), but this is pretty tightly tied into schemas
> and is meant
> to be used by ControllerServices, which are then used by Processors.
> There's friction involved there in terms of schemas, but also in terms of
> access to ZK configs and things like parser chaining. We might
> be able to
> leverage it, but it seems like it'd be fairly shoehorned in
> without getting
> the schema and other benefits.
>
> We won’t have to provide our ‘no schema processors’ ( grok, csv, json ).
>
> All the remaining processors DO have schemas that we know about.  We can
> just provide the avro schemas the same way we provide the ES schemas.
>
> The “parsing” should not be conflated with the transform/stellar in NiFi.
> We should make that separate. Running Stellar over Records would be the
> best thing.
>
>
>
> - This Processor would work similarly to Storm: bytes[] in -> JSON
> out.
> - There is a Processor
> <
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/JoltTransformJSON.java
> >
> that
> handles loading other JARs that we can model a
> MetronParserProcessor off of
> that handles classpath/classloader issues (basically just sets up a
> classloader specific to what's being loaded and swaps out the Thread's
> loader when it calls to outside resources).
>
> There should be no reason to load modules outside the NAR.  Why do you
> expect to?  If each Metron Processor equiv of a Metron Storm Parser is just
> parsing to json it shouldn’t need much. And we could package them in the
> NAR.  I would suggest we have a Processor per Parser to allow for
> specialization.  It should all be in the nar.
>
> The Stellar Processor, if you would support the works would possibly need
> this.
>
>
> 3. Create a MetronZkControllerService to supply our configs to our
> processors.
> - This is a pretty established NiFi pattern for being able to provide
> access to other services needed by a Processor (e.g. databases or large
> configuration files).
> - The same controller service can be used by all Processors to manage
> configs in a consistent manner.
>
> I think controller services would make sense where needed, I’m just not
> sure what you imagine them being needed for?
>
> If the user has NiFi, and a Registry etc, are you saying you imagine them
> using Metron + ZK to manage configurations?  Or to be using BOTH storm
> processors and Nifi Processors?
>
>
>
> At that point, we can just NAR our controller service and parser processor
>
> up as needed, deploy them to NiFi, and let the user provide a config for
> where their custom parsers can be provided (i.e. their parser jar). This
> would be 3 nars (processor, controller-service, and controller-service-api
>
> in order to bind the other two together).
>
> Once deployed, our ability to use parsers should fit well into the standard
>
> NiFi workflow:
>
> 1. Create a MetronZkControllerService.
> 2. Configure the service to point at zookeeper.
> 3. Create a MetronParser.
> 4. Configure it to use the controller service + parser jar location +
> any other needed configs.
> 5. Use the outputs as needed downstream (either writing out to Kafka or
> feeding into more MetronParsers, etc.)
>
> Chaining parsers should ideally become a matter of chaining MetronParsers
> (and making sure the enveloping configs carry through properly). For parser
>
> aggregation, I'd just avoid it entirely until we know it's needed in NiFi.
>
>
> Justin
>
>

Re: [DISCUSS] Metron Parsers in Nifi

Posted by Justin Leet <ju...@gmail.com>.

Thanks for the comments, Otto, this is definitely great feedback.  I'd love
to respond inline, but the email's already starting to lose its
formatting, so I'll go with the classic "wall of text".  Let me know if I
didn't address everything.

Loading modules (or jars or whatever) outside of our Processor gives us the
benefit of making it incredibly easy for users to create their own
parsers. I would definitely expect our own bundled parsers to be included
in our base NAR, but loading modules means users only have to learn
how Metron wants things lined up and can just plug their parser in. Having
said that, I could see having a wrapper for our bundled parsers that makes
it really easy to just say you want a MetronAsaParser or MetronBroParser,
etc. That would give us the best of both worlds, where it's easy to set up
our bundled parsers and also trivial to pull in non-bundled parsers. This
gives us an easy way to support (hopefully) every parser that gets
made, right out of the box, without us needing to build a specialized
version of everything until we decide to and without users having to jump
through hoops.
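
As a rough illustration of the classloader handling described for loading
parser jars from outside the NAR, here is a minimal Java sketch. It is not
NiFi or Metron code: ClassLoaderSwapSketch, loaderFor and
withContextClassLoader are hypothetical names, and the empty URL array
stands in for a user-supplied parser jar location. It only shows the
swap-and-restore pattern around a call into outside code.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.function.Supplier;

/** Sketch only: these names are illustrative, not Metron or NiFi APIs. */
final class ClassLoaderSwapSketch {

    /** Builds a loader over user-supplied parser jars, delegating to the given parent. */
    static URLClassLoader loaderFor(URL[] parserJars, ClassLoader parent) {
        return new URLClassLoader(parserJars, parent);
    }

    /**
     * Runs the task with the given loader installed as the thread context
     * class loader, and always restores the previous loader afterwards.
     */
    static <T> T withContextClassLoader(ClassLoader loader, Supplier<T> task) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return task.get();
        } finally {
            current.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader before = Thread.currentThread().getContextClassLoader();
        // Assumption: a real processor would pass the URLs of the user's parser jar(s) here.
        try (URLClassLoader custom = loaderFor(new URL[0], ClassLoaderSwapSketch.class.getClassLoader())) {
            boolean swapped = withContextClassLoader(custom,
                    () -> Thread.currentThread().getContextClassLoader() == custom);
            System.out.println("custom loader active during call: " + swapped);
        }
        boolean restored = Thread.currentThread().getContextClassLoader() == before;
        System.out.println("previous loader restored: " + restored);
    }
}
```

The try/finally is the important part: the previous loader is put back even
if the parser code throws.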

None of this prevents anyone from creating specialized parsers (for perf
reasons, or to use the schema registries, or anything else).  It's probably
worthwhile to package up some of the built-in parsers and customize them to use
more specialized features appropriately as we see things get used in the
wild.  Like you said, we could likely provide Avro schemas for some of this
and give users a more robust experience on what we choose to support and
provide guidance for other things.  I'm also worried that building
specialized schemas becomes problematic for things like parser chaining
(where our routers wrap the underlying messages and add on their own info).
Going down that road potentially requires anything wrapped to have a
specialized schema for the wrapped version in addition to a vanilla version
(although please correct me if I'm missing something there, I'll openly
admit to some shakiness on how that would be handled).

I also disagree that this is un-Nifi-like, although I'm admittedly not as
skilled there.  The basis for doing this is directly inspired by the
JoltTransformer, which is extremely similar to the proposed setup for our
parsers: Simply take a spec (in this case the configs, including the
fieldTransformations), and delegate a mapping from bytes[] to JSON.  The
Jolt library even has an Expression Language (check out
https://community.hortonworks.com/articles/105965/expression-language-with-jolt-in-apache-nifi.html),
so it's not a foreign concept. I believe Simon Ball has already done some
experimenting around with getting Stellar running in NiFi, and I'd love to
see Stellar more readily available in NiFi in general.

Re: the ControllerService, I see this as a way to maintain Metron's use of
ZK as the source of config truth.  Users could definitely be using NiFi and
Storm in tandem (parse in NiFi + enrich and index from Storm, for
example).  Using the ControllerService gives us a ZK instance as the single
source of truth.  That way we aren't forcing users to go to two different
places to manage configs.  This also lets us leverage our existing scripts
and our existing infrastructure around configs and their management and
validation very easily.  It also gives users a way to port from NiFi to
Storm or vice-versa without having to migrate configs as well. We could
also provide the option to configure the Processor itself with the data
(just don't set up a controller service and provide the json or whatever as
one of our properties).
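
To make the two config paths concrete, here is a small Java sketch of how a
processor might pick its parser config: ask the controller service if one is
set, otherwise fall back to JSON supplied as a property. Everything here is
an assumption for illustration (ParserConfigSource, InMemoryConfigSource and
ConfigResolutionSketch are hypothetical, and the in-memory map merely stands
in for a ZooKeeper-backed service); it is not the NiFi ControllerService API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical view of what a config-supplying service would expose. */
interface ParserConfigSource {
    Optional<String> configFor(String sensorType);
}

/** Stand-in for a ZooKeeper-backed service; a real one would read from ZK. */
final class InMemoryConfigSource implements ParserConfigSource {
    private final Map<String, String> configs;

    InMemoryConfigSource(Map<String, String> configs) {
        this.configs = new HashMap<>(configs);
    }

    @Override
    public Optional<String> configFor(String sensorType) {
        return Optional.ofNullable(configs.get(sensorType));
    }
}

final class ConfigResolutionSketch {

    /**
     * Prefers the service's config when a service is set and has an entry,
     * otherwise uses the inline JSON property; fails if neither is available.
     */
    static String resolve(ParserConfigSource service, String inlineJson, String sensorType) {
        Optional<String> fromService =
                service == null ? Optional.<String>empty() : service.configFor(sensorType);
        if (fromService.isPresent()) {
            return fromService.get();
        }
        if (inlineJson != null && !inlineJson.trim().isEmpty()) {
            return inlineJson;
        }
        throw new IllegalStateException("No parser config available for sensor type: " + sensorType);
    }

    public static void main(String[] args) {
        Map<String, String> stored = new HashMap<>();
        stored.put("bro", "{\"sensorTopic\":\"bro\"}");
        ParserConfigSource service = new InMemoryConfigSource(stored);

        System.out.println(resolve(service, null, "bro"));             // config from the service
        System.out.println(resolve(null, "{\"inline\":true}", "asa")); // config from the property
    }
}
```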

On Tue, Aug 7, 2018 at 10:12 AM Otto Fowler <ot...@gmail.com> wrote:

> I think this is a good idea.  As I mentioned in the other thread I’ve been
> doing a lot of work on Nifi recently.
> I think the important thing is that what is done should be done the NiFi
> way, not bolting the Metron composition
> onto Nifi.  Think of it like the Tao of Unix, the parsers and components
> should be single purpose and simple, allowing
> exceptional flexibility in composition.
>
> Comments inline.
>
> On August 7, 2018 at 09:27:01, Justin Leet (justinjleet@gmail.com) wrote:
>
> Hi all,
>
> There's interest in being able to run Metron parsers in NiFi, rather than
> inside Storm. I dug into this a bit, and have some thoughts on how we could
>
> go about this. I'd love feedback on this, along with anything we'd
> consider must haves as well as future enhancements.
>
> 1. Separate metron-parsers into metron-parsers-common and metron-storm
> and create metron-parsers-nifi. For this code to be reusable across
> platforms (NiFi, Storm, and anything else in the future), we'll need to
> decouple our parsers and Storm.
>
> +1.  The “parsing code” should be a library that implements an interface (
> another library ).
>
> The Processors and the Storm things can share them.
>
>
> - There's also some nice fringe benefits around refactoring our code
> to be substantially more clear and understandable; something
> which came up
> while allowing for parser aggregation.
> 2. Create a MetronProcessor that can run our parsers.
> - I took a look at how RecordReader could be leveraged (e.g.
> CSVRecordReader), but this is pretty tightly tied into schemas
> and is meant
> to be used by ControllerServices, which are then used by Processors.
> There's friction involved there in terms of schemas, but also in terms of
> access to ZK configs and things like parser chaining. We might
> be able to
> leverage it, but it seems like it'd be fairly shoehorned in
> without getting
> the schema and other benefits.
>
> We won’t have to provide our ‘no schema processors’ ( grok, csv, json ).
>
> All the remaining processors DO have schemas that we know about.  We can
> just provide the avro schemas the same way we provide the ES schemas.
>
> The “parsing” should not be conflated with the transform/stellar in NiFi.
> We should make that separate. Running Stellar over Records would be the
> best thing.
>
>
>
> - This Processor would work similarly to Storm: bytes[] in -> JSON
> out.
> - There is a Processor
> <
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/JoltTransformJSON.java
> >
> that
> handles loading other JARs that we can model a
> MetronParserProcessor off of
> that handles classpath/classloader issues (basically just sets up a
> classloader specific to what's being loaded and swaps out the Thread's
> loader when it calls to outside resources).
>
> There should be no reason to load modules outside the NAR.  Why do you
> expect to?  If each Metron Processor equiv of a Metron Storm Parser is just
> parsing to json it shouldn’t need much. And we could package them in the
> NAR.  I would suggest we have a Processor per Parser to allow for
> specialization.  It should all be in the nar.
>
> The Stellar Processor, if you would support the works would possibly need
> this.
>
>
> 3. Create a MetronZkControllerService to supply our configs to our
> processors.
> - This is a pretty established NiFi pattern for being able to provide
> access to other services needed by a Processor (e.g. databases or large
> configuration files).
> - The same controller service can be used by all Processors to manage
> configs in a consistent manner.
>
> I think controller services would make sense where needed, I’m just not
> sure what you imagine them being needed for?
>
> If the user has NiFi, and a Registry etc, are you saying you imagine them
> using Metron + ZK to manage configurations?  Or to be using BOTH storm
> processors and Nifi Processors?
>
>
>
> At that point, we can just NAR our controller service and parser processor
>
> up as needed, deploy them to NiFi, and let the user provide a config for
> where their custom parsers can be provided (i.e. their parser jar). This
> would be 3 nars (processor, controller-service, and controller-service-api
>
> in order to bind the other two together).
>
> Once deployed, our ability to use parsers should fit well into the standard
>
> NiFi workflow:
>
> 1. Create a MetronZkControllerService.
> 2. Configure the service to point at zookeeper.
> 3. Create a MetronParser.
> 4. Configure it to use the controller service + parser jar location +
> any other needed configs.
> 5. Use the outputs as needed downstream (either writing out to Kafka or
> feeding into more MetronParsers, etc.)
>
> Chaining parsers should ideally become a matter of chaining MetronParsers
> (and making sure the enveloping configs carry through properly). For parser
>
> aggregation, I'd just avoid it entirely until we know it's needed in NiFi.
>
>
> Justin
>
>

Re: [DISCUSS] Metron Parsers in Nifi

Posted by Otto Fowler <ot...@gmail.com>.

I think this is a good idea. As I mentioned in the other thread I’ve been
doing a lot of work on Nifi recently.
I think the important thing is that what is done should be done the NiFi
way, not bolting the Metron composition
onto Nifi. Think of it like the Tao of Unix, the parsers and components
should be single purpose and simple, allowing
exceptional flexibility in composition.

Comments inline.

On August 7, 2018 at 09:27:01, Justin Leet (justinjleet@gmail.com) wrote:

> Hi all,

+1. The “parsing code” should be a library that implements an interface (
another library ).

The Processors and the Storm things can share them.
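
A minimal Java sketch of that split, under assumed names: SketchParser is a
hypothetical stand-in for the shared interface library, KeyValueParser for
one parser implementation, and SharedParserSketch.handle for the thin piece
that either a NiFi Processor or a Storm bolt would call. None of these are
existing Metron or NiFi classes, and the toy "k=v" format is only there to
make the example runnable.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical platform-neutral contract: raw bytes in, parsed fields out. */
interface SketchParser {
    Map<String, Object> parse(byte[] raw);
}

/** Toy parser for whitespace-separated "key=value" pairs; no Storm or NiFi types involved. */
final class KeyValueParser implements SketchParser {
    @Override
    public Map<String, Object> parse(byte[] raw) {
        Map<String, Object> fields = new LinkedHashMap<>();
        String text = new String(raw, StandardCharsets.UTF_8).trim();
        if (text.isEmpty()) {
            return fields;
        }
        for (String pair : text.split("\\s+")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                fields.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return fields;
    }
}

/** The part a host (Processor or bolt) would share: bytes in, JSON text out. */
final class SharedParserSketch {

    static String handle(SketchParser parser, byte[] payload) {
        return toJson(parser.parse(payload));
    }

    /** Minimal JSON writer that treats every value as a string. */
    private static String toJson(Map<String, Object> fields) {
        StringBuilder json = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> entry : fields.entrySet()) {
            if (!first) {
                json.append(',');
            }
            json.append(quote(entry.getKey())).append(':').append(quote(String.valueOf(entry.getValue())));
            first = false;
        }
        return json.append('}').toString();
    }

    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        byte[] payload = "src=10.0.0.1 dst=10.0.0.2".getBytes(StandardCharsets.UTF_8);
        System.out.println(handle(new KeyValueParser(), payload));
    }
}
```

The parser class has no dependency on either platform, so the same jar could
be called from both hosts.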

> - There's also some nice fringe benefits around refactoring our code
> to be substantially more clear and understandable; something
> which came up
> while allowing for parser aggregation.
> 2. Create a MetronProcessor that can run our parsers.
> - I took a look at how RecordReader could be leveraged (e.g.
> CSVRecordReader), but this is pretty tightly tied into schemas
> and is meant
> to be used by ControllerServices, which are then used by Processors.
> There's friction involved there in terms of schemas, but also in terms of
> access to ZK configs and things like parser chaining. We might
> be able to
> leverage it, but it seems like it'd be fairly shoehorned in
> without getting
> the schema and other benefits.

We won’t have to provide our ‘no schema processors’ ( grok, csv, json ).

All the remaining processors DO have schemas that we know about. We can
just provide the avro schemas the same way we provide the ES schemas.

The “parsing” should not be conflated with the transform/stellar in NiFi.
We should make that separate. Running Stellar over Records would be the
best thing.

> - This Processor would work similarly to Storm: bytes[] in -> JSON
> out.
> - There is a Processor
> <
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/JoltTransformJSON.java
> >
> that
> handles loading other JARs that we can model a
> MetronParserProcessor off of
> that handles classpath/classloader issues (basically just sets up a
> classloader specific to what's being loaded and swaps out the Thread's
> loader when it calls to outside resources).

There should be no reason to load modules outside the NAR. Why do you
expect to? If each Metron Processor equiv of a Metron Storm Parser is just
parsing to json it shouldn’t need much. And we could package them in the
NAR. I would suggest we have a Processor per Parser to allow for
specialization. It should all be in the nar.

The Stellar Processor, if you would support the works would possibly need
this.

> 3. Create a MetronZkControllerService to supply our configs to our
> processors.
> - This is a pretty established NiFi pattern for being able to provide
> access to other services needed by a Processor (e.g. databases or large
> configuration files).
> - The same controller service can be used by all Processors to manage
> configs in a consistent manner.

I think controller services would make sense where needed, I’m just not
sure what you imagine them being needed for?

If the user has NiFi, and a Registry etc, are you saying you imagine them
using Metron + ZK to manage configurations? Or to be using BOTH storm
processors and Nifi Processors?

> Once deployed, our ability to use parsers should fit well into the standard
> NiFi workflow:
>
> Justin