Posted to dev@nifi.apache.org by Nissim Shiman <ns...@yahoo.com.INVALID> on 2019/10/10 22:02:31 UTC

PULL ProvenanceEvent

Hello Team,

The ProvenanceEventType class does a good job of capturing possible events, but the PULL event doesn't seem to fall nicely into any of the existing types.
https://gitbox.apache.org/repos/asf?p=nifi.git;a=blob;f=nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java
RECEIVE is the closest, but RECEIVE is passive and doesn't capture the active action of a PULL.

And... maybe it would fall into FETCH, but FETCH is more focused on the contents of an existing flowfile being overwritten.

What does the community think about a new PULL event type,
or
using FETCH for PULL, and having what FETCH does now become a new event such as REUSE?

NOTE: a new PULL event would have a cascading effect: many processors that currently emit RECEIVEs would be modified to emit PULL
(e.g. GetFile would no longer emit a RECEIVE, but rather a PULL), but it would capture the event more accurately.

Thanks,
Nissim Shiman
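A minimal Java sketch of what the first option (a new PULL type) would mean for emitting processors. `ProposedEventType` and `PullProposalSketch` are hypothetical names invented for illustration, not NiFi code; the active/passive classification covers GetFile, as above, plus GetHTTP, ListenHTTP and ListenTCP, the examples used in this thread.

```java
import java.util.Map;

// Hypothetical enum: two real NiFi ProvenanceEventType names (RECEIVE, FETCH)
// plus the PULL type proposed in this thread. It is NOT the NiFi API.
enum ProposedEventType { RECEIVE, FETCH, PULL }

final class PullProposalSketch {
    // Whether each example processor actively retrieves its data (true) or
    // waits for data to be handed to it (false), as classified in this thread.
    private static final Map<String, Boolean> ACTIVE = Map.of(
            "GetFile", true,
            "GetHTTP", true,
            "ListenHTTP", false,
            "ListenTCP", false);

    // Event type each example processor would emit under the PULL proposal.
    static ProposedEventType proposed(String processor) {
        Boolean active = ACTIVE.get(processor);
        if (active == null) {
            throw new IllegalArgumentException("unknown processor: " + processor);
        }
        return active ? ProposedEventType.PULL : ProposedEventType.RECEIVE;
    }

    public static void main(String[] args) {
        // All four example processors emit RECEIVE today; only the active ones would change.
        for (String p : new String[] {"GetFile", "GetHTTP", "ListenHTTP", "ListenTCP"}) {
            System.out.println(p + ": RECEIVE -> " + proposed(p));
        }
    }
}
```

Every processor mapped to `true` changes its emitted type, which is the cascading effect the NOTE describes; any downstream consumer that switches on the event type would also see the new value.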


Re: PULL ProvenanceEvent

Posted by Adam Taft <ad...@adamtaft.com>.
+1 Joe - this is a good compromise to keep the original API undisturbed.


On Wed, Nov 6, 2019 at 11:05 AM Joe Witt <jo...@gmail.com> wrote:

> Nissim
>
> Notionally I am saying that session.getProvenanceReporter().receive(...)
> should have an option to call
> session.getProvenanceReporter().receive(...,ACTIVE|PASSIVE) and if not
> specified it would be UNSPECIFIED.
>
> I don't think this needs to be on the flowfile attribute - it would go
> straight to the provenance event itself, which is generated by the session.
>
> Thanks
> Joe
>
> On Wed, Nov 6, 2019 at 11:32 AM Nissim Shiman <ns...@yahoo.com.invalid>
> wrote:
>
> >  Joe,
> >
> > Just to verify what you mean,
> >
> > You are saying that the line:
> > flowfile = session.putAttribute(flowfile, "receiveType", "active");
> >
> > could be added before
> > session.getProvenanceReporter().receive(...)
> > to indicate a PULL.  Is this correct?
> >
> > Thanks,
> >
> > Nissim
> >
> >     On Monday, November 4, 2019, 12:50:11 PM EST, Nissim Shiman
> > <ns...@yahoo.com.invalid> wrote:
> >
> > Having an attribute added indicating passive/active/query for RECEIVE
> > and FETCH will work,
> >
> > but nifi attributes are stateful (i.e. they will still be on the flowfile
> > as metadata a couple of processor steps down the flow).
> >
> > Maybe an option is to expand the API for RECEIVE and FETCH with a
> > new parameter for passive/active/query?
> > (i.e. the existing method signatures, such as [1], will remain the same,
> > but new ones will be added to handle this new parameter.)
> >
> > [1]
> > https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceReporter.java#L46
> >
> >
> >     On Thursday, October 31, 2019, 10:10:40 PM EDT, Joe Witt <joe.witt@gmail.com> wrote:
> >
> > These distinctions may be meaningful.  Adding them as an attribute lets
> > the meaning be conveyed without introducing complexity for the majority
> > case, where the distinction isn't key.
> >
> > thanks
> >
> > On Thu, Oct 31, 2019 at 4:05 PM Nissim Shiman <nshiman@yahoo.com.invalid>
> > wrote:
> >
> > > Mike,
> > > I like the QUERY type as well.  Basically a more refined PULL.  Very nice.
> > >
> > > Part of the challenge of adding PULL as a type is that there are currently
> > > two flavors of RECEIVEs:
> > > RECEIVE and FETCH [1]
> > >
> > > So any addition of a PULL would need a second flavor of PULL to match the
> > > case where a flowfile's contents are being overwritten as well (i.e. as
> > > FETCH is currently doing).
> > >
> > > [1]
> > > https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java#L42
> > >
> > > Thanks,
> > > Nissim
> > >
> > >    On Wednesday, October 30, 2019, 6:41:04 PM EDT, Mike Thomsen <mikerthomsen@gmail.com> wrote:
> > >
> > > I like the idea of creating PULL as a type. In fact, I'd propose that
> > > there are three scenarios here:
> > >
> > > RECEIVE - Passively acquire in a sort of hand-off situation. Ex: Kafka subscription
> > > PULL - Direct operations to seek out and fetch something in a targeted fashion. Ex: GetHTTP
> > > QUERY - Go looking for the data and take what matches your search. Ex: JsonQueryElasticsearch, GetMongo, any SQL query processor, etc.
> > >
> > > On Wed, Oct 30, 2019 at 1:31 PM Nissim Shiman <nshiman@yahoo.com.invalid>
> > > wrote:
> > >
> > > > Joe,
> > > >
> > > > It is hard to say how much value the transit URI would bring to clarify a
> > > > RECEIVE.
> > > > For example, a RECEIVE with a transit URI of https:<etc.> could come from either a
> > > > GetHTTP (i.e. active) or a ListenHTTP (i.e. passive),
> > > >
> > > > but your idea of "a metadata item specifying active vs passive" is a very
> > > > clever way to make this work with minimal disruption.
> > > >
> > > > My understanding of this is that the current receive() calls in
> > > > ProvenanceReporter [1] will remain the same, but new ones will be added
> > > > with a boolean parameter reflecting whether the receive is active or passive.
> > > > This will allow the current list of Provenance Events [2] to remain the
> > > > same, so third-party/custom processors can continue working as is.
> > > >
> > > > Does this sound like what you are thinking?
> > > >
> > > > [1]
> > > > https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceReporter.java#L46
> > > >
> > > > [2]
> > > > apache/nifi
> > > >
> > > > Thanks,
> > > >
> > > > Nissim
> > > >    On Tuesday, October 29, 2019, 12:47:40 PM EDT, Joe Witt <joe.witt@gmail.com> wrote:
> > > >
> > > > Nissim
> > > >
> > > > I like the idea of introducing a more refined type of event for how data is
> > > > brought into nifi (active - PULL, passive - RECEIVE).
> > > >
> > > > That said, it might be sufficient to simply have this distinction be on the
> > > > "RECEIVE" event as a metadata item specifying active vs passive.  The
> > > > protocol utilized as mentioned in the transport URI should clarify this
> > > > though.
> > > >
> > > > In short - I think there is a way here that is all opt-in for existing
> > > > users and components.
> > > >
> > > > Thanks
> > > >
> > > > On Tue, Oct 29, 2019 at 9:41 AM Nissim Shiman <nshiman@yahoo.com.invalid>
> > > > wrote:
> > > >
> > > > > Adam,
> > > > > good points...
> > > > > I missed a step in explaining the use case where Provenance Events are
> > > > > incomplete:
> > > > > the second nifi does a GetSFTP from the *filesystem* that the first
> > > > > nifi is located on.
> > > > > So the second nifi currently emits a RECEIVE event, but there is no
> > > > > corresponding SEND event from the first nifi (nor should there be).
> > > > > If the second nifi emitted a PULL event, it would be easier for a system
> > > > > overseer to know that there should be no corresponding SEND event.
> > > > >
> > > > > Currently send/receive works well when nifi 1 does a PostHTTP and nifi 2
> > > > > does a ListenHTTP, but not in the case above.
> > > > >
> > > > > The ERROR case you mention is a nice point as well, although not my
> > > > > specific issue at the moment.
> > > > > Thanks,
> > > > > Nissim
> > > > >
> > > > >    On Monday, October 28, 2019, 11:52:57 PM EDT, Adam Taft <adam@adamtaft.com> wrote:
> > > > >
> > > > > > But a flowfile that was PULLed by the second nifi (from the first nifi)
> > > > > > will not necessarily have any provenance event generated by the first nifi.
> > > > >
> > > > > Isn't this the fault of the first NiFi for failing to emit a SEND event in
> > > > > response to the second NiFi's request?  In this scenario, shouldn't the
> > > > > send/receive pair be:
> > > > > NiFi-1 [SEND] :: NiFi-2 [RECEIVE]?
> > > > >
> > > > > What you describe is an odd use case for NiFi.  NiFi is usually not in the
> > > > > business of acting as a file server daemon in order to "send" flowfiles to
> > > > > other systems.  As you mention, HandleHttpResponse may be a lone wolf
> > > > > example processor which generates a SEND event whose input originates from
> > > > > a "listener". [1]  The other ListenXYZ processors generally issue RECEIVE
> > > > > events because they are receiving bytes, not generating them.
> > > > >
> > > > > Are there other processors in question? Something custom? Or is this
> > > > > related to site-to-site transfers?
> > > > >
> > > > > I still kind of question the motive of a provenance event pair that is
> > > > > trying to establish "who called whom first".  Honestly, I'm just trying to
> > > > > understand the use case where a matching SEND/RECEIVE pair doesn't give you
> > > > > what you need.
> > > > >
> > > > > The only thing I could see would be a processor that asks for data, but
> > > > > then doesn't receive it due to some error condition.  In this case, adding
> > > > > some sort of ERROR event might be useful: "I attempted to retrieve data
> > > > > from ${uri}, but the transfer failed because of ${error condition}".  That
> > > > > way, GetXYZ processors could report an error to provenance instead of as a
> > > > > bulletin.
> > > > >
> > > > > If the problem is related to a processor or the framework itself not
> > > > > generating an event, can we just fix that function to emit SEND in the
> > > > > scenario that you describe?  Changing the provenance model itself (beyond
> > > > > possibly adding an ERROR event) feels like it would be the last scenario to
> > > > > consider.
> > > > >
> > > > > Thanks,
> > > > > Adam
> > > > >
> > > > > [1]
> > > > > https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpResponse.java#L191
> > > > >
> > > > > On Mon, Oct 28, 2019 at 4:47 PM Nissim Shiman <nshiman@yahoo.com.invalid>
> > > > > wrote:
> > > > >
> > > > > > Adam,
> > > > > > I believe there is a need for more detailed ProvenanceEvents.
> > > > > > A use case would be a customer that is trying to track data passed between
> > > > > > two nifis and trying to match up SENDs and RECEIVEs.
> > > > > >
> > > > > > So a flowfile that has a SEND event on the first nifi should have a
> > > > > > RECEIVE event on the second nifi.
> > > > > > But a flowfile that was PULLed by the second nifi (from the first nifi)
> > > > > > will not necessarily have any provenance event generated by the first nifi.
> > > > > >
> > > > > > (I realize that FETCH is already a "reserved word" in the current
> > > > > > ProvenanceEvents setup, so I was hoping PULL could be used instead.)
> > > > > > There is another Provenance Event, ACKNOWLEDGE, which would also fit
> > > > > > this model occasionally (an example would be the HandleHttpResponse
> > > > > > processor, which could send this instead of SEND when responding to an
> > > > > > HTTP request).
> > > > > > That being said, you make an excellent point when you said
> > > > > > "However even more important to realize,
> > > > > > this change would affect many other downstream consumers of provenance data
> > > > > > which aren't necessarily in the stock NiFi distribution."
> > > > > > Thanks,
> > > > > > Nissim
> > > > > >
> > > > > >    On Friday, October 11, 2019, 11:30:19 AM EDT, Nissim Shiman
> > > > > > <ns...@yahoo.com.invalid> wrote:
> > > > > >
> > > > > > Adam,
> > > > > > "Yes" to your first question and the four processor examples you listed.
> > > > > >
> > > > > > I will need to get back to you regarding your other points.
> > > > > >
> > > > > > Thanks,
> > > > > > Nissim
> > > > > >
> > > > > >    On Thursday, October 10, 2019, 7:05:57 PM EDT, Adam Taft <adam@adamtaft.com> wrote:
> > > > > >
> > > > > > Nissim,
> > > > > >
> > > > > > Just to be clear, you are trying to distinguish between processors which
> > > > > > are actively "pulling" data (GetXYZ) vs. processors which just "listen" for
> > > > > > data (ListenXYZ)?  Is that your basic vision?
> > > > > >
> > > > > > GetFile => PULL
> > > > > > GetHTTP => PULL
> > > > > > ListenHTTP => RECEIVE
> > > > > > ListenTCP => RECEIVE
> > > > > >
> > > > > > Could you clarify what advantages this would have in terms of data
> > > > > > provenance?  What would you use this new event type for specifically?  What
> > > > > > are you missing now? Do you have a use case that needs this, or are you
> > > > > > just generally trying to round out the provenance event types for the sake
> > > > > > of completeness?  I honestly don't know a use case where you care whether
> > > > > > you polled for the data or listened for it.  The provenance model today just
> > > > > > cares that you received the data, not so much how you received it.
> > > > > >
> > > > > > You're right that this proposal will affect many processors and the
> > > > > > internal visualization tools, etc.  However even more important to realize,
> > > > > > this change would affect many other downstream consumers of provenance data
> > > > > > which aren't necessarily in the stock NiFi distribution.  For example, any
> > > > > > third-party/custom ReportingTask that handles provenance data would need to
> > > > > > be updated with this change.  There's probably a need for a strong vision to
> > > > > > help demonstrate the value of this vs. the cost of the cascading effects
> > > > > > related to this change.
> > > > > >
> > > > > > Thanks,
> > > > > > Adam
> > > > > >

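The use case Nissim describes in the quoted thread is reconciliation: every RECEIVE on the second NiFi should pair with a SEND on the first, except when the second NiFi fetched the data itself (GetSFTP against the first host's filesystem). A minimal Java sketch of that check follows. `ProvEvent`, `SendReceiveReconciler` and the `flowId` correlation key are hypothetical simplifications made up for this example; the `active` flag stands in for the proposed PULL type or active/passive metadata.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified provenance record for this sketch only (not NiFi's
// ProvenanceEventRecord). 'flowId' stands in for whatever key a deployment uses
// to correlate a SEND on one instance with a RECEIVE on another.
record ProvEvent(String type, String flowId, boolean active) {}

final class SendReceiveReconciler {
    // Returns the flowIds of RECEIVE events on the receiving instance that have no
    // matching SEND on the sending instance. RECEIVEs flagged as active (the PULL
    // case) are skipped, because no SEND is expected for data the receiver fetched.
    static List<String> unmatchedReceives(List<ProvEvent> sender, List<ProvEvent> receiver) {
        Set<String> sent = new HashSet<>();
        for (ProvEvent e : sender) {
            if (e.type().equals("SEND")) {
                sent.add(e.flowId());
            }
        }
        List<String> unmatched = new ArrayList<>();
        for (ProvEvent e : receiver) {
            if (e.type().equals("RECEIVE") && !e.active() && !sent.contains(e.flowId())) {
                unmatched.add(e.flowId());
            }
        }
        return unmatched;
    }

    public static void main(String[] args) {
        List<ProvEvent> nifi1 = List.of(new ProvEvent("SEND", "a", false));
        List<ProvEvent> nifi2 = List.of(
                new ProvEvent("RECEIVE", "a", false),  // PostHTTP -> ListenHTTP: matching SEND exists
                new ProvEvent("RECEIVE", "b", true),   // GetSFTP from nifi1's filesystem: no SEND expected
                new ProvEvent("RECEIVE", "c", false)); // passive receive with no SEND: flagged
        System.out.println(unmatchedReceives(nifi1, nifi2)); // prints [c]
    }
}
```

Without the active flag, flowfile "b" would be reported as a missing SEND even though nothing is wrong, which is the false alarm the PULL proposal is meant to avoid.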
Re: PULL ProvenanceEvent

Posted by Nissim Shiman <ns...@yahoo.com.INVALID>.
 Joe,

Very nice...


Thanks!
Nissim
    On Wednesday, November 6, 2019, 1:05:09 PM EST, Joe Witt <jo...@gmail.com> wrote:  
 
 Nissim

Notionally I am saying that session.getProvenanceReporter().receive(...)
should have an option to call
session.getProvenanceReporter().receive(...,ACTIVE|PASSIVE) and if not
specified it would be UNSPECIFIED.

I don't think this needs to be on the flowfile attribute - it would go
straight to the provenance event itself, which is generated by the session.

Thanks
Joe
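Earlier in the thread Nissim noted that an attribute such as `receiveType` stays on the flowfile for every later processor, while Joe's proposal records the mode only on the provenance event. The toy Java model below contrasts the two. `ToyFlowFile` and `ToyReceiveEvent` are hypothetical stand-ins, not NiFi's `FlowFile` or provenance classes; they only mimic the fact that attributes travel with a flowfile.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model, NOT the NiFi API: a flowfile is an immutable attribute map, and
// any attribute written at one step is still present at every later step.
record ToyFlowFile(Map<String, String> attributes) {
    ToyFlowFile withAttribute(String key, String value) {
        Map<String, String> copy = new HashMap<>(attributes);
        copy.put(key, value);
        return new ToyFlowFile(Map.copyOf(copy));
    }
}

// Toy provenance event: the receive mode is stored on this one event only.
record ToyReceiveEvent(String transitUri, String mode) {}

final class AttributeVsEventDemo {
    public static void main(String[] args) {
        ToyFlowFile received = new ToyFlowFile(Map.of());

        // Attribute approach: the marker is written onto the flowfile itself...
        ToyFlowFile tagged = received.withAttribute("receiveType", "active");
        // ...so two processor steps later it is still carried as metadata.
        ToyFlowFile downstream = tagged.withAttribute("step", "2").withAttribute("step", "3");
        System.out.println(downstream.attributes().get("receiveType")); // prints active

        // Event approach: the mode lives on the provenance event, and the
        // flowfile's attributes are left untouched.
        ToyReceiveEvent event = new ToyReceiveEvent("sftp://nifi1.example/data/file.csv", "ACTIVE");
        System.out.println(event.mode() + " " + received.attributes().containsKey("receiveType")); // prints ACTIVE false
    }
}
```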


Re: PULL ProvenanceEvent

Posted by Joe Witt <jo...@gmail.com>.
Nissim

Notionally I am saying that session.getProvenanceReporter().receive(...)
should have an option to call
session.getProvenanceReporter().receive(...,ACTIVE|PASSIVE) and if not
specified it would be UNSPECIFIED.

I don't think this needs to be on the flowfile attribute - it would go
straight to the provenance event itself, which is generated by the session.

Thanks
Joe
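Joe's notional API can be sketched as an overload whose mode argument defaults to UNSPECIFIED. Everything here is hypothetical: `ReceiveMode`, `ReporterSketch`, `ReceiveEvent` and `InMemoryReporter` are illustration names, and the real `ProvenanceReporter.receive(...)` methods take a `FlowFile` and a transit URI (plus optional arguments) rather than the plain id string used here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mode flag from the proposal; not part of the NiFi API.
enum ReceiveMode { ACTIVE, PASSIVE, UNSPECIFIED }

// Hypothetical, trimmed-down stand-in for a provenance event.
record ReceiveEvent(String flowFileId, String transitUri, ReceiveMode mode) {}

// Hypothetical reporter: the existing-style two-argument receive() keeps its
// signature and delegates to the new overload with UNSPECIFIED.
interface ReporterSketch {
    void receive(String flowFileId, String transitUri, ReceiveMode mode);

    default void receive(String flowFileId, String transitUri) {
        receive(flowFileId, transitUri, ReceiveMode.UNSPECIFIED);
    }
}

// Collects events in memory so the recorded mode can be inspected.
final class InMemoryReporter implements ReporterSketch {
    final List<ReceiveEvent> events = new ArrayList<>();

    @Override
    public void receive(String flowFileId, String transitUri, ReceiveMode mode) {
        events.add(new ReceiveEvent(flowFileId, transitUri, mode));
    }
}

final class ReceiveModeDemo {
    public static void main(String[] args) {
        InMemoryReporter reporter = new InMemoryReporter();
        reporter.receive("ff-1", "https://example.com/data");                  // legacy call
        reporter.receive("ff-2", "sftp://host/dir/file", ReceiveMode.ACTIVE);  // e.g. a GetSFTP-style pull
        reporter.receive("ff-3", "http://host:8080/", ReceiveMode.PASSIVE);    // e.g. a ListenHTTP-style push
        reporter.events.forEach(System.out::println);
    }
}
```

Because the two-argument form delegates with UNSPECIFIED, processors that never pass a mode keep compiling and behaving as before, which is the opt-in property Joe and Adam point to.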

On Wed, Nov 6, 2019 at 11:32 AM Nissim Shiman <ns...@yahoo.com.invalid>
wrote:

>  Joe,
>
> Just to verify what you mean,
>
> You are saying that the line:
> flowfile = session.putAttribute(flowfile, "receiveType", "active")
>
> could be added before
> session.getProvenanceReporter().receive(...)
>
>
> to indicate a PULL.  Is this correct?
>
> Thanks,
>
> Nissim

Re: PULL ProvenanceEvent

Posted by Nissim Shiman <ns...@yahoo.com.INVALID>.
 Joe,

Just to verify what you mean:

You are saying that the line

    flowfile = session.putAttribute(flowfile, "receiveType", "active");

could be added before

    session.getProvenanceReporter().receive(...);

to indicate a PULL. Is this correct?

Thanks,

Nissim
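
To illustrate the trade-off behind the attribute approach, here is a small sketch assuming a toy immutable SketchFlowFile in place of NiFi's FlowFile; withAttribute is a hypothetical helper standing in for session.putAttribute. As with a real flowfile, each update returns a new object that inherits all existing attributes, so a receiveType attribute set at ingest is still present after later, unrelated processing steps. That persistence is the concern Nissim raises in his November 4 message about attributes being stateful.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a FlowFile: immutable, and every update returns a new copy
// that inherits all existing attributes.
final class SketchFlowFile {
    private final Map<String, String> attributes;

    SketchFlowFile(Map<String, String> attributes) {
        // Defensive copy so later changes to the caller's map cannot leak in.
        this.attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
    }

    // Hypothetical helper standing in for session.putAttribute(flowFile, key, value).
    SketchFlowFile withAttribute(String key, String value) {
        Map<String, String> copy = new HashMap<>(attributes);
        copy.put(key, value);
        return new SketchFlowFile(copy);
    }

    // Returns null when the attribute is absent.
    String getAttribute(String key) {
        return attributes.get(key);
    }
}
```

Unless some processor explicitly removes it, every downstream version carries the attribute, which is the motivation for recording the mode on the provenance event itself instead.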






    On Monday, November 4, 2019, 12:50:11 PM EST, Nissim Shiman <ns...@yahoo.com.invalid> wrote:  
 
  Adding an attribute indicating passive/active/query for RECEIVE and FETCH would work,
but NiFi attributes are stateful (i.e. they will still be on the flowfile as metadata a couple of processor steps down the flow).

Maybe an option is to expand the API for RECEIVE and FETCH with a new parameter for passive/active/query?
(i.e. the existing method signatures, such as [1], would remain the same, but new ones would be added to handle this new parameter.)

[1] https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceReporter.java#L46



Re: PULL ProvenanceEvent

Posted by Nissim Shiman <ns...@yahoo.com.INVALID>.
 Adding an attribute indicating passive/active/query for RECEIVE and FETCH would work,
but NiFi attributes are stateful (i.e. they will still be on the flowfile as metadata a couple of processor steps down the flow).

Maybe an option is to expand the API for RECEIVE and FETCH with a new parameter for passive/active/query?
(i.e. the existing method signatures, such as [1], would remain the same, but new ones would be added to handle this new parameter.)

[1] https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceReporter.java#L46
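
The reconciliation use case from the October 28 and 29 messages (matching SENDs on one NiFi with RECEIVEs on another, where a GetSFTP pull has no SEND counterpart) is where an event-level passive/active/query value would pay off. A minimal sketch, assuming a hypothetical SketchEvent model and a hypothetical correlation key shared by both sides; correlating real provenance events across two instances would need some other mechanism, such as the transit URI or an attribute, which this sketch does not attempt. Only passive receives are expected to have a matching SEND, so active pulls and queries are not flagged as missing one.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical event model for illustration only; not NiFi's ProvenanceEventRecord.
enum EventKind { SEND, RECEIVE }

enum ReceiveStyle { PASSIVE, ACTIVE, QUERY, NOT_APPLICABLE }

final class SketchEvent {
    final EventKind kind;
    final ReceiveStyle style; // NOT_APPLICABLE for SEND events
    final String key;         // hypothetical correlation key shared by both NiFi instances

    SketchEvent(EventKind kind, ReceiveStyle style, String key) {
        this.kind = kind;
        this.style = style;
        this.key = key;
    }
}

final class Reconciler {
    // Returns keys of receiver-side RECEIVE events that should have a SEND but do not.
    static List<String> unmatchedReceives(List<SketchEvent> sender, List<SketchEvent> receiver) {
        Set<String> sent = new HashSet<>();
        for (SketchEvent e : sender) {
            if (e.kind == EventKind.SEND) {
                sent.add(e.key);
            }
        }
        List<String> missing = new ArrayList<>();
        for (SketchEvent e : receiver) {
            // Only a passive receive implies the other side pushed the data (and emitted SEND);
            // ACTIVE pulls and QUERY results are not expected to have a counterpart.
            if (e.kind == EventKind.RECEIVE && e.style == ReceiveStyle.PASSIVE && !sent.contains(e.key)) {
                missing.add(e.key);
            }
        }
        return missing;
    }
}
```

With a SEND for key a on the first NiFi, and passive receives a and c plus an active receive b on the second, only c is reported: b was pulled and never had a SEND.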


    On Thursday, October 31, 2019, 10:10:40 PM EDT, Joe Witt <jo...@gmail.com> wrote:  
 
 These distinctions may be meaningful. Adding them as an attribute lets the
meaning come through without introducing complexity for the majority case,
in which the distinction isn't key.

thanks

On Thu, Oct 31, 2019 at 4:05 PM Nissim Shiman <ns...@yahoo.com.invalid>
wrote:

>  Mike,
> I like the QUERY type as well.  Basically a more refined PULL.  Very nice.
>
>
> Part of the challenge of adding PULL as a type is that there are currently
> two flavors of RECEIVE: RECEIVE and FETCH [1].
>
> So any addition of PULL would need a second flavor of PULL to match the
> case where a flowfile's contents are also being overwritten (i.e. as
> FETCH currently does).
>
>
> [1]
> https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java#L42
>
>
> Thanks,
> Nissim
>
>
>    On Wednesday, October 30, 2019, 6:41:04 PM EDT, Mike Thomsen <
> mikerthomsen@gmail.com> wrote:
>
>  I like the idea of creating PULL as a type. In fact, I'd propose that there
> are three scenarios here:
>
> RECEIVE - Passively acquire data in a hand-off situation. Ex: Kafka subscription
> PULL - Directly seek out and fetch something in a targeted fashion. Ex: GetHTTP
> QUERY - Go looking for the data and take whatever matches your search. Ex:
>   JsonQueryElasticsearch, GetMongo, any SQL query processor, etc.
>
>
>
> On Wed, Oct 30, 2019 at 1:31 PM Nissim Shiman <ns...@yahoo.com.invalid>
> wrote:
>
> >  Joe,
> >
> >
> > It is hard to say how much value the transit URI would bring to clarify a
> > RECEIVE. For example, a RECEIVE with a transit URI of https:<etc.> could
> > come from either GetHTTP (i.e. active) or ListenHTTP (i.e. passive).
> >
> > But your idea of "a metadata item specifying active vs passive" is a very
> > clever way to make this work with minimal disruption.
> >
> > My understanding is that the current receive() calls in
> > ProvenanceReporter [1] will remain the same, but new ones will be added
> > with a boolean parameter reflecting whether the receive is active or
> > passive. This will allow the current list of Provenance Events [2] to
> > remain the same, so third-party/custom processors can continue working
> > as is.
> >
> > Does this sound like what you are thinking?
> >
> >
> > [1] https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceReporter.java#L46
> >
> > [2] apache/nifi
> >
> >
> > Thanks,
> >
> > Nissim
> >    On Tuesday, October 29, 2019, 12:47:40 PM EDT, Joe Witt <
> > joe.witt@gmail.com> wrote:
> >
> >  Nissim
> >
> > I like the idea of introducing a more refined type of event for how data is
> > brought into NiFi (active - PULL, passive - RECEIVE).
> >
> > That said, it might be sufficient to simply have this distinction be on
> > the "RECEIVE" event as a metadata item specifying active vs passive. The
> > protocol utilized, as indicated by the transit URI, should clarify this
> > though.
> >
> > In short, I think there is a way here that is all opt-in for existing
> > users and components.
> >
> > Thanks
> >
> > On Tue, Oct 29, 2019 at 9:41 AM Nissim Shiman <nshiman@yahoo.com.invalid>
> > wrote:
> >
> > >  Adam,
> > > Good points...
> > > I missed a step in explaining the use case where the Provenance Events
> > > are incomplete: the second NiFi does a GetSFTP from the *filesystem*
> > > that the first NiFi is located on.
> > > So the second NiFi currently emits a RECEIVE event, but there is no
> > > corresponding SEND event from the first NiFi (nor should there be).
> > > If the second NiFi emitted a PULL event, it would be easier for a system
> > > overseer to know that there should be no corresponding SEND event.
> > >
> > > Currently SEND/RECEIVE works well when NiFi 1 does a PostHTTP and NiFi 2
> > > does a ListenHTTP, but not in the case above.
> > >
> > > The ERROR case you mention is a nice point as well, although not my
> > > specific issue at the moment.
> > > Thanks,
> > > Nissim
> > >
> > >
> > >
> > >
> > >
> > >    On Monday, October 28, 2019, 11:52:57 PM EDT, Adam Taft <
> > > adam@adamtaft.com> wrote:
> > >
> > > > But a flowfile that was PULLed by the second NiFi (from the first NiFi)
> > > > will not necessarily have any provenance event generated by the first
> > > > NiFi.
> > >
> > > Isn't this the fault of the first NiFi for failing to emit a SEND event in
> > > response to the second NiFi's request? In this scenario, shouldn't the
> > > send/receive pair be:
> > > NiFi-1 [SEND] :: NiFi-2 [RECEIVE]?
> > >
> > > What you describe is an odd use case for NiFi. NiFi is usually not in the
> > > business of acting as a file server daemon in order to "send" flowfiles to
> > > other systems. As you mention, HandleHttpResponse may be a lone-wolf
> > > example of a processor that generates a SEND event whose input originates
> > > from a "listener" [1]. The other ListenXYZ processors generally issue
> > > RECEIVE events because they are receiving bytes, not generating them.
> > >
> > > Are there other processors in question? Something custom? Or is this
> > > related to site-to-site transfers?
> > >
> > > I still somewhat question the motive for a provenance event pair that is
> > > trying to establish "who called who first". I'm honestly just trying to
> > > understand the use case where a matching SEND/RECEIVE pair doesn't give
> > > you what you need.
> > >
> > > The only thing I could see would be a processor that asks for data, but
> > > then doesn't receive it due to some error condition. In this case, adding
> > > some sort of ERROR event might be useful: "I attempted to retrieve data
> > > from ${uri}, but the transfer failed because of ${error condition}." That
> > > way, GetXYZ processors could report an error to provenance instead of as
> > > a bulletin.
> > >
> > > If the problem is a processor or the framework itself not generating an
> > > event, can we just fix that function to emit SEND in the scenario that
> > > you describe? Changing the provenance model itself (beyond possibly
> > > adding an ERROR event) feels like it should be the last option to
> > > consider.
> > >
> > > Thanks,
> > > Adam
> > >
> > > [1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpResponse.java#L191
> > >
> > >
> > >
> > >
> > > On Mon, Oct 28, 2019 at 4:47 PM Nissim Shiman
> <nshiman@yahoo.com.invalid
> > >
> > > wrote:
> > >
> > > >  Adam,
> > > > I believe there is a need for more detailed ProvenanceEvents.
> > > > A use case would be a customer that is trying to track data passed between
> > > > two nifi's and trying to match up SENDs and RECEIVEs
> > > >
> > > > So a flowfile that has a SEND event on the first nifi should have a
> > > > RECEIVE event on the second nifi.
> > > > But a flowfile that was PULLed by the second nifi (from the first nifi)
> > > > will not necessarily have any provenance event generated by the first nifi.
> > > >
> > > > (I realize that FETCH is already a "reserved word" in the current
> > > > ProvenanceEvents setup, so I was hoping PULL could be used instead.)
> > > > There is another Provenance Event, ACKNOWLEDGE, which would also fit
> > > > occasionally to this model as well (an example would be HandleHttpResponse
> > > > processor which could send this instead of SEND when responding to a HTTP
> > > > request)
> > > > This being said, you make an excellent point when you said
> > > > "However even more important to realize,
> > > > this change would affect many other downstream consumers of provenance data
> > > > which aren't necessarily in the stock NiFi distribution."
> > > > Thanks,
> > > > Nissim
> > > >
> > > >    On Friday, October 11, 2019, 11:30:19 AM EDT, Nissim Shiman
> > > > <ns...@yahoo.com.invalid> wrote:
> > > >
> > > >  Adam,
> > > > "Yes" to your first question and the four processor examples you listed.
> > > >
> > > > I will need to get back to you regarding your other points.
> > > >
> > > > Thanks,
> > > > Nissim
> > > >
> > > >    On Thursday, October 10, 2019, 7:05:57 PM EDT, Adam Taft <
> > > > adam@adamtaft.com> wrote:
> > > >
> > > >  Nissim,
> > > >
> > > > Just to be clear, you are trying to distinguish between processors which
> > > > are actively "pulling" data (GetXYZ) vs. processors which just "listen" for
> > > > data (ListenXYZ)?  Is that your basic vision?
> > > >
> > > > GetFile => PULL
> > > > GetHTTP => PULL
> > > > ListenHTTP => RECEIVE
> > > > ListenTCP => RECEIVE
> > > >
> > > > Could you clarify what advantages this would have in terms of data
> > > > provenance?  What would you use this new event type for specifically?  What
> > > > are you missing now? Do you have a use case that needs this, or are you
> > > > just generally trying to round out the provenance event types for sake of
> > > > completeness?  I honestly don't know a use case where you care whether you
> > > > polled for the data or listened for it.  The provenance model today just
> > > > cares that you received the data, not so much how you received it.
> > > >
> > > > You're right that this proposal will affect many processors and the
> > > > internal visualization tools, etc.  However even more important to realize,
> > > > this change would affect many other downstream consumers of provenance data
> > > > which aren't necessarily in the stock NiFi distribution.  For example, any
> > > > third-party/custom ReportingTask that handles provenance data would need to
> > > > be updated with this change.  There's probably need for a strong vision to
> > > > help demonstrate the value for this vs. the cost of the cascading effects
> > > > related to this change.
> > > >
> > > > Thanks,
> > > > Adam
> > > >
> > > > On Thu, Oct 10, 2019 at 4:02 PM Nissim Shiman <nshiman@yahoo.com.invalid>
> > > > wrote:
> > > >
> > > > > Hello Team,
> > > > >
> > > > > The ProvenanceEventType class does a good job capturing possible events,
> > > > > but the PULL event doesn't seem to fall nicely into any of the existing
> > > > > types.
> > > > > https://gitbox.apache.org/repos/asf?p=nifi.git;a=blob;f=nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java
> > > > > RECEIVE is the closest, but RECEIVE is passive and doesn't capture the
> > > > > active action of a PULL
> > > > >
> > > > > And... maybe it would fall into FETCH, but FETCH is more focused on
> > > > > contents of an existing flow file being overwritten.
> > > > >
> > > > > What does the community think about a new PULL event type,
> > > > > or
> > > > >  using FETCH for PULL, and having what FETCH does now be a new event such
> > > > > as REUSE
> > > > >
> > > > > NOTE: a new PULL event would have a cascading effect of many processors
> > > > > that currently are emitting RECEIVE's being modified to be PULL
> > > > > (i.e. So GetFile would no longer be a RECEIVE, but rather a PULL), but
> > > > > would more accurately capture the event.
> > > > >
> > > > > Thanks,
> > > > > Nissim Shiman
> > > >
> > >
> >
>
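The exchange quoted above turns on reconciling SEND events on one NiFi instance with RECEIVE events on another. Below is a minimal sketch of that check. It assumes a hypothetical simplified event class (`SketchEvent`) and a shared correlation key (for example a filename) that both sides can see; neither is part of NiFi's provenance API. It also assumes the proposed PULL type exists, so actively pulled data (such as the GetSFTP scenario raised later in this thread) is excluded from the "expected SEND" check rather than reported as a mismatch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified event types for illustration; PULL is the type proposed in this thread.
enum SketchEventType { SEND, RECEIVE, PULL }

// Hypothetical stand-in for a provenance event; correlationKey is an assumed shared identifier.
final class SketchEvent {
    final SketchEventType type;
    final String correlationKey;

    SketchEvent(SketchEventType type, String correlationKey) {
        this.type = type;
        this.correlationKey = correlationKey;
    }
}

final class SendReceiveReconciler {
    // Returns keys of RECEIVE events on the receiving instance that have no matching SEND on the
    // sending instance. PULL events are skipped: the remote side is not expected to emit a SEND.
    static List<String> unmatchedReceives(List<SketchEvent> senderEvents, List<SketchEvent> receiverEvents) {
        Set<String> sentKeys = new HashSet<>();
        for (SketchEvent e : senderEvents) {
            if (e.type == SketchEventType.SEND) {
                sentKeys.add(e.correlationKey);
            }
        }
        List<String> unmatched = new ArrayList<>();
        for (SketchEvent e : receiverEvents) {
            if (e.type == SketchEventType.RECEIVE && !sentKeys.contains(e.correlationKey)) {
                unmatched.add(e.correlationKey);
            }
        }
        return unmatched;
    }
}
```

With only today's event types, a pulled file would be recorded as RECEIVE and surface here as a false mismatch; some marker of "actively pulled" (a PULL type, or an active flag on RECEIVE) is what lets the check skip it.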

Re: PULL ProvenanceEvent

Posted by Joe Witt <jo...@gmail.com>.
These distinctions may be meaningful.  Adding them as an attribute lets the
meaning come through without introducing complexity for the majority case,
where the distinction isn't key.

thanks
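Joe's suggestion keeps RECEIVE as the only inbound event type and records active vs passive as extra metadata. A later message in this thread narrows that to an optional ACTIVE|PASSIVE argument on receive(...) that defaults to UNSPECIFIED and is stored on the provenance event itself rather than on a flowfile attribute. Below is a minimal sketch of that opt-in shape. `ReceiveMode`, `ReceiveEvent` and `SketchReporter` are hypothetical names, and flowfiles are reduced to string IDs; this is not NiFi's `ProvenanceReporter` interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical: how an inbound transfer was initiated. Not part of NiFi's API.
enum ReceiveMode { ACTIVE, PASSIVE, UNSPECIFIED }

// Simplified stand-in for a RECEIVE provenance event; the event type itself stays RECEIVE.
final class ReceiveEvent {
    final String flowFileId;
    final String transitUri;
    final ReceiveMode mode;

    ReceiveEvent(String flowFileId, String transitUri, ReceiveMode mode) {
        this.flowFileId = flowFileId;
        this.transitUri = transitUri;
        this.mode = Objects.requireNonNull(mode);
    }
}

// Hypothetical reporter: the existing-style call keeps working and is tagged UNSPECIFIED,
// while components that care can opt in by passing ACTIVE or PASSIVE.
final class SketchReporter {
    private final List<ReceiveEvent> events = new ArrayList<>();

    void receive(String flowFileId, String transitUri) {
        receive(flowFileId, transitUri, ReceiveMode.UNSPECIFIED);
    }

    void receive(String flowFileId, String transitUri, ReceiveMode mode) {
        events.add(new ReceiveEvent(flowFileId, transitUri, mode));
    }

    // Read-only snapshot of the recorded events.
    List<ReceiveEvent> events() {
        return List.copyOf(events);
    }
}
```

A three-valued enum, rather than the boolean floated elsewhere in the thread, lets existing callers record UNSPECIFIED instead of being silently classified as active or passive.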

On Thu, Oct 31, 2019 at 4:05 PM Nissim Shiman <ns...@yahoo.com.invalid>
wrote:

>  Mike,
> I like the QUERY type as well.  Basically a more refined PULL.  Very nice.
>
>
> Part of the challenge of adding PULL as a type is that there are currently
> two flavors of RECEIVEs.
> RECEIVE and FETCH [1]
>
> So any addition of a PULL would need a second flavor of PULL to match the
> case where a flowfile's contents are being overwritten as well (i.e. as
> FETCH is currently doing)
>
>
> [1]
> https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java#L42
>
>
> Thanks,
> Nissim

Re: PULL ProvenanceEvent

Posted by Nissim Shiman <ns...@yahoo.com.INVALID>.
 Mike,
I like the QUERY type as well.  Basically a more refined PULL.  Very nice.


Part of the challenge of adding PULL as a type is that there are currently two flavors of RECEIVEs.  
RECEIVE and FETCH [1]

So any addition of a PULL would need a second flavor of PULL to match the case where a flowfile's contents are being overwritten as well (i.e. as FETCH is currently doing)


[1] https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java#L42
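The point above can be laid out as a small grid. One axis is whether the transfer is a PULL. The other is whether it creates a new flowfile or overwrites the content of an existing one (the FETCH case). The sketch below assumes that framing, with today's RECEIVE and FETCH filling the non-PULL column; `ContentTarget` and `InboundEventNaming` are illustrative names only. The empty cell is the "second flavor of PULL" that would still need a name.

```java
import java.util.Optional;

// Hypothetical: whether inbound content creates a new flowfile or replaces an existing one's content.
enum ContentTarget { NEW_FLOWFILE, EXISTING_FLOWFILE }

final class InboundEventNaming {
    // Maps (pull?, target) to an event type name under the PULL proposal.
    // Returns empty for the combination that has no proposed name yet.
    static Optional<String> eventType(boolean pull, ContentTarget target) {
        if (!pull) {
            return Optional.of(target == ContentTarget.NEW_FLOWFILE ? "RECEIVE" : "FETCH");
        }
        return target == ContentTarget.NEW_FLOWFILE ? Optional.of("PULL") : Optional.empty();
    }
}
```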


Thanks,
Nissim


    On Wednesday, October 30, 2019, 6:41:04 PM EDT, Mike Thomsen <mi...@gmail.com> wrote:  
 
 I like the idea of creating PULL as a type. In fact, I'd propose that there
are three scenarios here:

RECEIVE - Passively acquire in a sort of hand-off situation. Ex: Kafka
subscription
PULL - Direct operations to seek out and fetch something in a targeted
fashion. Ex. GetHttp
QUERY - Go looking for the data and take what matches your search. Ex.
JsonQueryElasticsearch, GetMongo, any SQL query processor, etc.

Re: PULL ProvenanceEvent

Posted by Mike Thomsen <mi...@gmail.com>.
I like the idea of creating PULL as a type. In fact, I'd propose that there
are three scenarios here:

RECEIVE - Passively acquire in a sort of hand-off situation. Ex: Kafka
subscription
PULL - Direct operations to seek out and fetch something in a targeted
fashion. Ex. GetHttp
QUERY - Go looking for the data and take what matches your search. Ex.
JsonQueryElasticsearch, GetMongo, any SQL query processor, etc.
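The three categories above can be captured as an enum with an "active" flag, which also reflects the reading of QUERY as "a more refined PULL" given elsewhere in this thread. This is a hypothetical sketch, not a NiFi type. The processor-to-category table uses the examples from the message; `ConsumeKafka` and `ExecuteSQL` are assumed stand-ins for "Kafka subscription" and "any SQL query processor".

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical classification of how data is acquired; not part of NiFi's API.
enum AcquisitionStyle {
    RECEIVE(false), // passive hand-off, e.g. a Kafka subscription
    PULL(true),     // targeted fetch of a specific item, e.g. GetHTTP
    QUERY(true);    // search and take whatever matches, e.g. GetMongo

    private final boolean active;

    AcquisitionStyle(boolean active) {
        this.active = active;
    }

    boolean isActive() {
        return active;
    }
}

final class AcquisitionExamples {
    // Example mapping taken from the message; two entries are assumed stand-ins (see comments).
    private static final Map<String, AcquisitionStyle> EXAMPLES = Map.of(
            "ConsumeKafka", AcquisitionStyle.RECEIVE, // assumption: stands in for "Kafka subscription"
            "GetHTTP", AcquisitionStyle.PULL,
            "JsonQueryElasticsearch", AcquisitionStyle.QUERY,
            "GetMongo", AcquisitionStyle.QUERY,
            "ExecuteSQL", AcquisitionStyle.QUERY);    // assumption: one "SQL query processor"

    // Returns the category for a known example processor, or empty if it is not in the table.
    static Optional<AcquisitionStyle> classify(String processorType) {
        return Optional.ofNullable(EXAMPLES.get(processorType));
    }
}
```

Because PULL and QUERY share the active flag, a consumer that only cares about "who initiated the transfer" could collapse them, while one that cares about targeted vs search-based retrieval keeps them apart.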



On Wed, Oct 30, 2019 at 1:31 PM Nissim Shiman <ns...@yahoo.com.invalid>
wrote:

>  Joe,
>
>
> It is hard to say how much value transit URI would bring to clarify a
> RECEIVE.
> For example a RECEIVE with transit URI of https:<etc.> could be either a
> GetHTTP (i.e. active) or ListenHTTP (i.e. passive)
>
> but your idea of "a metadata item specifying active vs passive" is a very
> clever way to make this work with minimal disruptions.
>
> My understanding of this is that the current receive() calls in
> ProvenanceReporter [1] will remain the same, but new ones will be added
> with a boolean parameter reflecting if the receive is active or passive.
> This will allow the current list of Provenance Events [2] to remain the
> same.  So third party/custom processors can continue working as is
>
> Does this sound like what you are thinking?
>
>
> [1]
> https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceReporter.java#L46
>
> [2]
> https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java
>
>
> Thanks,
>
> Nissim
>     On Tuesday, October 29, 2019, 12:47:40 PM EDT, Joe Witt <
> joe.witt@gmail.com> wrote:
>
>  Nissim
>
> I like the idea to introduce a more refined type of event for how data is
> brought into nifi (active - PULL, passive - RECEIVE).
>
> That said it might be sufficient to simply have this distinction be on the
> "RECEIVE" event as a metadata item specifying active vs passive.  The
> protocol utilized as mentioned in the transit URI should clarify this
> though.
>
> In short - I think there is a way here that is all opt-in for existing
> users and components.
>
> Thanks
>
> On Tue, Oct 29, 2019 at 9:41 AM Nissim Shiman <ns...@yahoo.com.invalid>
> wrote:
>
> >  Adam,
> > good points...
> > I missed a step in explaining the use case where Provenance Events are
> > incomplete...
> > Where the second nifi does a GetSFTP from the *filesystem* that the first
> > nifi is located on
> > So the second nifi currently sends a RECEIVE event, but there is no
> > corresponding SEND event from the first nifi (nor should there be)
> > If the second nifi sent a PULL event, it would be easier for a system
> > overseer to know that there should be no corresponding SEND event
> >
> > Currently send/receive works well when nifi 1 does a PostHTTP and nifi 2
> > does a ListenHTTP, but not in the case above.
> >
> > The ERROR case you mention is a nice point as well, although not my
> > specific issue at the moment.
> > Thanks,
> > Nissim
> >
> >
> >
> >
> >
> >    On Monday, October 28, 2019, 11:52:57 PM EDT, Adam Taft <
> > adam@adamtaft.com> wrote:
> >
> >  > But a flowfile that was PULLed by the second nifi (from the first nifi)
> > will not necessarily have any provenance event generated by the first nifi.
> >
> > Isn't this the fault of the first NiFi to fail to emit a SEND event in
> > response to the second NiFi's request?  In this scenario, shouldn't the
> > send/receive pair be:
> > NiFi-1 [SEND] :: NIFI-2 [RECEIVE]?
> >
> > What you describe is an odd use case for NiFi.  NiFi is usually not in the
> > business of acting as a file server daemon in order to "send" flowfiles to
> > other systems.  As you mention, HandleHttpResponse may be a lone wolf
> > example processor which generates a SEND event whose input originates from
> > a "listener". [1]  The other ListenXYZ processors generally issue RECEIVE
> > events because they are receiving bytes, not generating them.
> >
> > Are there other processors in question? Something custom? Or is this
> > related to site-to-site transfers?
> >
> > I still kind of question the motive of a provenance event pair that is
> > trying to establish "who called who first".  Honestly just trying to
> > understand the use case where a matching SEND/RECEIVE pair doesn't give you
> > what you need.
> >
> > The only thing I could see would be a processor that asks for data, but
> > then doesn't receive it due to some error condition.  In this case, adding
> > some sort of ERROR event might be useful.  "I attempted to retrieve data
> > from ${uri}, but the transfer failed because of ${error condition}".  That
> > way, GetXYZ processors could report an error to provenance instead of as a
> > bulletin.
> >
> > If the problem is related to a processor or the framework itself not
> > generating an event, can we just fix that function to emit SEND in the
> > scenario that you describe?  Changing the provenance model itself (beyond
> > possibly adding an ERROR event) feels like it would be the last scenario to
> > consider.
> >
> > Thanks,
> > Adam
> >
> > [1]
> > https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpResponse.java#L191
> >
> > On Mon, Oct 28, 2019 at 4:47 PM Nissim Shiman <nshiman@yahoo.com.invalid>
> > wrote:
> >
> > >  Adam,
> > > I believe there is a need for more detailed ProvenanceEvents.
> > > A use case would be a customer that is trying to track data passed
> > between
> > > two nifi's and trying to match up SENDs and RECEIVEs
> > >
> > > So a flowfile that has a SEND event on the first nifi should have a
> > > RECEIVE event on the second nifi.
> > > But a flowfile that was PULLed by the second nifi (from the first nifi)
> > > will not necessarily have any provenance event generated by the first
> > nifi.
> > >
> > > (I realize that FETCH is already a "reserved word" in the current
> > > ProvenanceEvents setup, so I was hoping PULL could be used instead.)
> > > There is another Provenance Event, ACKNOWLEDGE, which would also fit
> > > occasionally to this model as well (an example would be
> > HandleHttpResponse
> > > processor which could send this instead of SEND when responding to a
> HTTP
> > > request)
> > > This being said, you make an excellent point when you said
> > > "However even more important to realize,
> > > this change would affect many other downstream consumers of provenance
> > data
> > > which aren't necessarily in the stock NiFi distribution."
> > > Thanks,
> > > Nissim
> > >
> > >    On Friday, October 11, 2019, 11:30:19 AM EDT, Nissim Shiman
> > > <ns...@yahoo.com.invalid> wrote:
> > >
> > >  Adam,
> > > "Yes" to your first question and the four processor examples you
> listed.
> > >
> > > I will need to get back to you regarding your other points.
> > >
> > > Thanks,
> > > Nissim
> > >
> > >    On Thursday, October 10, 2019, 7:05:57 PM EDT, Adam Taft <
> > > adam@adamtaft.com> wrote:
> > >
> > >  Nissim,
> > >
> > > Just to be clear, you are trying to distinguish between processors
> which
> > > are actively "pulling" data (GetXYZ) vs. processors which just "listen"
> > for
> > > data (ListenXYZ)?  Is that your basic vision?
> > >
> > > GetFile => PULL
> > > GetHTTP => PULL
> > > ListenHTTP => RECEIVE
> > > ListenTCP => RECEIVE
> > >
> > > Could you clarify what advantages this would have in terms of data
> > > provenance?  What would you use this new event type for specifically?
> > What
> > > are you missing now? Do you have a use case that needs this, or are you
> > > just generally trying to round out the provenance event types for sake
> of
> > > completeness?  I honestly don't know a use case where you care whether
> > you
> > > polled for the data or listened for it.  The provenance model today
> just
> > > cares that you received the data, not so much how you received it.
> > >
> > > You're right that this proposal will affect many processors and the
> > > internal visualization tools, etc.  However even more important to
> > realize,
> > > this change would affect many other downstream consumers of provenance
> > data
> > > which aren't necessarily in the stock NiFi distribution.  For example,
> > any
> > > third-party/custom ReportingTask that handles provenance data would
> need
> > to
> > > be updated with this change.  There's probably need for a strong vision
> > to
> > > help demonstrate the value for this vs. the cost of the cascading
> effects
> > > related to this change.
> > >
> > > Thanks,
> > > Adam
> > >
> > >
> > > On Thu, Oct 10, 2019 at 4:02 PM Nissim Shiman
> <nshiman@yahoo.com.invalid
> > >
> > > wrote:
> > >
> > > > Hello Team,
> > > >
> > > > The ProvenanceEventType class does a good job capturing possible
> > events,
> > > > but the PULL event doesn't seem to fall nicely into any of the
> existing
> > > > types.
> > > >
> > > >
> > >
> >
> https://gitbox.apache.org/repos/asf?p=nifi.git;a=blob;f=nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java
> > > > RECEIVE is the closest, but RECEIVE is passive and doesn't capture
> the
> > > > active action of a PULL
> > > >
> > > > And... maybe it would fall into FETCH, but FETCH is more focused on
> > > > contents of an existing flow file being overwritten.
> > > >
> > > > What does the community think about a new PULL event type,
> > > > or
> > > >  using FETCH for PULL, and having what FETCH does now be a new event
> > such
> > > > as REUSE
> > > >
> > > > NOTE: a new PULL event would have a cascading effect of many
> processors
> > > > that currently are emitting RECEIVE's being modified to be PULL
> > > > (i.e. So GetFile would no longer be a RECEIVE, but rather a PULL),
> but
> > > > would more accurately capture the event.
> > > >
> > > > Thanks,
> > > > Nissim Shiman
> > > >
> > > >
> > >
> >
>

Re: PULL ProvenanceEvent

Posted by Nissim Shiman <ns...@yahoo.com.INVALID>.
 Joe, 


It is hard to say how much value the transit URI would add in clarifying a RECEIVE.
For example, a RECEIVE with a transit URI of https:<etc.> could come from either GetHTTP (i.e. active) or ListenHTTP (i.e. passive).

But your idea of "a metadata item specifying active vs passive" is a very clever way to make this work with minimal disruption.

My understanding is that the current receive() methods in ProvenanceReporter [1] would remain the same, and new overloads would be added with a boolean parameter indicating whether the receive is active or passive.
This would allow the current list of Provenance Event types [2] to remain the same, so third-party/custom processors can continue working as is.
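A minimal sketch of that understanding follows. It uses simplified stand-in types rather than the real ProvenanceReporter (SketchReporter, ReceiveMode and RecordingReporter are hypothetical names). It uses an enum instead of a boolean so that calls through the existing two-argument method can be recorded as UNSPECIFIED rather than being forced into active or passive. The existing-style method keeps its signature and delegates, so current callers keep working as is.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; NOT the real org.apache.nifi.provenance.ProvenanceReporter API.
enum ReceiveMode { ACTIVE, PASSIVE, UNSPECIFIED }

interface SketchReporter {
    // New overload carrying the active/passive distinction.
    void receive(String flowFileId, String transitUri, ReceiveMode mode);

    // Existing-style call: unchanged signature, so current callers keep compiling.
    // It records the mode as UNSPECIFIED.
    default void receive(String flowFileId, String transitUri) {
        receive(flowFileId, transitUri, ReceiveMode.UNSPECIFIED);
    }
}

// Records every RECEIVE so the chosen mode can be inspected.
class RecordingReporter implements SketchReporter {
    final List<String> events = new ArrayList<>();

    @Override
    public void receive(String flowFileId, String transitUri, ReceiveMode mode) {
        events.add("RECEIVE " + flowFileId + " " + transitUri + " " + mode);
    }
}

class OverloadDemo {
    public static void main(String[] args) {
        RecordingReporter reporter = new RecordingReporter();
        reporter.receive("ff-1", "sftp://host/data/a.txt", ReceiveMode.ACTIVE); // a GetSFTP-style pull
        reporter.receive("ff-2", "http://nifi-2:8080/contentListener");        // legacy call
        reporter.events.forEach(System.out::println);
        // RECEIVE ff-1 sftp://host/data/a.txt ACTIVE
        // RECEIVE ff-2 http://nifi-2:8080/contentListener UNSPECIFIED
    }
}
```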

Does this sound like what you are thinking?


[1] https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceReporter.java#L46

[2] https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java


Thanks,

Nissim
    On Tuesday, October 29, 2019, 12:47:40 PM EDT, Joe Witt <jo...@gmail.com> wrote:  
 
 Nissim

I like the idea of introducing a more refined type of event for how data is
brought into NiFi (active - PULL, passive - RECEIVE).

That said, it might be sufficient to simply have this distinction be on the
"RECEIVE" event as a metadata item specifying active vs. passive.  The
protocol used, as indicated in the transit URI, should already clarify this,
though.

In short - I think there is a way to do this that is entirely opt-in for
existing users and components.

Thanks

  

Re: PULL ProvenanceEvent

Posted by Joe Witt <jo...@gmail.com>.
Nissim

I like the idea of introducing a more refined type of event for how data is
brought into NiFi (active - PULL, passive - RECEIVE).

That said, it might be sufficient to simply have this distinction be on the
"RECEIVE" event as a metadata item specifying active vs. passive.  The
protocol used, as indicated in the transit URI, should already clarify this,
though.
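The following sketch illustrates the opt-in property. It assumes a simplified event class (not NiFi's ProvenanceEventRecord) and a hypothetical metadata key "receive.mode". The event type stays RECEIVE, so a consumer that only filters on type is unaffected. A consumer that knows the key can read the mode, falling back to UNSPECIFIED when the key is absent. The two https events show that the URI scheme alone is the same for a pull and a listen.

```java
import java.net.URI;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a provenance event; NOT NiFi's ProvenanceEventRecord.
// The type stays "RECEIVE"; active/passive is an optional metadata entry under
// the assumed key "receive.mode".
class SketchEvent {
    final String type;
    final String transitUri;
    final Map<String, String> metadata;

    SketchEvent(String type, String transitUri, Map<String, String> metadata) {
        this.type = type;
        this.transitUri = transitUri;
        this.metadata = metadata;
    }

    // Consumers that know the key read it; events without it report UNSPECIFIED.
    String receiveMode() {
        return metadata.getOrDefault("receive.mode", "UNSPECIFIED");
    }

    // The URI scheme alone: identical for a pull and a listen over HTTPS.
    String scheme() {
        return URI.create(transitUri).getScheme();
    }
}

class OptInDemo {
    // A legacy consumer that filters only on the event type keeps working unchanged.
    static long countReceives(List<SketchEvent> events) {
        return events.stream().filter(e -> e.type.equals("RECEIVE")).count();
    }

    public static void main(String[] args) {
        SketchEvent pulled = new SketchEvent("RECEIVE",
                "https://remote.example/export", Map.of("receive.mode", "ACTIVE"));      // GetHTTP-style
        SketchEvent listened = new SketchEvent("RECEIVE",
                "https://nifi-2.example:9443/contentListener", Map.of("receive.mode", "PASSIVE")); // ListenHTTP-style
        SketchEvent legacy = new SketchEvent("RECEIVE", "sftp://host/data.csv", Map.of());

        System.out.println(pulled.scheme() + " " + pulled.receiveMode());     // https ACTIVE
        System.out.println(listened.scheme() + " " + listened.receiveMode()); // https PASSIVE
        System.out.println(legacy.receiveMode());                             // UNSPECIFIED
        System.out.println(countReceives(List.of(pulled, listened, legacy))); // 3
    }
}
```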

In short - I think there is a way to do this that is entirely opt-in for
existing users and components.

Thanks

On Tue, Oct 29, 2019 at 9:41 AM Nissim Shiman <ns...@yahoo.com.invalid>
wrote:

>  Adam,
> good points...
> I missed a step in explaining the use case where Provenance Events is
> incomplete...
> Where the second nifi does a GetSFTP from the *filesystem* that the first
> nifi is located on
> So the second nifi currently sends a RECEIVE event, but there is no
> corresponding SEND event from the first nifi (nor should there be)
> If the second nifi sent a PULL event, it would be easier for a system
> overseer to know that there should be no corresponding SEND event
>
> Currently send/receive works well when nifi 1 does a PostHTTP and nifi 2
> does a ListenHTTP, but not in the case above.
>
> The ERROR case you mention is a nice point as well, although not my
> specific issue at the moment.
> Thanks,
> Nissim

Re: PULL ProvenanceEvent

Posted by Nissim Shiman <ns...@yahoo.com.INVALID>.
 Adam,
good points...
I missed a step in explaining the use case where the Provenance Events are incomplete...
That is the case where the second NiFi does a GetSFTP from the *filesystem* that the first NiFi is located on.
The second NiFi currently emits a RECEIVE event, but there is no corresponding SEND event from the first NiFi (nor should there be).
If the second NiFi emitted a PULL event instead, it would be easier for a system overseer to know that no corresponding SEND event should be expected.

Currently SEND/RECEIVE matching works well when NiFi 1 does a PostHTTP and NiFi 2 does a ListenHTTP, but not in the case above.
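The overseer check described above can be sketched as follows. The sketch assumes that RECEIVE events carry a mode and that SEND and RECEIVE events can be paired on some shared correlation key (a filename here, purely as an assumption). The class and method names are hypothetical. Active receives are not expected to have a matching SEND; all other receives are.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical overseer check: which RECEIVE events on NiFi 2 should have a
// matching SEND on NiFi 1 but do not? Matching on a shared key is an assumption.
class Reconciler {
    enum Mode { ACTIVE, PASSIVE, UNSPECIFIED }

    static final class Receive {
        final String key;
        final Mode mode;

        Receive(String key, Mode mode) {
            this.key = key;
            this.mode = mode;
        }
    }

    // ACTIVE (pulled) receives are skipped: nothing upstream is expected to emit SEND.
    // UNSPECIFIED is treated like PASSIVE, which is how every RECEIVE looks today.
    static List<String> missingSends(List<Receive> receives, Set<String> sendKeys) {
        List<String> missing = new ArrayList<>();
        for (Receive r : receives) {
            if (r.mode != Mode.ACTIVE && !sendKeys.contains(r.key)) {
                missing.add(r.key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Set<String> sendsOnNifi1 = Set.of("posted.json");    // PostHTTP -> SEND
        List<Receive> receivesOnNifi2 = List.of(
                new Receive("posted.json", Mode.PASSIVE),      // ListenHTTP: matched
                new Receive("pulled.csv", Mode.ACTIVE),        // GetSFTP: no SEND expected
                new Receive("legacy.bin", Mode.UNSPECIFIED));  // unknown: flagged
        System.out.println(missingSends(receivesOnNifi2, sendsOnNifi1)); // [legacy.bin]
    }
}
```

With today's events, every RECEIVE is effectively UNSPECIFIED, so a GetSFTP receive such as pulled.csv would be flagged as missing its SEND. Recording the mode is what lets the check skip it.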

The ERROR case you mention is a nice point as well, although it is not my specific issue at the moment.
Thanks,
Nissim





    On Monday, October 28, 2019, 11:52:57 PM EDT, Adam Taft <ad...@adamtaft.com> wrote:  
 
 > But a flowfile that was PULLed by the second nifi (from the first nifi)
will not necessarily have any provenance event generated by the first nifi.

Isn't this the fault of the first NiFi to fail to emit a SEND event in
response to the second NiFi's request?  In this scenario, shouldn't the
send/receive pair be:
NiFi-1 [SEND] :: NiFi-2 [RECEIVE]?

What you describe is an odd use case for NiFi.  NiFi is usually not in the
business of acting as a file server daemon in order to "send" flowfiles to
other systems.  As you mention, HandleHttpResponse may be a lone wolf
example processor which generates a SEND event whose input originates from
a "listener". [1]  The other ListenXYZ processors generally issue RECEIVE
events because they are receiving bytes, not generating them.

Are there other processors in question? Something custom? Or is this
related to site-to-site transfers?

I still kind of question the motive of a provenance event pair that is
trying to establish "who called who first".  Honestly just trying to
understand the use case where a matching SEND/RECEIVE pair doesn't give you
what you need.

The only thing I could see would be a processor that asks for data, but
then doesn't receive it due to some error condition.  In this case, adding
some sort of ERROR event might be useful.  "I attempted to retrieve data
from ${uri}, but the transfer failed because of ${error condition}".  That
way, GetXYZ processors could report an error to provenance instead of as a
bulletin.

If the problem is related to a processor or the framework itself not
generating an event, can we just fix that function to emit SEND in the
scenario that you describe?  Changing the provenance model itself (beyond
possibly adding an ERROR event) feels like it would be the last scenario to
consider.

Thanks,
Adam

[1]
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpResponse.java#L191




  

Re: PULL ProvenanceEvent

Posted by Adam Taft <ad...@adamtaft.com>.
> But a flowfile that was PULLed by the second nifi (from the first nifi)
will not necessarily have any provenance event generated by the first nifi.

Isn't this the fault of the first NiFi to fail to emit a SEND event in
response to the second NiFi's request?  In this scenario, shouldn't the
send/receive pair be:
NiFi-1 [SEND] :: NiFi-2 [RECEIVE]?

What you describe is an odd use case for NiFi.  NiFi is usually not in the
business of acting as a file server daemon in order to "send" flowfiles to
other systems.  As you mention, HandleHttpResponse may be a lone wolf
example processor which generates a SEND event whose input originates from
a "listener". [1]  The other ListenXYZ processors generally issue RECEIVE
events because they are receiving bytes, not generating them.

Are there other processors in question? Something custom? Or is this
related to site-to-site transfers?

I still kind of question the motive of a provenance event pair that is
trying to establish "who called who first".  Honestly just trying to
understand the use case where a matching SEND/RECEIVE pair doesn't give you
what you need.

The only thing I could see would be a processor that asks for data, but
then doesn't receive it due to some error condition.  In this case, adding
some sort of ERROR event might be useful.  "I attempted to retrieve data
from ${uri}, but the transfer failed because of ${error condition}".  That
way, GetXYZ processors could report an error to provenance instead of as a
bulletin.
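The ERROR idea can be sketched as follows. The sketch assumes a hypothetical ERROR type, which is not one of the existing ProvenanceEventType values, and uses stand-in types in place of the real ProvenanceReporter. A failed fetch becomes a provenance entry that carries the URI and the failure reason, rather than only a bulletin.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: ERROR is not an existing provenance event type, and this
// stand-in log replaces the real ProvenanceReporter.
class ErrorEventSketch {
    interface Fetcher {
        byte[] fetch(String uri) throws IOException;
    }

    // Stand-in provenance log: each entry is "TYPE details".
    final List<String> provenance = new ArrayList<>();

    // A GetXYZ-style poll: RECEIVE on success, hypothetical ERROR on failure.
    void poll(String uri, Fetcher fetcher) {
        try {
            byte[] data = fetcher.fetch(uri);
            provenance.add("RECEIVE " + uri + " (" + data.length + " bytes)");
        } catch (IOException e) {
            // Record the failed attempt instead of (or as well as) posting a bulletin.
            provenance.add("ERROR I attempted to retrieve data from " + uri
                    + ", but the transfer failed because of " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        ErrorEventSketch processor = new ErrorEventSketch();
        processor.poll("sftp://host/ok.txt", uri -> new byte[] {1, 2, 3});
        processor.poll("sftp://host/missing.txt", uri -> { throw new IOException("file not found"); });
        processor.provenance.forEach(System.out::println);
        // RECEIVE sftp://host/ok.txt (3 bytes)
        // ERROR I attempted to retrieve data from sftp://host/missing.txt, but the transfer failed because of file not found
    }
}
```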

If the problem is that a processor or the framework itself isn't
generating an event, can we just fix that code to emit SEND in the
scenario you describe?  Changing the provenance model itself (beyond
possibly adding an ERROR event) feels like the last option to consider.

Thanks,
Adam

[1]
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpResponse.java#L191




On Mon, Oct 28, 2019 at 4:47 PM Nissim Shiman <ns...@yahoo.com.invalid>
wrote:

>  Adam,
> I believe there is a need for more detailed ProvenanceEvents.
> A use case would be a customer who is trying to track data passed between
> two NiFi instances and match up SENDs and RECEIVEs.
>
> A flowfile that has a SEND event on the first NiFi should have a
> RECEIVE event on the second NiFi.
> But a flowfile that was PULLed by the second NiFi (from the first NiFi)
> will not necessarily have any provenance event generated by the first NiFi.
>
> (I realize that FETCH is already a "reserved word" in the current
> ProvenanceEvents setup, so I was hoping PULL could be used instead.)
> Another Provenance Event, ACKNOWLEDGE, would also occasionally fit this
> model (for example, the HandleHttpResponse processor could emit it instead
> of SEND when responding to an HTTP request).
> This being said, you make an excellent point when you said
> "However even more important to realize,
> this change would affect many other downstream consumers of provenance data
> which aren't necessarily in the stock NiFi distribution."
> Thanks,
> Nissim
>
>     On Friday, October 11, 2019, 11:30:19 AM EDT, Nissim Shiman
> <ns...@yahoo.com.invalid> wrote:
>
>   Adam,
> "Yes" to your first question and the four processor examples you listed.
>
> I will need to get back to you regarding your other points.
>
> Thanks,
> Nissim
>
>     On Thursday, October 10, 2019, 7:05:57 PM EDT, Adam Taft <
> adam@adamtaft.com> wrote:
>
>  Nissim,
>
> Just to be clear, you are trying to distinguish between processors which
> are actively "pulling" data (GetXYZ) vs. processors which just "listen" for
> data (ListenXYZ)?  Is that your basic vision?
>
> GetFile => PULL
> GetHTTP => PULL
> ListenHTTP => RECEIVE
> ListenTCP => RECEIVE
>
> Could you clarify what advantages this would have in terms of data
> provenance?  What would you use this new event type for specifically?  What
> are you missing now? Do you have a use case that needs this, or are you
> just generally trying to round out the provenance event types for sake of
> completeness?  I honestly don't know a use case where you care whether you
> polled for the data or listened for it.  The provenance model today just
> cares that you received the data, not so much how you received it.
>
> You're right that this proposal will affect many processors and the
> internal visualization tools, etc.  However even more important to realize,
> this change would affect many other downstream consumers of provenance data
> which aren't necessarily in the stock NiFi distribution.  For example, any
> third-party/custom ReportingTask that handles provenance data would need to
> be updated with this change.  There probably needs to be a strong vision to
> help demonstrate the value of this vs. the cost of the cascading effects
> related to this change.
>
> Thanks,
> Adam
>
>
> On Thu, Oct 10, 2019 at 4:02 PM Nissim Shiman <ns...@yahoo.com.invalid>
> wrote:
>
> > Hello Team,
> >
> > The ProvenanceEventType class does a good job capturing possible events,
> > but the PULL event doesn't seem to fall nicely into any of the existing
> > types.
> >
> >
> https://gitbox.apache.org/repos/asf?p=nifi.git;a=blob;f=nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java
> > RECEIVE is the closest, but RECEIVE is passive and doesn't capture the
> > active action of a PULL
> >
> > And... maybe it would fall into FETCH, but FETCH is more focused on
> > contents of an existing flow file being overwritten.
> >
> > What does the community think about a new PULL event type, or about
> > using FETCH for PULL and making what FETCH does now a new event such
> > as REUSE?
> >
> > NOTE: a new PULL event would have a cascading effect: many processors
> > that currently emit RECEIVEs would need to be modified to emit PULL
> > (e.g. GetFile would no longer emit a RECEIVE, but rather a PULL), but
> > it would capture the event more accurately.
> >
> > Thanks,
> > Nissim Shiman
> >
> >
>

Re: PULL ProvenanceEvent

Posted by Nissim Shiman <ns...@yahoo.com.INVALID>.
 Adam,
I believe there is a need for more detailed ProvenanceEvents.
A use case would be a customer who is trying to track data passed between two NiFi instances and match up SENDs and RECEIVEs.

A flowfile that has a SEND event on the first NiFi should have a RECEIVE event on the second NiFi.
But a flowfile that was PULLed by the second NiFi (from the first NiFi) will not necessarily have any provenance event generated by the first NiFi.
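The matching problem described above can be sketched as follows. This is a minimal Java illustration (Java 16+), not NiFi code. Event is a hypothetical, simplified provenance record. The sketch also assumes both instances share some correlation key for the same piece of data, such as a propagated attribute; how such a key would be obtained in practice is not covered by this thread. Any RECEIVE on the downstream instance without a matching upstream SEND is exactly the gap Nissim describes.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class SendReceiveMatcher {

    // Hypothetical simplified provenance record: event type plus a correlation key
    // assumed to be shared by both NiFi instances for the same data.
    record Event(String type, String correlationKey) {}

    // Returns keys that were RECEIVEd downstream but have no SEND upstream.
    static Set<String> unmatchedReceives(List<Event> upstream, List<Event> downstream) {
        Set<String> sent = upstream.stream()
                .filter(e -> e.type().equals("SEND"))
                .map(Event::correlationKey)
                .collect(Collectors.toSet());
        return downstream.stream()
                .filter(e -> e.type().equals("RECEIVE"))
                .map(Event::correlationKey)
                .filter(key -> !sent.contains(key))
                .collect(Collectors.toCollection(TreeSet::new)); // sorted for stable output
    }

    public static void main(String[] args) {
        List<Event> firstNifi = List.of(new Event("SEND", "ff-1"));  // first NiFi pushed ff-1
        List<Event> secondNifi = List.of(
                new Event("RECEIVE", "ff-1"),   // pushed: has a matching SEND
                new Event("RECEIVE", "ff-2"));  // pulled: first NiFi emitted no SEND
        System.out.println(unmatchedReceives(firstNifi, secondNifi)); // prints [ff-2]
    }
}
```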

(I realize that FETCH is already a "reserved word" in the current ProvenanceEvents setup, so I was hoping PULL could be used instead.)
Another Provenance Event, ACKNOWLEDGE, would also occasionally fit this model (for example, the HandleHttpResponse processor could emit it instead of SEND when responding to an HTTP request).
This being said, you make an excellent point when you said
"However even more important to realize,
this change would affect many other downstream consumers of provenance data
which aren't necessarily in the stock NiFi distribution."
Thanks,
Nissim


Re: PULL ProvenanceEvent

Posted by Nissim Shiman <ns...@yahoo.com.INVALID>.
 Adam,
"Yes" to your first question and the four processor examples you listed.

I will need to get back to you regarding your other points.

Thanks,
Nissim


Re: PULL ProvenanceEvent

Posted by Adam Taft <ad...@adamtaft.com>.
Nissim,

Just to be clear, you are trying to distinguish between processors which
are actively "pulling" data (GetXYZ) vs. processors which just "listen" for
data (ListenXYZ)?  Is that your basic vision?

GetFile => PULL
GetHTTP => PULL
ListenHTTP => RECEIVE
ListenTCP => RECEIVE

Could you clarify what advantages this would have in terms of data
provenance?  What would you use this new event type for specifically?  What
are you missing now? Do you have a use case that needs this, or are you
just generally trying to round out the provenance event types for sake of
completeness?  I honestly don't know a use case where you care whether you
polled for the data or listened for it.  The provenance model today just
cares that you received the data, not so much how you received it.
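The distinction being questioned can be made concrete with a small Java sketch. Everything in it is hypothetical and written for illustration only. PULL is the type proposed in this thread, not an existing ProvenanceEventType constant. Style, InboundEvent, currentModel and proposedModel are invented names. The sketch only models the four processors listed above; it is not NiFi's actual event-emission logic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReceiveClassification {

    // RECEIVE exists today; PULL is only the type proposed in this thread.
    enum InboundEvent { RECEIVE, PULL }

    // How a processor obtained the data (illustrative only, not a NiFi API).
    enum Style { ACTIVE_POLL, PASSIVE_LISTEN }

    // Current model for the processors above: the style is ignored, all are RECEIVE.
    static InboundEvent currentModel(Style style) {
        return InboundEvent.RECEIVE;
    }

    // Proposed model: actively polled data would be recorded as PULL.
    static InboundEvent proposedModel(Style style) {
        return style == Style.ACTIVE_POLL ? InboundEvent.PULL : InboundEvent.RECEIVE;
    }

    public static void main(String[] args) {
        Map<String, Style> processors = new LinkedHashMap<>(); // keeps insertion order
        processors.put("GetFile", Style.ACTIVE_POLL);
        processors.put("GetHTTP", Style.ACTIVE_POLL);
        processors.put("ListenHTTP", Style.PASSIVE_LISTEN);
        processors.put("ListenTCP", Style.PASSIVE_LISTEN);
        processors.forEach((name, style) ->
                System.out.println(name + ": current=" + currentModel(style)
                        + ", proposed=" + proposedModel(style)));
    }
}
```

Running it prints `current=RECEIVE` for all four processors and `proposed=PULL` only for GetFile and GetHTTP. The sketch shows that the proposal adds information by splitting one existing type. It does not record anything the current model drops.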

You're right that this proposal will affect many processors and the
internal visualization tools, etc.  However even more important to realize,
this change would affect many other downstream consumers of provenance data
which aren't necessarily in the stock NiFi distribution.  For example, any
third-party/custom ReportingTask that handles provenance data would need to
be updated with this change.  There probably needs to be a strong vision to
help demonstrate the value of this vs. the cost of the cascading effects
related to this change.

Thanks,
Adam

