Posted to dev@nifi.apache.org by Ricky Saltzer <ri...@cloudera.com> on 2015/12/22 17:28:30 UTC

Input Forbidden Requirement

I noticed that in NiFi 0.4.0, a few processors (e.g. GetFile) are annotated
with @InputRequirement(Requirement.INPUT_FORBIDDEN), which is a little
confusing to me. What is the reasoning for forbidding an input connection
to processors like GetFile? There are situations where you want the ability
to trigger processors to execute, and an input connection is kind of the
only way (that I know of) to control precisely when it happens. I tested a
simple ExecuteProcess -> GetFile flow in 0.3.0 and then did an in-place
upgrade to 0.4.0, which invalidated the flow. This change is pretty major
and potentially breaks compatibility between upgrades.
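For context on the annotation: as I understand it, a processor declares one of three input requirements, and the framework marks the processor invalid when its incoming connections disagree with that declaration. The sketch below models that rule in plain Java. The enum values mirror NiFi's `InputRequirement.Requirement` (INPUT_ALLOWED, INPUT_REQUIRED, INPUT_FORBIDDEN). `ConnectionValidator` and `validate` are hypothetical names and are not NiFi framework code.

```java
import java.util.Optional;

// Mirrors the three values of NiFi's InputRequirement.Requirement enum;
// this is a standalone model, not the NiFi class itself.
enum Requirement { INPUT_ALLOWED, INPUT_REQUIRED, INPUT_FORBIDDEN }

// Hypothetical helper (not NiFi API) modeling when a processor is flagged
// invalid because of its incoming connections.
final class ConnectionValidator {
    private ConnectionValidator() {}

    // Returns a validation error, or Optional.empty() if the processor is valid.
    static Optional<String> validate(String processor, Requirement requirement, int incomingConnections) {
        switch (requirement) {
            case INPUT_FORBIDDEN:
                // Source processors: any incoming connection makes them invalid.
                return incomingConnections > 0
                        ? Optional.of(processor + " does not allow incoming connections")
                        : Optional.empty();
            case INPUT_REQUIRED:
                // Processors that do nothing without input must have a connection.
                return incomingConnections == 0
                        ? Optional.of(processor + " requires an incoming connection")
                        : Optional.empty();
            default:
                // INPUT_ALLOWED: valid with or without incoming connections.
                return Optional.empty();
        }
    }
}
```

In this model, the ExecuteProcess -> GetFile flow fails the INPUT_FORBIDDEN check as soon as the connection exists. That matches the flow becoming invalid after the upgrade.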

ricky

Re: Input Forbidden Requirement

Posted by Joe Witt <jo...@gmail.com>.

I've updated the migration guidance to better highlight this change:
https://cwiki.apache.org/confluence/display/NIFI/Migration+Guidance

It is important to note that GetFile has never been driven by input
FlowFiles. The desire for that totally makes sense, and that is what
the ListFile/FetchFile pattern, also part of the 0.4.0 release,
supports. I believe this handles your case with the elegance
you're looking for.
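Here is a rough sketch of why a list/fetch split gives event-driven behavior. The listing step emits only metadata, one record per file. The fetch step then reads whichever file each incoming record names. This is plain Java modeling the idea and is not NiFi code. `ListFetchSketch`, `list`, and `fetch` are hypothetical names, and the attribute keys `filename` and `absolute.path` are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical model of the list/fetch split; not NiFi code.
final class ListFetchSketch {
    private ListFetchSketch() {}

    // "List" step: one attribute map per regular file in dir, sorted by path.
    // Only metadata is emitted; file content is not read here.
    static List<Map<String, String>> list(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> entries = Files.list(dir)) {
            files = entries.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<Map<String, String>> listing = new ArrayList<>();
        for (Path file : files) {
            Map<String, String> attributes = new HashMap<>();
            attributes.put("filename", file.getFileName().toString());
            attributes.put("absolute.path", dir.toAbsolutePath().toString());
            listing.add(attributes);
        }
        return listing;
    }

    // "Fetch" step: which file gets read is decided entirely by the incoming attributes.
    static byte[] fetch(Map<String, String> attributes) throws IOException {
        Path file = Paths.get(attributes.get("absolute.path"), attributes.get("filename"));
        return Files.readAllBytes(file);
    }
}
```

In this model, the fetch step only acts on records it receives. Whatever sits upstream, such as a schedule, a trigger, or another step that sets attributes, therefore decides which files are read and when.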

Thanks
Joe

On Tue, Dec 22, 2015 at 2:00 PM, Ian Moran <ia...@saggezza.com> wrote:

Re: Input Forbidden Requirement

Posted by Ian Moran <ia...@saggezza.com>.

To toss in another instance where this change can work against valid
use cases, consider any situation where you might want to use the Expression
Language to make one of these 'source' processors not only run on a
'trigger' but also run more dynamically. Previously, you could potentially
use ExtractText to set up FlowFile attributes that you could then reference in
GetFile's Input Directory property, allowing you to target specific
sub-directories, for example. Without the ability to route
attribute-laden FlowFiles to GetFile, however, you're limited to using
environment variables with the Expression Language. That is still useful in some
instances, but certainly not as robust.
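To illustrate the difference, the sketch below resolves `${name}` references in a property value. It looks first in FlowFile attributes and then in environment variables. It is a hypothetical and greatly simplified stand-in for the Expression Language, with no functions and no nesting. `SimpleExpression`, `evaluate`, and that lookup order are assumptions for illustration, not NiFi's implementation.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, greatly simplified stand-in for NiFi's Expression Language:
// only bare ${name} references are supported (no functions, no nesting).
final class SimpleExpression {
    private static final Pattern REFERENCE = Pattern.compile("\\$\\{([^}]+)\\}");

    private SimpleExpression() {}

    // Replaces each ${name} with the FlowFile attribute of that name if present,
    // otherwise with the environment value of that name, otherwise with "".
    static String evaluate(String template, Map<String, String> attributes, Map<String, String> environment) {
        Matcher matcher = REFERENCE.matcher(template);
        StringBuilder result = new StringBuilder();
        while (matcher.find()) {
            String name = matcher.group(1);
            String value = attributes.getOrDefault(name, environment.getOrDefault(name, ""));
            // quoteReplacement keeps '$' or '\' in values from being treated specially.
            matcher.appendReplacement(result, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(result);
        return result.toString();
    }
}
```

Consider an incoming FlowFile carrying a hypothetical `subdir` attribute. With it, a template like `${DATA_ROOT}/${subdir}` can point at a specific sub-directory. With no incoming FlowFile, only the environment part resolves, which is the limitation of relying on environment variables alone.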

I'm just thinking 'out loud' here, but instead of outright disallowing
incoming connections to output-only processors, would it be
possible to give the user some visual indication that the *content* of
the FlowFile will be ignored, and most likely overwritten, but still allow
them to use the connection to trigger the processor to run (and
possibly deliver attributes for the Expression Language to use)?

On Tue, Dec 22, 2015 at 11:53 AM, Ricky Saltzer <ri...@cloudera.com> wrote:

-- 

*Ian Moran | **Associate Software Developer** | Saggezza*

Email: *Ian.Moran@Saggezza.com* | Office: 312-267-2929

@saggezza_inc <https://twitter.com/saggezza_inc> | LinkedIn
<https://www.linkedin.com/company/saggezza> | www.saggezza.com

Re: Input Forbidden Requirement

Posted by Ricky Saltzer <ri...@cloudera.com>.

Aha, that makes sense. Thanks for the explanation! I do agree that it could
be useful to have control over when those types of processors execute.

Re: Input Forbidden Requirement

Posted by Joe Witt <jo...@gmail.com>.

We should update the migration guide to describe this scenario.

I can do it in just a bit if nobody else has.
On Dec 22, 2015 11:51 AM, "Aldrin Piri" <al...@gmail.com> wrote:

Re: Input Forbidden Requirement

Posted by Aldrin Piri <al...@gmail.com>.

Ricky,

This was only applied to processors that, for lack of better words,
had explicit requirements on input: either requiring it or ignoring it. GetFile
is one such example; the framework previously allowed users to connect
processors to anything, regardless of whether they actually consumed
the input. The canonical example that spurred this was ExecuteSQL,
which, at the time, did nothing unless it was provided with input.
ExecuteProcess vs. ExecuteStreamCommand was another source of confusion for
a lot of new users. The intent was to give users more immediate
feedback.

In the example you provide, the FlowFiles produced by ExecuteProcess were never
actually being evaluated or consumed by the GetFile processor. All that said, I
could certainly see a case where it makes sense for GetFile to be driven by
events. That would be a nice improvement to the processor, and it would
also require its InputRequirement to be modified. However, in its
current state, input has no bearing on the triggering or operation of GetFile,
hence the decision to mark such a connection as invalid.

On Tue, Dec 22, 2015 at 11:28 AM, Ricky Saltzer <ri...@cloudera.com> wrote:
