Posted to issues@nifi.apache.org by "Matt Burgess (JIRA)" <ji...@apache.org> on 2017/01/09 14:31:58 UTC

[jira] [Comment Edited] (NIFI-2881) Allow Database Fetch processors to accept incoming flow files and use Expression Language

    [ https://issues.apache.org/jira/browse/NIFI-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15811850#comment-15811850 ] 

Matt Burgess edited comment on NIFI-2881 at 1/9/17 2:31 PM:
------------------------------------------------------------

IMHO there will be too many issues involved (state management, behavior with/without incoming flow files) with allowing incoming flow files when specifying max-value columns in the database fetch processors. Instead, I propose we use this Jira to allow incoming connections to GenerateTableFetch only, but add Expression Language (EL) support to both processors (see explanation below). The behavior could be as follows:

1) If there are no incoming connection(s), GenerateTableFetch will continue to work as-is. This allows for backwards compatibility and supports max-value columns as it always has.
2) If there are incoming connection(s) but no flow file(s) available, GenerateTableFetch will not perform any processing.
3) If there are incoming connection(s) and flow file(s) available, GenerateTableFetch will perform its normal processing, using the flow file and Expression Language evaluation while generating the query.
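
The three cases above can be sketched as a small decision function. This is a minimal Python illustration of the proposed behavior, not NiFi code; the names `Action` and `decide_action` are hypothetical.

```python
from enum import Enum


class Action(Enum):
    RUN_AS_IS = "run as-is, max-value columns supported"
    SKIP = "do not perform any processing"
    RUN_WITH_FLOW_FILE = "generate the query using the flow file and EL"


def decide_action(has_incoming_connection: bool, flow_file_available: bool) -> Action:
    """Map cases 1-3 of the proposal onto one decision (hypothetical helper)."""
    if not has_incoming_connection:
        # Case 1: no incoming connection, so behave exactly as before.
        return Action.RUN_AS_IS
    if not flow_file_available:
        # Case 2: connected upstream, but nothing is queued yet.
        return Action.SKIP
    # Case 3: a flow file is available to drive query generation.
    return Action.RUN_WITH_FLOW_FILE
```

Keeping case 1 separate from the other two is what preserves backwards compatibility: a processor with no upstream connection never reaches the flow-file branch.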

The reason for allowing Expression Language in both QueryDatabaseTable and GenerateTableFetch is the addition of support for the NiFi Variable Registry (https://cwiki.apache.org/confluence/display/NIFI/Variable+Registry). This allows EL to be used with statically provided values (as opposed to values coming from flow file attributes), and can aid the development lifecycle for NiFi flows (e.g., a different set of variables for test vs. production).
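
As a rough illustration of resolving a property value from either source, the Python sketch below substitutes plain `${name}` references, looking at flow file attributes first and falling back to a set of static variables. It is not the real Expression Language (no functions, no nesting); the lookup order, the empty-string fallback, the function name `resolve`, and the variable names are assumptions made for illustration.

```python
import re

# Matches simple ${name} references only; real EL is far richer.
_REF = re.compile(r"\$\{([^}]+)\}")


def resolve(value, attributes=None, registry=None):
    """Replace ${name} references in a property value (hypothetical helper).

    attributes: per-flow-file values; may be None when there is no flow file.
    registry:   statically provided variables, e.g. per environment.
    """
    attributes = attributes or {}
    registry = registry or {}

    def lookup(match):
        name = match.group(1)
        if name in attributes:
            return attributes[name]
        if name in registry:
            return registry[name]
        return ""  # assumption: unresolved references become empty

    return _REF.sub(lookup, value)
```

For example, `resolve("${db.schema}.users", None, {"db.schema": "test"})` yields `"test.users"`, so the same flow definition could point at a different schema in production by swapping the static variables.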

I am not proposing support for incoming flow files in QueryDatabaseTable (QDT) because, with max-value columns precluded in that case, its functionality would become the same as ExecuteSQL with a SQL query specified in the ExecuteSQL properties. Having said that, QDT does have a couple of features that have not made it into ExecuteSQL yet, so if it is prudent to add the above behavior to QDT as well, then I'm OK with that. I was just apprehensive about touching the QDT code if at best (in theory) it results in equivalence with another processor.

With this added behavior, users will be able to use ListDatabaseTable with GenerateTableFetch to produce SQL statements for an arbitrary number of tables, partitioned so that parallel fetches of appropriate size can be performed downstream. The addition of EL support to both processors offers more flexibility, as described above. It also adds no complexity in terms of state management, since GenerateTableFetch would be invalid if the user has specified both incoming connection(s) and max-value columns. This adds (rather than changes) behavior, so I feel that with the appropriate documentation it would not be confusing to users.
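
To make the partitioning and the validation rule concrete, here is a minimal Python sketch. It is not how GenerateTableFetch is implemented: the LIMIT/OFFSET syntax, the function names `validate` and `generate_fetches`, and the idea that the row count is known up front are all assumptions for illustration, and identifiers are interpolated without any sanitizing.

```python
def validate(has_incoming_connection, max_value_columns):
    """Return validation errors for the proposed rule (hypothetical helper)."""
    if has_incoming_connection and max_value_columns:
        return ["max-value columns cannot be combined with incoming connections"]
    return []


def generate_fetches(table, row_count, partition_size, order_by):
    """Produce one paged SELECT per partition of the table."""
    if partition_size <= 0:
        raise ValueError("partition_size must be positive")
    statements = []
    for offset in range(0, row_count, partition_size):
        statements.append(
            f"SELECT * FROM {table} ORDER BY {order_by} "
            f"LIMIT {partition_size} OFFSET {offset}"
        )
    return statements
```

Feeding each table name from an upstream listing step into `generate_fetches` gives one independent statement per partition, which is what lets the downstream fetches run in parallel.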





> Allow Database Fetch processors to accept incoming flow files and use Expression Language
> -----------------------------------------------------------------------------------------
>
>                 Key: NIFI-2881
>                 URL: https://issues.apache.org/jira/browse/NIFI-2881
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Matt Burgess
>
> The QueryDatabaseTable and GenerateTableFetch processors do not allow Expression Language to be used in the properties, mainly because they also do not allow incoming connections. This means if the user desires to fetch from multiple tables, they currently need one instance of the processor for each table, and those table names must be hard-coded.
> To support the same capabilities for multiple tables and more flexible configuration via Expression Language, these processors should have properties that accept Expression Language, and should accept (optional) incoming connections.
> Conversation about the behavior of the processors is welcomed and encouraged. For example, if an incoming flow file is available, do we also still run the incremental fetch logic for tables that aren't specified by this flow file, or do we just do incremental fetching when the processor is scheduled but there is no incoming flow file? The latter implies that a denial of service could take place: flooding the processor with flow files would keep it from doing its original job of querying the table, keeping track of maximum values, etc.
> This is likely a breaking change to the processors because of how state management is implemented. Currently, since the table name is hard-coded, only the column name makes up the key in the state. This would have to be extended to a compound key that represents table name, max-value column name, etc.
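
The compound state key described in the issue could look roughly like the Python sketch below. The delimiter, the function names `state_key` and `update_max`, and the use of integers for the stored maximum are assumptions for illustration only, not the processors' actual state format.

```python
DELIMITER = "|"  # hypothetical separator; must not occur in table or column names


def state_key(table, column):
    """Build a compound key so several tables can share one state map."""
    return f"{table}{DELIMITER}{column}"


def update_max(state, table, column, observed):
    """Record the largest value seen so far for (table, column)."""
    key = state_key(table, column)
    current = state.get(key)
    if current is None or observed > current:
        state[key] = observed
    return state[key]
```

With a column-only key, the maximum for `orders.id` and `users.id` would collide; the compound key keeps them apart, which is why existing stored state would need migrating.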


