Posted to issues@nifi.apache.org by "Mahieddine Cherif (Jira)" <ji...@apache.org> on 2020/07/31 15:04:00 UTC
[jira] [Updated] (NIFI-7696) MultiQueryRecord Processor

     [ https://issues.apache.org/jira/browse/NIFI-7696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahieddine Cherif updated NIFI-7696:
------------------------------------
    Summary: MultiQueryRecord Processor  (was: MultiQueryRecord)

> MultiQueryRecord Processor
> --------------------------
>
>                 Key: NIFI-7696
>                 URL: https://issues.apache.org/jira/browse/NIFI-7696
>             Project: Apache NiFi
>          Issue Type: Task
>          Components: Extensions
>            Reporter: Mahieddine Cherif
>            Priority: Minor
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> *Context:*
> QueryRecord is a very useful processor: it lets users run all kinds of advanced SQL queries over a wide range of data formats (CSV, JSON, ... thanks to the Record API), which considerably increases NiFi's ETL capabilities. 
>  I would like to push NiFi further by making the same thing possible with +multiple FlowFiles as input+, so that join-like queries across several FlowFiles become a reality. 
>  
> *Proposal:*
> Create a new processor called "MultiQueryRecord", which can be thought of technically as a combination of the QueryRecord and MergeRecord processors. This processor takes FlowFiles from different sources, waits until all of the FlowFiles it expects have arrived, and then executes all of the SQL queries provided as dynamic properties. 
>  
>  * Every FlowFile carries an attribute that contains the name of the "virtual table" used to refer to it in the SQL queries. 
>  * The user configures how many FlowFiles the processor expects, the name of the attribute that contains the table name, and the name of the correlation attribute used to distinguish FlowFiles that come from different runs. 
>  * The user also defines all of the SQL queries as dynamic properties (as is done today for the QueryRecord processor).
>  
> The processor will use the same MergeBin concept as the MergeRecord processor to hold pending FlowFiles until all of them have arrived, and will then execute all of the defined SQL queries.
>  
> *Implementation:*
> I have already implemented this processor and would like to contribute it to this wonderful project. I am about to finish the unit tests and will update this issue with my PR if there is interest.
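
The waiting behaviour described under *Proposal* (group inputs by a correlation value, and release a group once the expected number of named tables has arrived) could be sketched roughly as follows. This is only an illustration in plain Java and does not use the NiFi API or reflect the reporter's actual implementation: the class `CorrelationBins`, its methods, and the use of plain strings in place of FlowFile content are all hypothetical names and simplifications chosen for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical helper (not part of NiFi): collects inputs per correlation id
 * and hands back a complete bin once the expected number of tables is present.
 */
final class CorrelationBins {
    // Number of distinct table names a bin needs before it is complete.
    private final int expectedCount;
    // correlation id -> (table name -> content); content stands in for a FlowFile.
    private final Map<String, Map<String, String>> pending = new HashMap<>();

    CorrelationBins(int expectedCount) {
        if (expectedCount < 1) {
            throw new IllegalArgumentException("expectedCount must be at least 1");
        }
        this.expectedCount = expectedCount;
    }

    /**
     * Adds one input to the bin for its correlation id.
     * Returns the completed bin (table name -> content) when the bin holds
     * expectedCount distinct table names, otherwise an empty Optional.
     * Simplification: a second input with the same table name replaces the first.
     */
    Optional<Map<String, String>> offer(String correlationId, String tableName, String content) {
        Map<String, String> bin = pending.computeIfAbsent(correlationId, k -> new HashMap<>());
        bin.put(tableName, content);
        if (bin.size() < expectedCount) {
            return Optional.empty(); // still waiting for other tables of this run
        }
        pending.remove(correlationId); // bin is complete, stop tracking it
        return Optional.of(bin);
    }

    /** Number of correlation ids that are still waiting for inputs. */
    int pendingBins() {
        return pending.size();
    }
}
```

With `new CorrelationBins(2)`, offering an "orders" input and then a "customers" input under the same correlation id would return the completed bin on the second call; at that point a real processor would presumably expose each entry as a virtual table and run the configured SQL queries. Timeouts and limits on the number of pending bins are left out of this sketch.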



--
This message was sent by Atlassian Jira
(v8.3.4#803005)