Posted to users@nifi.apache.org by Tim Dean <ti...@gmail.com> on 2019/10/31 21:04:47 UTC

Supporting Elasticsearch scrolling with an input flow file

Hello,

I would like to use the existing ScrollElasticsearchHttp to perform a search that returns a potentially large number of hits. The parameters of the search need to reference one or more flow file attributes.

Looking at the source code for this processor, it appears that the QUERY property supports EL with flow file attributes. Furthermore, the documentation for the FAILURE relationship notes that only incoming flow files will be routed to failure. So it seems clear that this processor was designed to allow input flow files. Unfortunately, though, the processor is also annotated with INPUT_FORBIDDEN, so I can’t use it as is.

I assume that there is a good reason for forbidding input here. Before I go and try to implement a custom processor that does what I want, I’d like to know if some hidden problem awaits me.

Can someone clarify why this processor forbids input, and what problems I might expect if I try to circumvent this limitation?

Thanks

- Tim


Re: Supporting Elasticsearch scrolling with an input flow file

Posted by Koji Kawamura <ij...@gmail.com>.
Hi Tim,

Sorry for the late reply.
It seems the ScrollElasticsearchHttp processor is designed to run a
one-shot query to import query results from Elasticsearch.
The description says "The state must be cleared before another query
can be run."
It tracks progress using managed state, not via incoming FlowFiles.
In that sense it is a source processor, similar to processors such as
ListFile.
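
To illustrate why a cursor kept in managed state does not combine well
with per-FlowFile queries, here is a toy model in Python. It is not NiFi
code: the class, the state keys ("position", "finished") and the
in-memory page list are all hypothetical stand-ins, and the real
processor's state layout may differ.

```python
class ScrollSource:
    """Toy model of a source processor with one scroll cursor in managed state."""

    def __init__(self, pages):
        self.pages = pages   # stand-in for the pages a scroll query would return
        self.state = {}      # stand-in for the processor's managed state

    def on_trigger(self):
        """Return the next page of hits, or None once the query is exhausted."""
        if self.state.get("finished"):
            return None      # nothing happens until the state is cleared
        position = self.state.get("position", 0)
        if position >= len(self.pages):
            self.state["finished"] = True
            return None
        self.state["position"] = position + 1
        return self.pages[position]

    def clear_state(self):
        """Equivalent in spirit to clearing the processor state by hand."""
        self.state.clear()
```

Because the cursor belongs to the processor rather than to a FlowFile,
this model can only follow one query at a time, and a second query has
to wait for clear_state().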

If a large number of documents needs to be ingested page by page, using
a query built from incoming FlowFile attributes, then I'd enhance the
QueryElasticsearchHttp processor so that it routes the original incoming
FlowFile to a new relationship such as 'next page' while incrementing a
page number attribute. The next time that FlowFile is passed to the same
QueryElasticsearchHttp processor, the results for the next page are used
to populate FlowFiles in the 'success' relationship.
Currently, the QueryElasticsearchHttp processor simply removes incoming
FlowFiles.
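
A minimal sketch of that idea, written as plain Python rather than as a
NiFi processor. The attribute names ("es.query", "page.number"), the
function names and the fake search backend are assumptions made for
illustration only; the "from" and "size" parameters are the usual
Elasticsearch paging parameters.

```python
def build_query_params(attributes, page_size=100):
    """Turn FlowFile attributes into paging parameters for a search request."""
    page = int(attributes.get("page.number", "0"))
    return {"q": attributes["es.query"], "from": page * page_size, "size": page_size}


def on_trigger(flowfile, search, page_size=100):
    """Run one page of the query and decide where each FlowFile is routed."""
    hits = search(build_query_params(flowfile, page_size))
    routed = {"success": list(hits), "next page": []}
    if len(hits) == page_size:
        # A full page suggests more results: send the original FlowFile
        # around again with the page number attribute incremented.
        next_flowfile = dict(flowfile)
        next_page = int(flowfile.get("page.number", "0")) + 1
        next_flowfile["page.number"] = str(next_page)
        routed["next page"].append(next_flowfile)
    return routed


def fake_search(params):
    """Hypothetical backend holding five documents, sliced by from/size."""
    docs = [f"doc-{i}" for i in range(5)]
    return docs[params["from"]:params["from"] + params["size"]]


def drain(flowfile, search, page_size):
    """Loop the 'next page' relationship back into the processor until it is empty."""
    queue, collected = [flowfile], []
    while queue:
        routed = on_trigger(queue.pop(), search, page_size)
        collected.extend(routed["success"])
        queue.extend(routed["next page"])
    return collected
```

With this stop rule, a result set whose size is an exact multiple of the
page size costs one extra, empty query. Note also that from/size paging
in Elasticsearch is capped by index.max_result_window (10,000 by
default), so very deep result sets would still need a scroll-style
cursor carried on the FlowFile.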

Thanks,
Koji
