Posted to issues@nifi.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/07/28 11:53:20 UTC

[jira] [Commented] (NIFI-2417) Implement Query and Scroll processors for ElasticSearch

    [ https://issues.apache.org/jira/browse/NIFI-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15397446#comment-15397446 ] 

ASF GitHub Bot commented on NIFI-2417:
--------------------------------------

GitHub user gresockj opened a pull request:

    https://github.com/apache/nifi/pull/733

    NIFI-2417: Implementing QueryElasticsearchHttp and ScrollElasticsearchHttp

    I have implemented these processors for my own project, and thought it might be useful to submit them to NiFi.  They are based on FetchElasticsearchHttp, and have the following execution designs:
    
    - QueryElasticsearchHttp - submits an ES query and pages through the results in a single execution, emitting one flow file per document.  Allows both flow file input (in case the flow file has an attribute with the query to run) and non-input execution.
    - ScrollElasticsearchHttp - submits an ES query and uses the scroll API to scroll through the results.  The scroll_id for each respective page is kept in the processor's state management, and each subsequent execution of the processor emits a single page of documents as a flow file.  We found this to be the most efficient way to scroll through a huge result set, as in the case of reindexing Elasticsearch, without losing our place if NiFi goes down.  The only quirk is that the processor state must be cleared before another query can be run, but this is documented in the processor, and it jibes with the use case of only being needed for rare events like a reindex.
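
As a rough illustration of the scroll design described above (not the code in this pull request), the one-page-per-execution behavior might be sketched as follows. Everything here is hypothetical: `run_once`, the `state` dictionary standing in for the processor's state management, the `finished` marker, and the in-memory `make_fake_fetch` helper that takes the place of Elasticsearch so the example runs without a cluster.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for the processor's persisted state.
State = Dict[str, str]
# Hypothetical fetch function: takes the stored scroll_id (None on the first
# run) and returns (next_scroll_id, hits) for exactly one page.
FetchPage = Callable[[Optional[str]], Tuple[str, List[dict]]]

FINISHED = "finished"  # hypothetical marker key, not taken from the PR


def run_once(state: State, fetch_page: FetchPage) -> List[dict]:
    """Return at most one page of hits per call, tracking scroll_id in state."""
    if state.get(FINISHED) == "true":
        # The query is exhausted; state must be cleared before a new query runs.
        return []
    next_id, hits = fetch_page(state.get("scroll_id"))
    if not hits:
        state[FINISHED] = "true"
        return []
    # Persist our place so a restart resumes from the next page.
    state["scroll_id"] = next_id
    return hits


def make_fake_fetch(docs: List[dict], page_size: int) -> FetchPage:
    """In-memory stand-in for Elasticsearch; the scroll_id is just an offset."""
    def fetch(scroll_id: Optional[str]) -> Tuple[str, List[dict]]:
        start = int(scroll_id) if scroll_id is not None else 0
        page = docs[start:start + page_size]
        return str(start + len(page)), page
    return fetch
```

Because the only thing carried between calls is the contents of `state`, a restart between two calls resumes at the next page, and calling `state.clear()` is what allows a fresh query to start.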
    
    Since the processors already work correctly in our system, I am no longer authorized to put time into making major modifications to the code.  As a result, if any redesign of this code is desired, I will be unable to put time toward it.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gresockj/nifi NIFI-2417

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/733.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #733
    
----
commit 5bbe09e2a7c4689bfa01588260ea89d2375e8356
Author: Joe Gresock <jo...@lmco.com>
Date:   2016-07-28T11:44:29Z

    NIFI-2417: Implementing QueryElasticsearchHttp and ScrollElasticsearchHttp

----


> Implement Query and Scroll processors for ElasticSearch
> -------------------------------------------------------
>
>                 Key: NIFI-2417
>                 URL: https://issues.apache.org/jira/browse/NIFI-2417
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>    Affects Versions: 1.0.0
>            Reporter: Joseph Gresock
>            Assignee: Joseph Gresock
>            Priority: Minor
>
> FetchElasticsearchHttp allows users to select a single document from Elasticsearch in NiFi, but there is no way to run a query to retrieve multiple documents.
> We should add a QueryElasticsearchHttp processor for running a query and returning a flow file per result, for small result sets.  This should allow both input and non-input execution.  
> A separate ScrollElasticsearchHttp processor would also be useful for scrolling through a huge result set.  This should use the state manager to maintain the scroll_id value, and use this as input to the next scroll page.  As a result, this processor should not allow flow file input, but should retrieve one page per run.
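
For contrast with the scroll approach, the QueryElasticsearchHttp behavior requested above (page through a small result set in a single execution, one output per document) could look roughly like the sketch below. The issue does not say how paging is performed, so offset-based paging is an assumption here, and `query_all` and `search_page` are hypothetical names rather than anything from NiFi or the Elasticsearch client.

```python
from typing import Callable, Iterator, List

# Hypothetical search function: (offset, page_size) -> hits for that page.
SearchPage = Callable[[int, int], List[dict]]


def query_all(search_page: SearchPage, page_size: int = 20) -> Iterator[dict]:
    """Page through all results in one run, yielding one document at a time."""
    offset = 0
    while True:
        hits = search_page(offset, page_size)
        for hit in hits:
            # Each yielded hit corresponds to one emitted flow file.
            yield hit
        if len(hits) < page_size:
            # A short (or empty) page means there is nothing left to fetch.
            return
        offset += len(hits)
```

Unlike the scroll design, nothing is stored between runs, which suits small result sets but means an interrupted run starts again from the first page.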



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)