Posted to dev@nifi.apache.org by Leng Lim <le...@brightsparklabs.com> on 2017/04/13 01:16:21 UTC

PutElasticsearch5 Processor Finished Processing

Hello,

Is there any easy way to tell when the PutElasticsearch5 processor has
finished inserting or updating records in the database? What I'm looking for
is some way for the processor to signal that it has finished processing all
the inserts or updates so I know the database is in the correct state before
I query or process further inserts/updates on it.

From what I can see, the PutElasticsearch5 will only output a flow file for
each insert/update or failure it processes. It would be helpful if there
were some way to specify how many records are expected to be inserted/updated
for a specific transaction and have the processor signal via routing a flow
file to some relationship or some other mechanism that it has finished
processing the expected amount of inserts/updates.
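
The counting idea described above can be sketched in a few lines of plain Python. This is only an illustration of the bookkeeping, not NiFi or PutElasticsearch5 code: the class `BatchCompletionTracker`, its methods, and the batch id `"set-001"` are all hypothetical names, and it assumes the expected count is known before results start arriving.

```python
from collections import defaultdict


class BatchCompletionTracker:
    """Counts per-batch results and reports when a batch is complete.

    Hypothetical helper for illustration; not part of NiFi.
    """

    def __init__(self):
        self._expected = {}
        self._seen = defaultdict(int)

    def expect(self, batch_id, count):
        # Register how many insert/update results the batch should produce.
        self._expected[batch_id] = count

    def record(self, batch_id):
        # Call once per result (success or failure). Returns True only
        # when the number of results seen equals the expected count.
        self._seen[batch_id] += 1
        return self._seen[batch_id] == self._expected.get(batch_id)


tracker = BatchCompletionTracker()
tracker.expect("set-001", 3)
results = [tracker.record("set-001") for _ in range(3)]
# results == [False, False, True]
```

Failures are counted as results here too, so a batch with failed documents still reaches its expected total; whether that is the right behaviour depends on the flow.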

Regards,

Leng



--
View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/PutElasticsearch5-Processor-Finished-Processing-tp15457.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.

Re: PutElasticsearch5 Processor Finished Processing

Posted by Leng Lim <le...@brightsparklabs.com>.
Thanks Matt,

I understand what you mean about NiFi being "flow" oriented and that is what
the PutElasticsearch5 processor has been designed for. I agree with you that
what I'm trying to do is batch oriented. I'm trying to avoid creating a
custom processor for what I'm trying to do so I'll explain my flow in a bit
more detail to see if what I want is possible in NiFi without custom
processors.

My flow basically processes sets of files containing inserts and updates
against an Elasticsearch database. There may be many sets of files that need
to be processed at a time, however each set is dependent on the previous set
i.e. they need to be processed in the correct order. If the data flow tries
to process updates for a set of files when the inserts of the previous set
have not completed processing, the updates will fail because the documents
will not yet exist in the database.

The upstream processor is a custom processor which monitors a folder for new
file sets to be processed. Each file set exists in its own folder and the
custom processor also performs validation checks etc. So really what I want
is for this custom processor to send through only one file set at a time, then
send through the next file set only after the downstream processors complete
their processing. In this case it would be once the PutElasticsearch5
processors complete all their inserts/updates. I know that the Wait and Notify
processors (which will be available in NiFi 1.2) could be used for this but
I would still need some way for the PutElasticsearch5 to tell me when it has
completed its inserts and updates for the file set.
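
The ordering constraint described above (release one file set, hold the rest until a completion signal arrives) can be sketched as a small gate. This is plain Python to show the logic only; it is not the Wait/Notify processor API, and `FileSetGate`, `release_next`, and `notify_complete` are hypothetical names.

```python
from collections import deque


class FileSetGate:
    """Releases file sets in order, one at a time.

    Hypothetical helper for illustration; not part of NiFi.
    """

    def __init__(self, file_sets):
        self._pending = deque(file_sets)
        self._in_flight = None

    def release_next(self):
        # Returns the next file set, or None if one is still being
        # processed downstream or nothing is left to release.
        if self._in_flight is not None or not self._pending:
            return None
        self._in_flight = self._pending.popleft()
        return self._in_flight

    def notify_complete(self, file_set):
        # Called when downstream reports that the file set is finished.
        if file_set != self._in_flight:
            raise ValueError(f"{file_set!r} is not the file set in flight")
        self._in_flight = None


gate = FileSetGate(["set-001", "set-002"])
first = gate.release_next()      # "set-001"
blocked = gate.release_next()    # None, set-001 is still in flight
gate.notify_complete("set-001")
second = gate.release_next()     # "set-002"
```

The gate still depends on something calling `notify_complete`, which is exactly the missing completion signal discussed in this message.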

Using the MonitorActivity processor to generate an event if an amount of
time passed with no output from the PutElasticsearch5 processor could work
but the amount of time I would need to set seems somewhat arbitrary and
could perhaps cause some false positives.
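
The false-positive concern can be made concrete with a simplified model of an inactivity timer. This is not how MonitorActivity is implemented; `premature_alert_time` is a hypothetical function and the timestamps are made-up seconds, chosen only to show how a slow batch can trip a threshold that is set too low.

```python
def premature_alert_time(output_times, threshold_seconds):
    """Return when an inactivity alert would fire mid-batch, or None.

    output_times: ascending timestamps (seconds) of outputs from one batch.
    An alert fires mid-batch if any gap between consecutive outputs
    exceeds the threshold, even though more output is still to come.
    """
    for earlier, later in zip(output_times, output_times[1:]):
        if later - earlier > threshold_seconds:
            return earlier + threshold_seconds
    return None


# One batch with a 65-second pause between its third and fourth outputs.
times = [0, 5, 10, 75, 80]
too_short = premature_alert_time(times, 60)    # 70: fires before the batch ends
long_enough = premature_alert_time(times, 120)  # None: no mid-batch alert
```

A larger threshold avoids the premature alert but delays the next file set by the same amount after every batch, which is the trade-off that makes the value feel arbitrary.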

Thanks for your help.

Regards,

Leng






Re: PutElasticsearch5 Processor Finished Processing

Posted by Matt Burgess <ma...@apache.org>.
Leng,

Generally, Apache NiFi is data "flow" oriented, in the same sense as a
water flow (along pipes, for example) that might emit a lot or a
little (or none) at any one time, but overall the water continues to
flow through, and there might not be a discrete concept of "finished".

In your case, when would you know that a processor is "finished
processing all the inserts/updates"?  Do you have an upstream
processor that runs at a specified schedule, such that the downstream
flow would only be processed every so often?  If so, you could try the
MonitorActivity processor [1] after your PutElasticsearch5 processor; it would
allow you to generate an event if an amount of time has passed with no
output. If not, can you describe your NiFi flow in more detail?
Although this seems more batch-oriented, NiFi does have some
capabilities that could enable you to achieve what you're looking for.

Regards,
Matt

[1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.MonitorActivity/index.html


On Wed, Apr 12, 2017 at 9:16 PM, Leng Lim <le...@brightsparklabs.com> wrote:
> Hello,
>
> Is there any easy way to tell when the PutElasticsearch5 processor has
> finished inserting or updating records in the database? What I'm looking for
> is some way for the processor to signal that it has finished processing all
> the inserts or updates so I know the database is in the correct state before
> I query or process further inserts/updates on it.
>
> From what I can see, the PutElasticsearch5 will only output a flow file for
> each insert/update or failure it processes. It would be helpful if there
> were some way to specify how many records are expected to be inserted/updated
> for a specific transaction and have the processor signal via routing a flow
> file to some relationship or some other mechanism that it has finished
> processing the expected amount of inserts/updates.
>
> Regards,
>
> Leng