Posted to issues@nifi.apache.org by "Mark Payne (Jira)" <ji...@apache.org> on 2021/06/08 20:07:00 UTC

[jira] [Commented] (NIFI-8636) ConsumeKafka doesn't adhere the CRON driven and Timer Driven scheduling (except secs and mins) and starts randomly

    [ https://issues.apache.org/jira/browse/NIFI-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359548#comment-17359548 ] 

Mark Payne commented on NIFI-8636:
----------------------------------

The way the ConsumeKafka processor works, it calls the {{poll()}} method on a Kafka client and then waits some time for a response. Unfortunately, the Kafka API does not expose a way to determine whether the topic(s) are out of data. So all we can do is call {{poll()}} and, if nothing has come back after a short wait, conclude that there is nothing to do and return. In the meantime, the Kafka client keeps buffering data on the client side, so the next time {{poll()}} is called, that data is returned immediately.
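
To illustrate the buffering behaviour described above, here is a toy model in Python. It is NOT the Kafka client API. ToyConsumer, its fields, and the simulated millisecond clock are hypothetical, and the 500 ms fetch latency is a made-up number. The first poll() starts a background fetch that does not finish within its timeout, so it returns nothing. The fetch keeps running after that poll has given up, so a later poll() finds the data already buffered.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyConsumer:
    """Hypothetical, simplified model of a consumer client (NOT the Kafka API).

    poll() first hands back anything already buffered on the client side.
    Otherwise it starts a background fetch that lands in the buffer
    fetch_latency_ms later, and waits at most timeout_ms for it.
    """
    broker_records: List[str]             # records still on the "broker"
    fetch_latency_ms: int                 # assumed time for a fetch to complete
    buffer: List[str] = field(default_factory=list)
    fetch_ready_at: Optional[int] = None  # simulated completion time of in-flight fetch

    def _complete_fetch_if_ready(self, t_ms: int) -> None:
        # Move a finished fetch into the client-side buffer.
        if self.fetch_ready_at is not None and self.fetch_ready_at <= t_ms:
            self.buffer.extend(self.broker_records)
            self.broker_records = []
            self.fetch_ready_at = None

    def poll(self, now_ms: int, timeout_ms: int) -> List[str]:
        # A fetch that finished between calls is already buffered.
        self._complete_fetch_if_ready(now_ms)
        if not self.buffer:
            # Nothing buffered: start a fetch if one is not already in flight...
            if self.fetch_ready_at is None and self.broker_records:
                self.fetch_ready_at = now_ms + self.fetch_latency_ms
            # ...and wait up to timeout_ms for it to land.
            self._complete_fetch_if_ready(now_ms + timeout_ms)
        records, self.buffer = self.buffer, []
        return records


consumer = ToyConsumer(broker_records=["m1", "m2", "m3"], fetch_latency_ms=500)
first = consumer.poll(now_ms=0, timeout_ms=100)       # fetch not done within 100 ms
second = consumer.poll(now_ms=1_000, timeout_ms=100)  # fetch finished in the background
print(first, second)  # [] ['m1', 'm2', 'm3']
```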

So if you schedule it to run every 5 hours, then every 5 hours it will simply call {{poll()}}, and it may or may not get data immediately. If it doesn't, nothing is output from the Processor. When you set the Run Schedule to 0 seconds, you do see data because it's constantly calling {{poll()}}: a call often returns with no data, but the Processor is triggered again right away, and that next call returns the data that was buffered in the meantime.
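
A rough timeline model can make the scheduling difference concrete. It rests on simplifying assumptions, not NiFi code: the first trigger's poll() starts a fetch that becomes available fetch_latency_ms later, and a trigger sees the data only if it is ready before that poll's timeout expires. The function name, the 500 ms latency, the 100 ms timeout, and treating a "0 sec" Run Schedule as one trigger every 10 ms are all hypothetical.

```python
from typing import List, Optional, Tuple


def first_trigger_with_data(
    trigger_times_ms: List[int], fetch_latency_ms: int, poll_timeout_ms: int
) -> Optional[Tuple[int, int]]:
    """Return (index, time_ms) of the first trigger whose poll() sees data, else None.

    Hypothetical model: the first trigger's poll() starts a fetch that becomes
    available fetch_latency_ms later; a trigger returns the data only if it is
    ready before that poll's timeout expires.
    """
    if not trigger_times_ms:
        return None
    ready_at = trigger_times_ms[0] + fetch_latency_ms
    for i, t in enumerate(trigger_times_ms):
        if t + poll_timeout_ms >= ready_at:
            return i, t
    return None


FIVE_HOURS_MS = 5 * 60 * 60 * 1000
# "0 sec" schedule, approximated here as a trigger every 10 ms for one second.
busy = first_trigger_with_data(list(range(0, 1_000, 10)), 500, 100)
# "5 hour" schedule: triggers at 0 h, 5 h and 10 h.
sparse = first_trigger_with_data([0, FIVE_HOURS_MS, 2 * FIVE_HOURS_MS], 500, 100)
print(busy, sparse)  # (40, 400) (1, 18000000)
```

Under these assumptions the busy schedule sees data after about 400 ms. The 5-hour schedule's first trigger comes back empty, so nothing is emitted until the next trigger 5 hours later.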

So I believe it is working as designed; the Kafka API just doesn't lend itself well to fetching data only once every 5 hours. It could perhaps be done by making the amount of time to wait after calling {{poll()}} configurable. But then what would you get, a single message?
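
The "configurable wait" idea can be sketched as a loop that keeps calling poll() until records arrive or a maximum wait runs out. This is a hypothetical illustration, not an existing ConsumeKafka property. poll_with_max_wait, make_stub_poll, and the simulated clock are invented for the example, and each empty poll is assumed to use its whole timeout. It also shows the caveat raised above: the loop stops at the first non-empty batch, so a single trigger only yields whatever the client had buffered at that moment.

```python
from typing import Callable, List, Tuple

PollFn = Callable[[int, int], List[str]]  # (now_ms, timeout_ms) -> records


def poll_with_max_wait(
    poll: PollFn, max_wait_ms: int, poll_timeout_ms: int, start_ms: int = 0
) -> Tuple[List[str], int]:
    """Hypothetical: keep calling poll() until it returns records or max_wait_ms runs out.

    Time is simulated: each empty poll is assumed to use its whole timeout.
    Returns (records, number_of_poll_calls).
    """
    elapsed, calls = 0, 0
    while elapsed < max_wait_ms:
        timeout = min(poll_timeout_ms, max_wait_ms - elapsed)
        records = poll(start_ms + elapsed, timeout)
        calls += 1
        if records:
            return records, calls  # stop at the first non-empty batch
        elapsed += timeout
    return [], calls


def make_stub_poll(ready_at_ms: int, batch: List[str]) -> PollFn:
    """Stub poll(): the batch becomes available once, at ready_at_ms."""
    delivered = False

    def poll(now_ms: int, timeout_ms: int) -> List[str]:
        nonlocal delivered
        if not delivered and now_ms + timeout_ms >= ready_at_ms:
            delivered = True
            return list(batch)
        return []

    return poll


print(poll_with_max_wait(make_stub_poll(500, ["m1", "m2"]), 100, 100))    # ([], 1)
print(poll_with_max_wait(make_stub_poll(500, ["m1", "m2"]), 1_000, 100))  # (['m1', 'm2'], 5)
```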

Generally, the point of Kafka is to be a real-time streaming platform, so I'm not sure it makes sense to run only once every 5 hours. Can you explain your use case further? Why would you want to poll only once every 5 hours?

> ConsumeKafka doesn't adhere the CRON driven and Timer Driven scheduling (except secs and mins) and starts randomly
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-8636
>                 URL: https://issues.apache.org/jira/browse/NIFI-8636
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.12.1, 1.13.2
>            Reporter: Kartik Mishra
>            Priority: Critical
>              Labels: consumekafka, nifi
>
> There is a requirement where we need to read the messages from a Kafka topic and push them to other systems. So far I was using *Timer Driven scheduling with a Run Schedule of 0 sec, and everything worked fine.* Now, when *I tried setting a Run Schedule of 5 hours, it stopped working for some reason. The same issue occurs with CRON Driven scheduling.* It runs at random times and reads the messages from the topic.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)