Posted to user@flink.apache.org by Andrei Shumanski <an...@shumanski.com> on 2018/05/16 14:30:58 UTC

Fwd: Decrease initial source read speed

Hi,

I am trying to use Flink for data ingestion.

Input is a Kafka topic of strings - paths to incoming archive files. The
job unpacks the archives, reads the data in them, parses it, and stores it
in another format (rough sketch below).

Everything works fine if the topic is empty at the beginning of execution
and archives then arrive at regular intervals. But if the queue already
contains several thousand paths when the job starts, the checkpoint
durations become far too long, and write transactions either fail or take
hours to complete.
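
For reference, the job is roughly shaped like the sketch below (Flink
1.4-style Java API; the class name, topic, servers and sink are purely
illustrative, and the actual unpack/parse logic is elided):

import java.util.Properties;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import org.apache.flink.util.Collector;

public class ArchiveIngestJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // checkpoint every 60 s

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092"); // illustrative
        props.setProperty("group.id", "archive-ingest");

        env.addSource(new FlinkKafkaConsumer011<>(
                    "archive-paths", new SimpleStringSchema(), props))
           // one tiny Kafka message = one (potentially huge) archive
           .flatMap(new FlatMapFunction<String, String>() {
               @Override
               public void flatMap(String path, Collector<String> out) {
                   // unpack the archive at 'path', parse its contents and
                   // emit the records downstream (details elided here)
                   out.collect(path);
               }
           })
           .print(); // stand-in for the real transactional sink

        env.execute("archive-ingest");
    }
}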

Because the Kafka messages are very small (just paths), the Kafka source
manages to read all of them almost instantly, before any backpressure is
detected, and then tries to process all of those entries within a single
checkpoint. Since the archives can be quite large, that takes hours.

Do you know a solution to this problem? Is it possible to ask the Flink
source to read data slowly until the sustainable processing speed has been
established?
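
The kind of thing I have in mind could probably be approximated by hand
with a throttling operator placed right after the source - something like
the sketch below using Guava's RateLimiter (entirely illustrative; the
class name and rate are made up):

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

import com.google.common.util.concurrent.RateLimiter;

// Slows the stream down to a fixed per-subtask rate so that downstream
// operators are not flooded before backpressure can kick in.
public class ThrottleMap extends RichMapFunction<String, String> {

    private final double pathsPerSecond;   // per parallel subtask
    private transient RateLimiter limiter; // not serializable, built in open()

    public ThrottleMap(double pathsPerSecond) {
        this.pathsPerSecond = pathsPerSecond;
    }

    @Override
    public void open(Configuration parameters) {
        limiter = RateLimiter.create(pathsPerSecond);
    }

    @Override
    public String map(String path) {
        limiter.acquire(); // blocks until the next permit is available
        return path;
    }
}

This would be hooked in as .map(new ThrottleMap(5.0)) directly after
addSource(...). But a fixed rate is just another magic number, which is
exactly what I would like to avoid.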

As a workaround, I decreased the Kafka consumer property "fetch.max.bytes"
to 1 KB and set the buffer timeout to 1 ms (snippet below). That seems to
work for the current data set, but it does not look like a good solution...
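
Concretely, what I changed amounts to roughly this (the "max.poll.records"
line is just an additional knob I am considering, not something I have
verified helps):

import java.util.Properties;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "kafka:9092"); // illustrative
props.setProperty("group.id", "archive-ingest");
// cap the amount of data the consumer pulls per fetch (default ~50 MB)
props.setProperty("fetch.max.bytes", "1024");
// cap how many records a single poll() may return (default 500)
props.setProperty("max.poll.records", "10");

StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment();
// flush network buffers after 1 ms instead of waiting for them to fill
env.setBufferTimeout(1);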

-- 
Best regards,
Andrei Shumanski

Re: Fwd: Decrease initial source read speed

Posted by makeyang <ri...@hotmail.com>.
Andrei Shumanski:
    which source are you using?



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/