Posted to user@flink.apache.org by jlist9 <jl...@gmail.com> on 2018/11/16 03:26:43 UTC

Flink Mini Cluster Performance?

I'm developing a Flink application and I'm still learning. For simplicity,
most of the time I test by running the main method of the entry class as a
regular Java application. Is that running on what's called a mini cluster?
I find it quite convenient, and it makes debugging a job really easy. My
question is: if a job is small enough that it could be executed on a single
machine, is there a performance penalty for running it this way versus
starting a Flink instance in local mode or a full-fledged Flink cluster?
For jobs with a low workload, is there any downside to just running them
like a regular Java application?

A side question: when running with the mini cluster, I watch the log
messages, and the process seems to focus on one operator for a while, then
switch to another. For example, my simple Flink application has a Kafka
source, a windowed aggregator, and an Elasticsearch sink. I see a lot of
SourceFunction log messages as records are pumped into the pipeline, then
they (the logs) stop for a while. I then see some AggregateFunction logs as
the records come in, and then SinkFunction logs when the window is up.
After that, there can be a pause of 5-10 seconds or longer before
SourceFunction logs show up again. Because of the windowing operation, I
expect the SinkFunction to fire only once in a while, but I was expecting
to see interleaved SourceFunction and AggregateFunction logs, showing that
all operators run at the same time, instead of logs from the loop inside
SourceFunction.run() followed by logs from the AggregateFunction.add()
method. Is this because I'm running it with the mini cluster, or is this
how things are expected to work?
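
To make the expectation concrete, here is a plain-Java sketch of the mental
model I had. It is not Flink code and makes no claim about how Flink
schedules operators: PipelineSketch, the queue, and the log strings are all
made up for illustration. A "source" thread and an "aggregator" thread are
connected by a bounded queue, and each appends to a shared log as it works.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical stand-in for a two-stage pipeline; not a Flink API.
public class PipelineSketch {
    // Sentinel telling the aggregator that the source has finished.
    private static final int END_OF_STREAM = -1;

    /**
     * Runs a source and an aggregator on separate threads and returns
     * the log lines in the order they were written.
     */
    public static List<String> run(int recordCount, int queueCapacity)
            throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(queueCapacity);
        List<String> log = Collections.synchronizedList(new ArrayList<>());

        // Source stage: emits 1..recordCount, blocking when the queue is full.
        Thread source = new Thread(() -> {
            try {
                for (int i = 1; i <= recordCount; i++) {
                    log.add("source emit " + i);
                    queue.put(i);
                }
                queue.put(END_OF_STREAM);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Aggregator stage: sums records, then reports once (the "sink").
        Thread aggregator = new Thread(() -> {
            try {
                long sum = 0;
                for (int v = queue.take(); v != END_OF_STREAM; v = queue.take()) {
                    sum += v;
                    log.add("aggregate add " + v);
                }
                log.add("sink sum " + sum);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        source.start();
        aggregator.start();
        source.join();
        aggregator.join();
        return log;
    }

    public static void main(String[] args) throws InterruptedException {
        // A small capacity forces the source to wait for the aggregator.
        run(10, 2).forEach(System.out::println);
    }
}
```

In this sketch the exact ordering of the "source emit" and "aggregate add"
lines varies from run to run. With a small queue capacity the source has to
wait for the aggregator, so the lines tend to interleave; with a large
capacity the source can run well ahead before any aggregator lines appear.
I was expecting something closer to the first case from my job.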

Thanks in advance
Jack