Posted to user@flink.apache.org by Martijn de Heus <mj...@hotmail.com> on 2021/02/04 20:51:25 UTC

StateFun scalability

Hi all,

I’ve been working with StateFun for a while on my university project. I am now trying to increase the number of StateFun workers and the parallelism, but this barely seems to increase the throughput of my system.

I have 5000 function instances in my system during my tests. When I increase the workers from 1 to 3, I notice a significant increase in throughput, but going from 3 to 5 (or even to 7) gives no further increase. Each worker runs with 4 CPUs, and I made sure that Kafka and my deployed co-located functions are not causing any bottlenecks. The ingress topics also have many partitions.

I attached my flink-conf.yaml below. Is this expected behaviour for StateFun, or am I missing some configuration that could improve performance? If this is expected for StateFun, what could be causing it?

Best regards,

Martijn


jobmanager.rpc.address: statefun-master
taskmanager.numberOfTaskSlots: 1
blob.server.port: 6124
jobmanager.rpc.port: 6123
taskmanager.rpc.port: 6122
classloader.parent-first-patterns.additional: org.apache.flink.statefun;org.apache.kafka;com.google.protobuf
state.checkpoints.dir: file:///checkpoint-dir
state.backend: rocksdb
state.backend.rocksdb.timer-service.factory: ROCKSDB
state.backend.incremental: true
execution.checkpointing.interval: 10sec
execution.checkpointing.mode: EXACTLY_ONCE
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 2147483647
restart-strategy.fixed-delay.delay: 1sec
jobmanager.memory.process.size: 1g
taskmanager.memory.process.size: 1g
parallelism.default: 5
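
One thing that can be checked directly from a configuration like the one above is whether the configured parallelism matches the number of task slots the workers provide. The sketch below is not part of the original setup: the helper names `parse_flink_conf` and `slot_report` are hypothetical, and it assumes the job runs at `parallelism.default` with Flink's default slot sharing, so that it occupies at most that many slots.

```python
# Minimal sketch: compare available task slots with the configured parallelism.
# Assumption: the job runs at parallelism.default and, with default slot
# sharing, needs one slot per parallel instance.

SAMPLE_CONF = """\
taskmanager.numberOfTaskSlots: 1
state.checkpoints.dir: file:///checkpoint-dir
parallelism.default: 5
"""


def parse_flink_conf(text):
    """Parse flat 'key: value' lines as they appear in flink-conf.yaml."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first ': ' only, so values such as file:///... survive.
        key, sep, value = line.partition(": ")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def slot_report(conf, num_workers):
    """Return how many slots exist, and how many are idle or missing."""
    slots_per_worker = int(conf.get("taskmanager.numberOfTaskSlots", 1))
    parallelism = int(conf.get("parallelism.default", 1))
    total_slots = num_workers * slots_per_worker
    return {
        "total_slots": total_slots,
        "parallelism": parallelism,
        "idle_slots": max(total_slots - parallelism, 0),
        "missing_slots": max(parallelism - total_slots, 0),
    }


if __name__ == "__main__":
    conf = parse_flink_conf(SAMPLE_CONF)
    for workers in (1, 3, 5, 7):
        print(workers, slot_report(conf, workers))
```

Under these assumptions, 7 workers with one slot each and `parallelism.default: 5` would leave 2 slots idle, so the extra workers could not add throughput unless the parallelism is raised as well. Whether this applies here depends on whether the parallelism was changed together with the worker count in each test run.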

Re: StateFun scalability

Posted by Igal Shilman <ig...@ververica.com>.

Hello Martijn,

Great to hear that you are exploring StateFun as part of your university
project!

Can you please clarify:
- How do you measure throughput?
- By co-located functions, do you mean a remote function on the same
machine?
- Can you share a little bit more about your functions? What are they doing?
- Do you use any kind of state?
- What kind of messages do you send? Are you using Protobuf for messages or
something else?

Can you also validate your setup against a vanilla Flink program (something
like a word count)?

Thanks,
Igal

