Posted to user@flink.apache.org by Avi Levi <av...@bluevoyant.com> on 2019/02/13 15:57:45 UTC

Production readiness

Hi
Looking at the production readiness
<https://ci.apache.org/projects/flink/flink-docs-stable/ops/production_ready.html#set-maximum-parallelism-for-operators-explicitly>
checklist, is there any rule of thumb for determining the maximum parallelism?
We have a stateful pipeline with high throughput (4k requests/sec)
running on Google Cloud (YARN).
My understanding is that if we do not set it, the default is 128 but it
may change in the future, whereas if we set it, it cannot be changed later.
Is that correct?
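As background on the "default is 128" part: in Flink 1.x, when no max parallelism is set, it is derived from the operator's initial parallelism rather than fixed at 128. The JDK-only sketch below reproduces the rule we assume from KeyGroupRangeAssignment.computeDefaultMaxParallelism (1.5 x parallelism, rounded up to a power of two, clamped to [128, 32768]). The class and method names are hypothetical; this is an illustration, not Flink's code.

```java
// Hedged sketch: JDK-only reimplementation of the default max parallelism rule
// assumed from Flink 1.x (KeyGroupRangeAssignment.computeDefaultMaxParallelism).
// Names here are hypothetical, not Flink API.
final class DefaultMaxParallelism {
    static final int LOWER_BOUND = 1 << 7;   // 128
    static final int UPPER_BOUND = 1 << 15;  // 32768

    /** Smallest power of two >= x (valid for 1 <= x <= 2^30). */
    static int roundUpToPowerOfTwo(int x) {
        return x <= 1 ? 1 : Integer.highestOneBit(x - 1) << 1;
    }

    /** Default max parallelism for an operator started with the given parallelism. */
    static int computeDefault(int parallelism) {
        if (parallelism <= 0) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        // 1.5 x parallelism (integer division), rounded up to a power of two ...
        int candidate = roundUpToPowerOfTwo(parallelism + parallelism / 2);
        // ... then clamped to [128, 32768].
        return Math.min(Math.max(candidate, LOWER_BOUND), UPPER_BOUND);
    }

    public static void main(String[] args) {
        for (int p : new int[] {4, 85, 86, 200}) {
            System.out.println("parallelism " + p + " -> default max parallelism " + computeDefault(p));
        }
    }
}
```

Under this rule, 128 is only the default for an initial parallelism of up to 85; a job first started with parallelism 86 would get 256. Setting the value explicitly pins it regardless of the starting parallelism.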

Is there any way to get information about the state (RocksDB), e.g. the
number of keys or a list of keys?

Regards
Avi

Re: Production readiness

Posted by Andrey Zagrebin <an...@ververica.com>.
Hi Aitozi,

Flink checks this upon job start and fails if
- parallelism > max parallelism
(KeyGroupRangeAssignment.computeKeyGroupRangeForOperatorIndex), or
- max parallelism of savepoint > max parallelism of restored job
(Checkpoints.loadAndValidateCheckpoint).

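Theoretically, running with more subtasks than key groups (the max
parallelism is the number of key groups) would be possible without migration
or state loss, but the extra subtasks would get no key groups, so the added
resources would be wasted, which does not make sense.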
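Flink rejects a parallelism above the max parallelism when the job starts. The JDK-only sketch below mirrors the range arithmetic we assume from KeyGroupRangeAssignment.computeKeyGroupRangeForOperatorIndex to show both that check and why skipping it would only waste resources: once there are more subtasks than key groups, some subtasks receive an empty key-group range. The class and method names are hypothetical; this is not Flink's code.

```java
// Hedged sketch of how key groups [0, maxParallelism) are split into contiguous
// ranges, one per subtask. Names are hypothetical, not Flink API.
final class KeyGroupRanges {

    /** Inclusive key-group range {start, end} of one subtask; start > end means empty. */
    static int[] rangeFor(int maxParallelism, int parallelism, int operatorIndex) {
        int start = (operatorIndex * maxParallelism + parallelism - 1) / parallelism;
        int end = ((operatorIndex + 1) * maxParallelism - 1) / parallelism;
        return new int[] {start, end};
    }

    /** Same as rangeFor, but rejects parallelism > maxParallelism like the job-start check. */
    static int[] checkedRangeFor(int maxParallelism, int parallelism, int operatorIndex) {
        if (maxParallelism < parallelism) {
            throw new IllegalArgumentException("Maximum parallelism must not be smaller than parallelism.");
        }
        return rangeFor(maxParallelism, parallelism, operatorIndex);
    }

    /** How many subtasks would own no key group at all if the check were skipped. */
    static int idleSubtasks(int maxParallelism, int parallelism) {
        int idle = 0;
        for (int i = 0; i < parallelism; i++) {
            int[] range = rangeFor(maxParallelism, parallelism, i);
            if (range[0] > range[1]) {
                idle++;
            }
        }
        return idle;
    }

    public static void main(String[] args) {
        System.out.println("max 128, parallelism 200: idle subtasks = " + idleSubtasks(128, 200));
        try {
            checkedRangeFor(128, 200, 0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With max parallelism 128 and parallelism 200, every key group lands on its own subtask and the remaining 72 subtasks would sit idle.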

Best,
Andrey

On Thu, Feb 14, 2019 at 3:50 AM aitozi <gj...@gmail.com> wrote:

> Hi, Andrey
>
> I have another question: if I do not set the maximum parallelism
> first (so it is set to 128 by default) and then rescale to a parallelism
> bigger than 128, will the state be lost in this scenario?
>
> Thanks,
> Aitozi
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>

Re: Production readiness

Posted by Till Rohrmann <tr...@apache.org>.
Hi Aitozi,

Resuming a job with a higher parallelism than the initially defined max
parallelism (in your case 128) is not possible. For this, one would need to
rewrite the savepoint information (basically rehash the keys), as Andrey
said.
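By contrast, rescaling to any parallelism at or below the max parallelism keeps every key in the same key group; only the assignment of whole key groups to subtasks changes, so no rehashing is needed. The JDK-only sketch below mirrors the arithmetic we assume Flink 1.x uses in KeyGroupRangeAssignment.computeOperatorIndexForKeyGroup (keyGroup * parallelism / maxParallelism). The class name, the explicit bounds check, and key group 72 are hypothetical illustrations.

```java
// Hedged sketch: which subtask owns a given key group for a given parallelism.
// Names are hypothetical, not Flink API.
final class KeyGroupOwner {

    /** Index of the subtask owning keyGroup; rejects parallelism > maxParallelism. */
    static int ownerOf(int keyGroup, int maxParallelism, int parallelism) {
        if (parallelism > maxParallelism) {
            throw new IllegalArgumentException("parallelism " + parallelism
                    + " exceeds max parallelism " + maxParallelism);
        }
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int keyGroup = 72; // hypothetical key group holding some key's state
        System.out.println("parallelism 4 -> subtask " + ownerOf(keyGroup, 128, 4)); // 2
        System.out.println("parallelism 8 -> subtask " + ownerOf(keyGroup, 128, 8)); // 4
    }
}
```

Going from parallelism 4 to 8 moves key group 72 from subtask 2 to subtask 4 as a unit; its contents are untouched.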

Cheers,
Till


Re: Production readiness

Posted by aitozi <gj...@gmail.com>.
Hi, Andrey

I have another question: if I do not set the maximum parallelism
first (so it is set to 128 by default) and then rescale to a parallelism
bigger than 128, will the state be lost in this scenario?

Thanks,
Aitozi




Re: Production readiness

Posted by Andrey Zagrebin <an...@ververica.com>.
Hi Avi,

The maximum parallelism is not an easy parameter to change once the job
has started.
The job's checkpoints/savepoints would need a migration that rehashes the
keyed state entries into the new number of key groups (the unit of keyed
state storage). You can try the Bravo tool for this [1].
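Why a migration is needed: a key's key group is derived from its hash modulo the max parallelism, so changing the max parallelism regroups the keys. The sketch below illustrates this with a simplified stand-in. Flink additionally applies a murmur hash to hashCode() before taking the modulo, which this sketch deliberately omits, so the concrete group numbers differ from Flink's. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: simplified key-group assignment (no murmur hash, unlike Flink).
// Names are hypothetical, not Flink API.
final class KeyGroupRehash {

    /** Simplified stand-in for Flink's key-group assignment of a key. */
    static int keyGroup(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    /** Keys whose key-group id differs between the old and new max parallelism. */
    static List<Object> keysThatChangeGroup(List<?> keys, int oldMax, int newMax) {
        List<Object> moved = new ArrayList<>();
        for (Object key : keys) {
            if (keyGroup(key, oldMax) != keyGroup(key, newMax)) {
                moved.add(key);
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        List<Integer> keys = List.of(5, 72, 200, 300);
        System.out.println(keysThatChangeGroup(keys, 128, 256)); // prints [200]
    }
}
```

Keys 72 and 200 share key group 72 at max parallelism 128 but land in different groups at 256, so the stored state for that group has to be split entry by entry rather than moved as a whole.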

As for the number of keys, you can try enabling the RocksDB metrics in Flink
[2] (e.g. state.backend.rocksdb.metrics.estimate-num-keys); they are
available since 1.7.

Best,
Andrey

[1] https://github.com/king/bravo
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/config.html#state-backend-rocksdb-metrics-estimate-num-keys
