Posted to user@flink.apache.org by Govindarajan Srinivasaraghavan <go...@gmail.com> on 2016/12/23 19:11:52 UTC

Dynamic Scaling

Hi,

We have a computation-heavy streaming Flink job which will be processing
around 100 million messages at peak time and around 1 million messages at
non-peak time. We need the capability to scale dynamically so that the
computation operator can scale up and down during high or low workloads
respectively, without restarting the job, in order to lower machine costs.

Is there an ETA for when the feature to rescale a single operator without
a restart will be released?

Is it possible to auto-scale one of the operators with Docker Swarm or
Amazon ECS auto scaling, based on Kafka consumer lag or CPU consumption? If
so, could I get some documentation or steps to achieve this behaviour?

Also, is there any document on what the tasks of a JobManager are, apart
from scheduling and reporting status?

Since there is just one JobManager, we wanted to check whether there would
be any potential scaling limitations as the processing capacity increases.

Thanks
Govind

Re: Dynamic Scaling

Posted by Jamie Grier <ja...@data-artisans.com>.
Hi Govind,

In Flink 1.2 (feature complete, undergoing testing) you will be able to scale
your jobs/operators up and down at will; however, you'll have to build a
little tooling around it yourself and scale based on your own metrics.  You
should be able to integrate this with Docker Swarm or Amazon auto-scaling
groups, but I haven't done it myself yet.

The way this will really work is the following sequence:

1) Detect that you want to scale up or down (this part is up to you)
2) Cancel the Flink job with a savepoint -- this will shut the job down with
a savepoint in such a way that no messages will need to be re-processed.
3) Launch the Flink job from that savepoint with a different parallelism.

That's it.  You should be able to script this such that the whole process
takes just a couple of seconds.  What I don't have for you right now is any
sort of DIRECT integration with Docker or Amazon for scaling.  You have to
trigger this procedure yourself based on metrics, etc.   Do you think this
will work for your use case?
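The three steps above can be scripted along these lines. This is a minimal sketch, assuming Flink 1.2's `flink cancel -s` / `flink run -s` CLI, Kafka's kafka-consumer-groups.sh as the lag metric, and placeholder names (job jar path, consumer group, thresholds, savepoint directory) that you would swap for your own environment:

```shell
#!/usr/bin/env bash
# Sketch of a rescale loop for Flink 1.2. All paths, group names, and
# thresholds below are hypothetical placeholders, not values from the thread.
set -euo pipefail

JOB_JAR=${JOB_JAR:-/opt/jobs/my-streaming-job.jar}       # hypothetical jar path
CONSUMER_GROUP=${CONSUMER_GROUP:-my-flink-group}         # hypothetical group id
SAVEPOINT_DIR=${SAVEPOINT_DIR:-hdfs:///flink/savepoints} # hypothetical target dir
LAG_HIGH=1000000   # scale up above this total lag
LAG_LOW=10000      # scale down below this total lag

# Pure helper: pick the next parallelism given the current one and a direction.
# Doubling/halving is an arbitrary example policy.
next_parallelism() {
  local current=$1 direction=$2
  case $direction in
    up)   echo $(( current * 2 )) ;;
    down) echo $(( current > 1 ? current / 2 : 1 )) ;;
    *)    echo "$current" ;;
  esac
}

# Step 1 input: total consumer lag. The LAG column index varies by Kafka
# version, so check your kafka-consumer-groups.sh output format first.
total_lag() {
  kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
    --describe --group "$CONSUMER_GROUP" \
    | awk 'NR > 1 { lag += $6 } END { print lag + 0 }'
}

# Steps 2 and 3: cancel with a savepoint, then resubmit the job from that
# savepoint at the new parallelism. Parsing the savepoint path out of the
# CLI output like this is fragile; adapt it to what your version prints.
rescale() {
  local job_id=$1 new_parallelism=$2
  local savepoint
  savepoint=$(flink cancel -s "$SAVEPOINT_DIR" "$job_id" \
    | grep -o 'hdfs://[^ ]*' | tail -1)
  flink run -d -s "$savepoint" -p "$new_parallelism" "$JOB_JAR"
}

# Example decision loop body: invoke as `./rescale.sh <jobId> <currentParallelism>`.
if [[ "${1:-}" != "" ]]; then
  lag=$(total_lag)
  if   (( lag > LAG_HIGH )); then rescale "$1" "$(next_parallelism "$2" up)"
  elif (( lag < LAG_LOW  )); then rescale "$1" "$(next_parallelism "$2" down)"
  fi
fi
```

The thresholds and the double/halve policy are arbitrary and would need tuning; in an orchestrated setup (Docker Swarm, ECS) you would also add or remove TaskManager containers before resubmitting at a higher parallelism.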

On Fri, Dec 23, 2016 at 11:11 AM, Govindarajan Srinivasaraghavan <
govindraghvan@gmail.com> wrote:

> Hi,
>
> We have a computation-heavy streaming Flink job which will be processing
> around 100 million messages at peak time and around 1 million messages at
> non-peak time. We need the capability to scale dynamically so that the
> computation operator can scale up and down during high or low workloads
> respectively, without restarting the job, in order to lower machine costs.
>
> Is there an ETA for when the feature to rescale a single operator without
> a restart will be released?
>
> Is it possible to auto-scale one of the operators with Docker Swarm or
> Amazon ECS auto scaling, based on Kafka consumer lag or CPU consumption? If
> so, could I get some documentation or steps to achieve this behaviour?
>
> Also, is there any document on what the tasks of a JobManager are, apart
> from scheduling and reporting status?
>
> Since there is just one JobManager, we wanted to check whether there would
> be any potential scaling limitations as the processing capacity increases.
>
> Thanks
> Govind
>



-- 

Jamie Grier
data Artisans, Director of Applications Engineering
@jamiegrier <https://twitter.com/jamiegrier>
jamie@data-artisans.com

Re: Dynamic Scaling

Posted by Govindarajan Srinivasaraghavan <go...@gmail.com>.
Hi All,

It would be great if someone could help me with my questions. I appreciate all the help.

Thanks.

> On Dec 23, 2016, at 12:11 PM, Govindarajan Srinivasaraghavan <go...@gmail.com> wrote:
> 
> Hi,
> 
> We have a computation-heavy streaming Flink job which will be processing around 100 million messages at peak time and around 1 million messages at non-peak time. We need the capability to scale dynamically so that the computation operator can scale up and down during high or low workloads respectively, without restarting the job, in order to lower machine costs.
> 
> Is there an ETA for when the feature to rescale a single operator without a restart will be released?
> 
> Is it possible to auto-scale one of the operators with Docker Swarm or Amazon ECS auto scaling, based on Kafka consumer lag or CPU consumption? If so, could I get some documentation or steps to achieve this behaviour?
> 
> Also, is there any document on what the tasks of a JobManager are, apart from scheduling and reporting status?
> 
> Since there is just one JobManager, we wanted to check whether there would be any potential scaling limitations as the processing capacity increases.
> 
> Thanks
> Govind
> 
