Posted to dev@flink.apache.org by Barry Higgins <ba...@gmail.com> on 2021/09/02 11:06:23 UTC

Deploying Stateful functions with an existing Ververica cluster

Hi,

I have set up a remote stateful function in Python which I’ve deployed
on an AWS EC2 box. I am interacting with this from a separate StateFun
Docker setup running two flink-statefun images, one with the master
role and one with the worker role (on a separate EC2 instance). The
ingress and egress points for this function are Kafka.

I then have a separate Java application using Flink, deployed on a
Ververica cluster. From this application I communicate with the
stateful function by adding a sink/source pair pointing at the
ingress/egress above, along the lines of the sketch below.
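A rough sketch of that bridge (the broker address, topic names, and
string serialization are placeholders rather than my real config):

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class StateFunKafkaBridge {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "kafka:9092"); // placeholder broker

    // Stand-in for the real pipeline that produces requests.
    DataStream<String> requests = env.fromElements("request-1", "request-2");

    // Sink: write requests to the topic the StateFun module consumes as its ingress.
    requests.addSink(new FlinkKafkaProducer<>(
        "statefun-ingress-topic", new SimpleStringSchema(), props));

    // Source: read replies from the topic the StateFun module produces to as its egress.
    DataStream<String> replies = env.addSource(new FlinkKafkaConsumer<>(
        "statefun-egress-topic", new SimpleStringSchema(), props));
    replies.print();

    env.execute("statefun-kafka-bridge");
  }
}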

I have a couple of questions on this setup:

1. Is there a better way to communicate with the function from the
Flink application?
2. Is there any way I can use the existing deployed application to
maintain the state of my remote function, so that I can discard the
statefun master/worker elements?
3. Failing that, do I just need to create a new Flink application,
translate the equivalent of the module.yml that is passed to the
existing master/worker into Java, add the dependencies, and deploy
that jar?
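
For reference, the module.yml I would be translating looks roughly
like this (this is the StateFun 3.x flavour; the broker, topic, and
function names below are illustrative placeholders, not my real
config):

kind: io.statefun.endpoints.v2/http
spec:
  functions: example/remote-fn
  urlPathTemplate: http://remote-host:8000/statefun
---
kind: io.statefun.kafka.v1/ingress
spec:
  id: example/requests
  address: kafka:9092
  consumerGroupId: statefun-consumer
  topics:
    - topic: statefun-ingress-topic
      valueType: io.statefun.types/string
      targets:
        - example/remote-fn
---
kind: io.statefun.kafka.v1/egress
spec:
  id: example/replies
  address: kafka:9092
  deliverySemantic:
    type: exactly-once
    transactionTimeout: 15min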

I hope that makes sense.
Kindest Regards,

Barry

Re: Deploying Stateful functions with an existing Ververica cluster

Posted by Barry Higgins <ba...@gmail.com>.
Hi Igal,
Thank you for getting back so quickly.
All of our applications are currently deployed on a single Ververica cluster, so I would be quite keen to evaluate the DataStream integration option. (I am currently hitting an exception where the ObjectMapper in DefaultHttpRequestReplyClientSpec does not support Java 8's java.time.Duration.) While I work through that, I would be obliged if you could direct me on how to deploy the equivalent of the master/worker container on Ververica.
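In case anyone else hits that same error: the generic Jackson fix is to register the JSR-310 module (from the jackson-datatype-jsr310 artifact) on the ObjectMapper, along the lines below. The mapper inside DefaultHttpRequestReplyClientSpec is internal to StateFun though, so I haven't found a hook to apply it there.

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule;

// Registering the JavaTimeModule teaches Jackson to (de)serialize
// java.time types such as Duration.
ObjectMapper mapper = new ObjectMapper().registerModule(new JavaTimeModule());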
Would it be as simple as creating a new Flink application in Java, porting the module.yml configuration with the relevant dependencies into it, and then deploying that jar?
That would be a nice middle ground, where the statefun state could be managed outside of the calling application while still offering the separation you referred to, on the same cluster.
I am also thinking that the same statefun master/worker could be used to route all traffic in the future, assuming the load is tolerable, but that is further down the line.
Thanks again, I really appreciate your insights.
Barry

On 2021/09/02 13:09:13, Igal Shilman <ig...@apache.org> wrote: 
> Hi Barry,
> I've forwarded your email to the user mailing list as it is more suitable
> there :-)
> 
> Your question definitely makes sense, and let me try to provide you with
> some pointers:
> 
> 1. The architecture that you've outlined has many advantages and is
> desirable if you can afford it. Some of them are:
> - clean separation of concerns
> - better resource isolation
> - different SLOs and fault domains (failure/slowness in your Python
> function doesn't trigger failure/back-pressure in your ETL)
> - you can use event-time watermarks for your ETL (statefun only works
> with processing time)
> 
> 2. If you would still prefer to merge the two, then you can check out
> the DataStream integration API [1], although it has some rough edges
> with respect to remote functions in particular.
> 
> Good luck,
> Igal.
> 
> 
> [1]
> https://nightlies.apache.org/flink/flink-statefun-docs-release-3.1/docs/sdk/flink-datastream/
> 
> 
> On Thu, Sep 2, 2021 at 1:07 PM Barry Higgins <ba...@gmail.com>
> wrote:
> 
> > Hi,
> >
> > I have set up a remote stateful function in Python which I’ve deployed
> > on an AWS EC2 box. I am interacting with this from a separate StateFun
> > Docker setup running two flink-statefun images, one with the master
> > role and one with the worker role (on a separate EC2 instance). The
> > ingress and egress points for this function are Kafka.
> >
> > I then have a separate Java application using Flink, deployed on a
> > Ververica cluster. From this application I communicate with the
> > stateful function by adding a sink/source pair pointing at the
> > ingress/egress above.
> >
> > I have a couple of questions on this setup:
> >
> > 1. Is there a better way to communicate with the function from the
> > Flink application?
> > 2. Is there any way I can use the existing deployed application to
> > maintain the state of my remote function, so that I can discard the
> > statefun master/worker elements?
> > 3. Failing that, do I just need to create a new Flink application,
> > translate the equivalent of the module.yml that is passed to the
> > existing master/worker into Java, add the dependencies, and deploy
> > that jar?
> >
> > I hope that makes sense.
> > Kindest Regards,
> >
> > Barry
> >
> 

Re: Deploying Stateful functions with an existing Ververica cluster

Posted by Igal Shilman <ig...@apache.org>.
Hi Barry,
I've forwarded your email to the user mailing list as it is more suitable
there :-)

Your question definitely makes sense, and let me try to provide you with
some pointers:

1. The architecture that you've outlined has many advantages and is
desirable if you can afford it. Some of them are:
- clean separation of concerns
- better resource isolation
- different SLOs and fault domains (failure/slowness in your Python
function doesn't trigger failure/back-pressure in your ETL)
- you can use event-time watermarks for your ETL (statefun only works
with processing time)

2. If you would still prefer to merge the two, then you can check out
the DataStream integration API [1], although it has some rough edges
with respect to remote functions in particular.
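
To make that a bit more concrete, here is a rough, untested sketch of
binding a remote function with that API, adapted from the example in
[1]. The function type, endpoint URI, and egress names are
placeholders for your setup, and the payload/egress typing around
remote functions is exactly where the rough edges live:

import java.net.URI;
import java.time.Duration;

import org.apache.flink.statefun.flink.datastream.RequestReplyFunctionBuilder;
import org.apache.flink.statefun.flink.datastream.RoutableMessage;
import org.apache.flink.statefun.flink.datastream.RoutableMessageBuilder;
import org.apache.flink.statefun.flink.datastream.StatefulFunctionDataStreamBuilder;
import org.apache.flink.statefun.flink.datastream.StatefulFunctionEgressStreams;
import org.apache.flink.statefun.sdk.FunctionType;
import org.apache.flink.statefun.sdk.io.EgressIdentifier;
import org.apache.flink.statefun.sdk.reqreply.generated.TypedValue;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RemoteFunctionPipeline {

  // The remote function served over HTTP (placeholder names).
  static final FunctionType REMOTE_FN = new FunctionType("example", "remote-fn");

  // Egress that the remote function addresses its results to.
  static final EgressIdentifier<TypedValue> RESULTS =
      new EgressIdentifier<>("example", "results", TypedValue.class);

  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Stand-in for your real input stream.
    DataStream<String> input = env.fromElements("a", "b", "c");

    // Address each record to the remote function.
    DataStream<RoutableMessage> ingress =
        input.map(value ->
            RoutableMessageBuilder.builder()
                .withTargetAddress(REMOTE_FN, value)
                .withMessageBody(value)
                .build());

    StatefulFunctionEgressStreams out =
        StatefulFunctionDataStreamBuilder.builder("example-pipeline")
            .withDataStreamAsIngress(ingress)
            .withRequestReplyRemoteFunction(
                RequestReplyFunctionBuilder.requestReplyFunctionBuilder(
                        REMOTE_FN, URI.create("http://remote-host:8000/statefun"))
                    .withMaxRequestDuration(Duration.ofSeconds(30))
                    .withMaxNumBatchRequests(500))
            .withEgressId(RESULTS)
            .build(env);

    out.getDataStreamForEgressId(RESULTS).print();

    env.execute("statefun-datastream-example");
  }
}

Note that with this approach the state of the remote function is
managed by this Flink job's checkpoints, which is what lets you drop
the separate master/worker deployment.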

Good luck,
Igal.


[1]
https://nightlies.apache.org/flink/flink-statefun-docs-release-3.1/docs/sdk/flink-datastream/


On Thu, Sep 2, 2021 at 1:07 PM Barry Higgins <ba...@gmail.com>
wrote:

> Hi,
>
> I have set up a remote stateful function in Python which I’ve deployed
> on an AWS EC2 box. I am interacting with this from a separate StateFun
> Docker setup running two flink-statefun images, one with the master
> role and one with the worker role (on a separate EC2 instance). The
> ingress and egress points for this function are Kafka.
>
> I then have a separate Java application using Flink, deployed on a
> Ververica cluster. From this application I communicate with the
> stateful function by adding a sink/source pair pointing at the
> ingress/egress above.
>
> I have a couple of questions on this setup:
>
> 1. Is there a better way to communicate with the function from the
> Flink application?
> 2. Is there any way I can use the existing deployed application to
> maintain the state of my remote function, so that I can discard the
> statefun master/worker elements?
> 3. Failing that, do I just need to create a new Flink application,
> translate the equivalent of the module.yml that is passed to the
> existing master/worker into Java, add the dependencies, and deploy
> that jar?
>
> I hope that makes sense.
> Kindest Regards,
>
> Barry
>