Posted to user@flink.apache.org by Gigio Topo <ma...@yahoo.com> on 2021/12/08 11:32:35 UTC

Stateful functions - egress question

Hi,

I successfully created a stateful function F that transforms incoming objects and writes them to a relational database. Function F is deployed as a remote module. Everything looks fine.

Now I want to split responsibilities by introducing a custom egress E for the database while I refactor F to contain only the business logic.

It's not clear to me whether the only way to deploy a custom egress is to use an embedded module, because the documentation states:

"Embedded modules allow users to load code into the Stateful Functions runtime that is executed directly within the cluster. This is usually to allow plugging in custom ingress and egress implementations. Additionally, an embedded module may include embedded functions that run within the cluster."

On the other hand, the same documentation states:
"Embedded modules should be used with care, they cannot be deployed or scaled without downtime and can effect the performance and stability of the entire cluster."

I want to stay away from the embedded way for the reasons stated in the documentation, so I have the following questions:

Is it possible to create and use a remote module egress?
If yes, are there any considerations to take into account?
If no, what do you suggest implementing instead?

Thanks for any feedback
M.

Re: Stateful functions - egress question

Posted by Gigio Topo <ma...@yahoo.com>.
Hi,

Following your suggestions, it makes total sense to use Kafka as the egress and add a stateless bulk importer on top of it.

Thanks for the clarification.

M.


> Unfortunately we don't support remote egress right now. If you really want to avoid embedded modules,
> and you are using Kafka/Kinesis for ingress, then perhaps you can use Kafka/Kinesis for egress as well,
> then write a simple, almost stateless bulk importer that takes bulk insert commands out of Kafka (or Kinesis) and
> bulk inserts them into your database.








Re: Stateful functions - egress question

Posted by Igal Shilman <ig...@apache.org>.
Hello,

Glad to hear that you've successfully deployed a remote function with
StateFun :-)

> It's not clear to me if the only way to deploy a custom egress is to use
> an embedded module because documentation states:


Indeed, currently the only way to define a custom egress is by writing an
embedded module that gets bundled with Flink.
For a simple example see [1]; for a more advanced one see [2][3].

> on the other side, the same documentation states
> "Embedded modules should be used with care, they cannot be deployed or
> scaled without downtime and can effect the performance and stability of the
> entire cluster."


I think that we should perhaps rephrase it a bit, but what we mean is that
your code runs within the same JVM that runs Flink, so any change to that
code requires a restart to take effect. In addition, any memory leaks or
long blocking calls might affect overall performance and stability, so
general caution is required.
But this is completely normal, and it is currently the only way of
introducing new egress types.

> Is it possible to create and use a remote module egress?
> If yes, are there any indications to be taken into account?


Unfortunately we don't support remote egress right now. If you really want
to avoid embedded modules, and you are using Kafka/Kinesis for ingress, then
perhaps you can use Kafka/Kinesis for egress as well, and then write a
simple, almost stateless bulk importer that takes bulk insert commands out
of Kafka (or Kinesis) and bulk inserts them into your database.
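
To make the bulk importer idea above a bit more concrete, here is a minimal
Python sketch, not a definitive implementation. Everything specific in it is
an assumption made for illustration: records are JSON-encoded bytes with
hypothetical `id` and `payload` fields, the target table is called `items`,
the standard-library `sqlite3` module stands in for your relational database,
and a plain iterable stands in for the Kafka (or Kinesis) consumer.

```python
import json
import sqlite3
from typing import Iterable, Iterator, List


def batches(records: Iterable[bytes], batch_size: int) -> Iterator[List[bytes]]:
    """Group an incoming record stream into lists of at most batch_size.

    Note: a trailing partial batch is only emitted when the input ends.
    On an unbounded stream you would also want to flush on a timeout.
    """
    batch: List[bytes] = []
    for record in records:
        batch.append(record)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_import(records: Iterable[bytes], conn: sqlite3.Connection,
                batch_size: int = 500) -> int:
    """Insert records batch by batch and return the number of rows written.

    Each batch is written in its own transaction. The upsert keeps the
    importer almost stateless: replaying a record after a redelivery
    overwrites the same row instead of duplicating it.
    """
    total = 0
    for batch in batches(records, batch_size):
        # Assumed record shape: {"id": ..., "payload": ...}
        rows = [(r["id"], r["payload"]) for r in map(json.loads, batch)]
        with conn:  # commits on success, rolls back on an exception
            conn.executemany(
                "INSERT OR REPLACE INTO items (id, payload) VALUES (?, ?)",
                rows,
            )
        total += len(rows)
    return total


if __name__ == "__main__":
    # In-memory database and hand-made records, purely for demonstration.
    demo_conn = sqlite3.connect(":memory:")
    demo_conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, payload TEXT)")
    demo_records = [
        json.dumps({"id": str(i), "payload": f"value-{i}"}).encode()
        for i in range(5)
    ]
    print(bulk_import(demo_records, demo_conn, batch_size=2))
```

With a real consumer you would presumably commit the consumed offsets only
after the database transaction for a batch has succeeded, so that a crash
leads to a replay (made harmless by the upsert) rather than to lost records.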

Feel free to browse through the examples I've linked below, and ask any
follow-up questions here.
Also, if you are developing this egress publicly and would like me to take
a look and provide feedback, I'd be happy to do that.

P.S.
You might find Flink's JDBC sink useful [4].

Good luck,
Igal.

[1]
https://github.com/apache/flink-statefun/blob/master/statefun-e2e-tests/statefun-smoke-e2e-driver/src/main/java/org/apache/flink/statefun/e2e/smoke/driver/DriverModule.java#L55,L62
[2]
https://github.com/apache/flink-statefun/blob/master/statefun-flink/statefun-flink-io-bundle/src/main/java/org/apache/flink/statefun/flink/io/kafka/KafkaFlinkIoModule.java#L31
[3]
https://github.com/apache/flink-statefun/blob/b4ba9547b8f0105a28544fd28a5e0433666e9023/statefun-flink/statefun-flink-io-bundle/src/main/java/org/apache/flink/statefun/flink/io/kafka/KafkaSinkProvider.java#L39,L59
[4]
https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/jdbc/



