You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Shunya One <ze...@gmail.com> on 2023/02/06 06:25:32 UTC

Fwd: Routing data flow with Flink statefun

Hi,

I'd like to optimize and streamline my stream processing use cases. I like
the Flink statefun design which provides compute-state isolation which
makes it easy to scale compute-state efficiently and independently
leveraging Flink's at-least-once guarantee. I'm evaluating if we can use
statefun to provide scalable resource isolation for event routing jobs, and
ensued transformations per route job.

I went through the statefun documentation
<https://nightlies.apache.org/flink/flink-statefun-docs-master/>, and would
love to hear communities opinion on using Flink stateful functions as a
content-based-router for a high throughput Kinesis stream, specifically:

   1. Can we dynamically register and call remote functions ?
      1. Option 1: Update the state with HTTP function endpoint
      <https://nightlies.apache.org/flink/flink-statefun-docs-master/docs/modules/http-endpoint/#url-template>
externally
      and read it from a function to discover and send the same message to
      downstream functions dynamically
      2. Option 2: Publish broadcast state as a low volume slowly updated
      broadcast stream
   2. Since the event extraction will transform a 10-100KB message to
   1-5MB, can I checkpoint and save the state before enrichment only.
   1. Option 1: Checkpoint state every X mins, i.e. 2.5TB-per-min
      (40GBps*60) which makes it costly to provider higher state retention
      guarantees
      2. Option 2: Store state pre-event-extraction only i.e. only
      notifications, and recompute event extraction in case of failures
   3. Does Flink statefun cluster store each input to remote function in
   the state to recover from failures?
      1. Option 1: If yes, how to make sure the state backend is scaled up
      properly to handle configured retentions? Naively, does this
mean the state
      backend should be scaled up to handle max (throughput of all functions) *
      number of functions.
      2. Option 2: If no, how does the statefun cluster provide failure
      tolerance if a function is failing for say 4hrs?
   4. Is  statefun recommended for content-based-routing for efficient
   route job isolations?

High level flow with statefun:
>     Kinesis => { KDA Flink statefun cluster } => { enrich } => { broadcast
> } =>  { each route { fiter => sink } }
>


where:
>


{ KDA Flink statefun cluster } = KDA embedded cluster with aws managed
> rocksdb state backend.
> { broadcast }  = AWS Lambda function that reads the route table from Flink
> state and call function endpoint per route.
> { route } = AWS Lambda function that filters the input message and applies
> predefined transformations using JSONPath DSL.
>
> Volume of data flowing through the system: Kinesis input source = 4000
> shards ~ 4GB of notifications per second, therefore,
> 4GBps input => 40GBps post event extraction => 25GBps post filter =>
> 20GBps post transformation.
>


Current scenario:

We have a high throughput kinesis stream which is consumed by a KCL
> consumer fleet. Each shard-consumer in the fleet will broadcast a consumed
> message to java's fixed-thread-pool-executor-service where each thread/task
> will enrich, filter, transform and sink the message to a downstream.
>


High level flow:
>                   Kinesis => Kinesis KCL => In Proc fan-out => downstream
> kinesis streams => Flink => downstream
>


As you can imagine fan-out using threadpool provides poor resource
> isolation, so when one thread takes a long time others are also kept
> waiting until all threads have processed a given message. Furthermore, we
> cannot share transformations applied before Flink with the consumer fleet
> routing/transform logic. This adds to the E2E event time latency after
> Flink delivers final computation. outing application state is handled by
> default KCL persistent offset management.
>


Attached dataflow diagram for illustration.



Thanks,
Shunya