You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/04/03 17:59:16 UTC

[GitHub] [flink-statefun] sjwiesman commented on a change in pull request #95: [FLINK-16977][docs] Add docs for components and distributed architecture

sjwiesman commented on a change in pull request #95: [FLINK-16977][docs] Add docs for components and distributed architecture
URL: https://github.com/apache/flink-statefun/pull/95#discussion_r403196811
 
 

 ##########
 File path: docs/concepts/distributed_architecture.md
 ##########
 @@ -0,0 +1,99 @@
+---
+title: Distributed Architecture 
+nav-id: dist-arch
+nav-pos: 2
+nav-title: Architecture
+nav-parent_id: concepts
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+A Stateful Functions deployment consists of a few components interacting together. Here we describe these pieces and their 
+
+* This will be replaced by the TOC
+{:toc}
+
+## High-level View
+
+A *Stateful Functions* deployment consists of a set of **Apache Flink Stateful Functions** processes and, optionally, various deployments that execute remote functions.
+
+<p class="text-center">
+    <img width="80%" src="{{ site.baseurl }}/fig/concepts/arch_overview.svg"/>
+</p>
+
+The Flink worker processes (TaskManagers) receive the events from the ingress systems (Kafka, Kinesis, etc.) and route them to the target functions. They invoke the functions, and route the resulting mesages to the next respective target functions. Messages designated for egress are written to an egress system (again, Kafka, Kinesis, ...).
+
+## Components
+
+The heavy lifting is done by the Apache Flink processes, which manage the state, handle the messaging, and invoke the stateful functions.
+The Flink cluster consists typically of one master and multiple workers (TaskManagers).
+
+<p class="text-center">
+    <img width="80%" src="{{ site.baseurl }}/fig/concepts/arch_components.svg"/>
+</p>
+
+In addition to the Apache Flink processes, a full deployment requires [ZooKeeper](https://zookeeper.apache.org/) (for [master failover](https://ci.apache.org/projects/flink/flink-docs-stable/ops/jobmanager_high_availability.html)) and bulk storage (S3, HDFS, NAS, GCS, Azure Blob Store, etc.) to store Flink's [checkpoints](https://ci.apache.org/projects/flink/flink-docs-master/concepts/stateful-stream-processing.html#checkpointing). In turn, the deployment requires no database, and Flink processes do not require persistent volumes.
+
+## Logical Co-location, Physical Separation
+
+A core principle of many Stream Processors is that application logic and the application state must be co-located. That approach is the basis for their out-of-the box consistency. Stateful Function takes a unique approach to that by *logically co-locating* state and compute, but allowing to *physically separate* them.
+
+  - *Logical co-location:* Messaging, state access/updates and function invocations are managed tightly together, in the same way as in Flink's DataStream API. State is sharded by key, and messages are routed to the state by key. There is a single writer per key at a time, also scheduling the function invocations.
+
+  - *Physical separation:* Functions can be executed remotely, with message and state access provided as part of the invocation request. This way, functions can be managed independently, like stateless processes.
+
+
+## Deployment Styles for Functions
+
+The stateful functions themselves can be deployed in various ways that trade off certain properties with each other: loose coupling and independent scaling on the one hand with performance overhead on the other hand. Each module of functions can be of a different kind, so some functions can run remote, while others could run embedded.
+
+#### Remote Functions
+
+*Remote Functions* use the above-mentioned principle of *physical separation* while maintaining *logical co-location*. The state/messaging tier (i.e., the Flink processes) and the function tier are deployed, managed, and scaled independently.
+
+Function invocations happen through an HTTP / gRPC protocol and go through a service that routes invocation requests to any available endpoint, for example a Kubernetes (load-balancing) service, the AWS request gateway for Lambda, etc. Because invocations are self-contained (contain message, state, access to timers, etc.) the target functions can treated like any stateless application.
+
+<p class="text-center">
+	<img width="80%" src="{{ site.baseurl }}/fig/concepts/arch_funs_remote.svg"/>
+</p>
+
+
+Refer to the documentation on the [Python SDK]({{ site.baseurl }}/sdk/python.html) and [remote modules]({{ site.baseurl }}/sdk/modules.html#remote-module) for details. 
+
+#### Co-located Functions
+
+An alternative way of deploying functions is *co-location* with the Flink JVM processes. In such a setup, each Flink TaskManager would talk to one Function process sitting *"next to it"*. A common way to do this is to use a system like Kubernetes and deploy pods consisting of a Flink container and the function side-car container; the two communicate via the pod-local network.
+
+This mode supports different languages while avoiding to route invocations through a Service/LoadBalancer, but it cannot scale the state and compute parts independently.
+
+<p class="text-center">
+	<img width="80%" src="{{ site.baseurl }}/fig/concepts/arch_funs_colocated.svg"/>
+</p>
+
+This style of deployment is similar to how Flink's Table API and API Beam's portability layer and deploy execute non-JVM functions.
 
 Review comment:
   ```suggestion
   This style of deployment is similar to how Flink's Table API and API Beam's portability layer deploy and execute non-JVM functions.
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services