Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2021/01/12 12:49:00 UTC

[GitHub] [flink-web] qinjunjerry commented on a change in pull request #402: Add the rocksdb blog

qinjunjerry commented on a change in pull request #402:
URL: https://github.com/apache/flink-web/pull/402#discussion_r555745655



##########
File path: _posts/2020-12-20-rocksdb.md
##########
@@ -0,0 +1,121 @@
+---
+layout: post
+title:  "Using RocksDB State Backend in Apache Flink: When and How"
+date:   2020-12-20 00:00:00
+authors:
+- Jun Qin:
+  name: "Jun Qin"
+excerpt: This blog post will guide you through the benefits of using RocksDB to manage your application’s state, explain when and how to use it and also clear up a few common misconceptions.  
+---
+
+[Stream processing](https://en.wikipedia.org/wiki/Stream_processing) applications are often stateful, “remembering” information from processed events and using it to influence further event processing. In Flink, the remembered information, i.e., state, is stored locally in the configured state backend. To prevent data loss in case of failures, the state backend periodically persists a snapshot of its contents to a pre-configured durable storage. The [RocksDB](https://rocksdb.org/) state backend (i.e., RocksDBStateBackend) is one of the three built-in state backends in Flink. This blog post will guide you through the benefits of using RocksDB to manage your application’s state, explain when and how to use it and also clear up a few common misconceptions. Having said that, this is **not** a blog post to explain how RocksDB works in-depth or how to do advanced troubleshooting and performance tuning; if you need help with any of those topics, you can reach out to the [Flink User Mailing List](https://flink.apache.org/community.html#mailing-lists). 
+
+# State in Flink
+
+To best understand state and state backends in Flink, it’s important to distinguish between **in-flight state** and **state snapshots**. In-flight state, also known as working state, is the state a Flink job is working on. It is always stored locally in memory (with the possibility to spill to disks) and can be lost when jobs fail without impacting job recoverability. State snapshots, i.e., [checkpoints](https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/checkpoints.html) and [savepoints](https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html#what-is-a-savepoint-how-is-a-savepoint-different-from-a-checkpoint), are stored in a remote durable storage, and are used to restore the local state in the case of job failures. The appropriate state backend for a production deployment depends on scalability, throughput, and latency requirements. 
+
+# What is RocksDB?
+
+Thinking of RocksDB as a distributed database that needs to run on a cluster and to be managed by specialized administrators is a common misconception.  RocksDB is an embeddable persistent key-value store for fast storage. It interacts with Flink via the Java Native Interface (JNI). The picture below shows where RocksDB fits in a Flink cluster node. Details are explained in the following sections.
+
+
+<center>
+<img vspace="8" style="width:60%" src="{{site.baseurl}}/img/blog/2020-12-20-rocksdb/RocksDB-in-Flink.png" />
+</center>
+
+
+# RocksDB in Flink
+
+Everything you need to use RocksDB as a state backend is bundled in the Apache Flink distribution, including the native shared library:
+
+    $ jar -tvf lib/flink-dist_2.12-1.12.0.jar| grep librocksdbjni-linux64
+    8695334 Wed Nov 27 02:27:06 CET 2019 librocksdbjni-linux64.so
+
+At runtime, RocksDB is embedded in the TaskManager processes. It runs in native threads and works with local files. For example, if you have a job configured with RocksDBStateBackend running in your Flink cluster, you’ll see something similar to the following, where 32513 is the TaskManager process ID.
+
+    $ ps -T -p 32513 | grep -i rocksdb
+    32513 32633 ?        00:00:00 rocksdb:low0
+    32513 32634 ?        00:00:00 rocksdb:high0
+
+<div class="alert alert-info" markdown="1">
+<span class="label label-info" style="display: inline-block"><span class="glyphicon glyphicon-info-sign" aria-hidden="true"></span> Note</span>
+The command is for Linux only. For other operating systems, please refer to their documentation.

Review comment:
       The command does not work on macOS. It may work on other Unix systems, but I have never tried it, so I think it is safe to say "Linux".
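
       For illustration only, and not part of the pull request: a minimal shell sketch of how the `ps -T` check from the post could be guarded so that it only runs on Linux. The function names `supports_ps_T` and `list_rocksdb_threads` are hypothetical, and the sketch assumes the procps `ps` that ships with most Linux distributions.

```shell
#!/bin/sh
# Hypothetical helpers (not from the blog post or the PR).

# Succeeds only when the given kernel name, as printed by `uname -s`, is Linux.
supports_ps_T() {
    [ "$1" = "Linux" ]
}

# Lists the RocksDB threads of a process, e.g. a TaskManager.
# $1: process ID
# $2: kernel name (optional, defaults to the output of `uname -s`)
list_rocksdb_threads() {
    pid="$1"
    os="${2:-$(uname -s)}"
    if ! supports_ps_T "$os"; then
        echo "ps -T is only known to work on Linux, not on $os" >&2
        return 2
    fi
    # Same command as in the post: one line per thread, filtered by name.
    ps -T -p "$pid" | grep -i rocksdb
}
```

       Calling `list_rocksdb_threads 32513` on a Linux host would run the same command as in the post; on any other system it prints a message to stderr and returns 2 instead of producing misleading output.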




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org