Posted to commits@flink.apache.org by nk...@apache.org on 2019/07/23 15:48:16 UTC

[flink-web] branch asf-site updated (a0c3fab -> 606b5df)

This is an automated email from the ASF dual-hosted git repository.

nkruber pushed a change to branch asf-site
in repository https://gitbox.apache.org/repos/asf/flink-web.git.


    from a0c3fab  Add zhijiang to community page
     new 0266dc9  [hotfix] use site.baseurl and site.DOCS_BASE_URL instead of manual URLs in 2019 posts
     new 0645f54  [hotfix] remove incremental build option
     new 9dba6e8  [Blog] style-tuning for Network Stack Vol. 1
     new fb57c4a  [Blog] add Flink Network Stack Vol. 2: Monitoring, Metrics, and that Backpressure Thing
     new 606b5df  Rebuild website

The 5 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 _posts/2019-02-15-release-1.7.2.md                 |   2 +-
 _posts/2019-02-21-monitoring-best-practices.md     |  18 +-
 _posts/2019-02-25-release-1.6.4.md                 |   2 +-
 _posts/2019-03-11-prometheus-monitoring.md         |  10 +-
 _posts/2019-04-09-release-1.8.0.md                 |  14 +-
 _posts/2019-04-17-sod.md                           |  24 +-
 _posts/2019-05-03-pulsar-flink.md                  |   4 +-
 _posts/2019-05-14-temporal-tables.md               |   8 +-
 _posts/2019-05-17-state-ttl.md                     |   4 +-
 _posts/2019-06-05-flink-network-stack.md           | 177 ++++---
 _posts/2019-06-26-broadcast-state.md               |   2 +-
 _posts/2019-07-02-release-1.8.1.md                 |   2 +-
 _posts/2019-07-23-flink-network-stack-2.md         | 315 +++++++++++
 build.sh                                           |   4 -
 content/2019/05/03/pulsar-flink.html               |   4 +-
 content/2019/05/14/temporal-tables.html            |   2 +-
 content/2019/06/05/flink-network-stack.html        | 174 ++++---
 content/2019/07/23/flink-network-stack-2.html      | 579 +++++++++++++++++++++
 content/blog/feed.xml                              | 556 ++++++++++++++++----
 content/blog/index.html                            |  36 +-
 content/blog/page2/index.html                      |  38 +-
 content/blog/page3/index.html                      |  38 +-
 content/blog/page4/index.html                      |  38 +-
 content/blog/page5/index.html                      |  40 +-
 content/blog/page6/index.html                      |  40 +-
 content/blog/page7/index.html                      |  40 +-
 content/blog/page8/index.html                      |  40 +-
 content/blog/page9/index.html                      |  25 +
 content/css/flink.css                              |   5 +
 .../back_pressure_sampling_high.png                | Bin 0 -> 77546 bytes
 content/index.html                                 |   6 +-
 content/news/2019/02/15/release-1.7.2.html         |   2 +-
 content/news/2019/02/25/release-1.6.4.html         |   2 +-
 content/news/2019/04/09/release-1.8.0.html         |   8 +-
 content/news/2019/07/02/release-1.8.1.html         |   2 +-
 content/roadmap.html                               |   4 +-
 content/zh/community.html                          |   6 +
 content/zh/index.html                              |   6 +-
 css/flink.css                                      |   5 +
 .../back_pressure_sampling_high.png                | Bin 0 -> 77546 bytes
 40 files changed, 1843 insertions(+), 439 deletions(-)
 create mode 100644 _posts/2019-07-23-flink-network-stack-2.md
 create mode 100644 content/2019/07/23/flink-network-stack-2.html
 create mode 100644 content/img/blog/2019-07-23-network-stack-2/back_pressure_sampling_high.png
 create mode 100644 img/blog/2019-07-23-network-stack-2/back_pressure_sampling_high.png


[flink-web] 01/05: [hotfix] use site.baseurl and site.DOCS_BASE_URL instead of manual URLs in 2019 posts

Posted by nk...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

nkruber pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/flink-web.git

commit 0266dc92dc246b6ca2f768c8c965d26e597ba168
Author: Nico Kruber <ni...@ververica.com>
AuthorDate: Wed Jul 17 11:46:36 2019 +0200

    [hotfix] use site.baseurl and site.DOCS_BASE_URL instead of manual URLs in 2019 posts
    
    This replaces most uses of http[s]://flink.apache.org with {{ site.baseurl }}
    and http[s]://ci.apache.org/projects/flink/ with {{ site.DOCS_BASE_URL }},
    allowing a better local build, smaller .md files, and safer URLs.
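
    For context, the substitution this commit message describes can be sketched as a
    hypothetical sed pipeline. The actual change was made by editing the _posts/*.md
    sources directly; the rewrite() helper, the sample input lines, and the use of
    `sed -E` are illustrative assumptions, not part of the commit:

    ```shell
    #!/bin/sh
    # Sketch of the URL rewrite described above (illustrative only).
    # Maps hard-coded flink.apache.org / ci.apache.org URLs to the
    # Jekyll variables {{ site.baseurl }} and {{ site.DOCS_BASE_URL }}.
    rewrite() {
      sed -E \
        -e 's|https?://flink\.apache\.org|{{ site.baseurl }}|g' \
        -e 's|https?://ci\.apache\.org/projects/flink/|{{ site.DOCS_BASE_URL }}|g'
    }

    echo 'See the [Downloads page](http://flink.apache.org/downloads.html).' | rewrite
    # -> See the [Downloads page]({{ site.baseurl }}/downloads.html).

    echo '[docs](https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html)' | rewrite
    # -> [docs]({{ site.DOCS_BASE_URL }}flink-docs-release-1.7/monitoring/metrics.html)
    ```

    Note that DOCS_BASE_URL is expected to carry a trailing slash, which is why the
    second pattern consumes the slash after "flink/" and the replacement adds none.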
---
 _posts/2019-02-15-release-1.7.2.md             |  2 +-
 _posts/2019-02-21-monitoring-best-practices.md | 18 +++++++++---------
 _posts/2019-02-25-release-1.6.4.md             |  2 +-
 _posts/2019-03-11-prometheus-monitoring.md     | 10 +++++-----
 _posts/2019-04-09-release-1.8.0.md             | 14 +++++++-------
 _posts/2019-04-17-sod.md                       | 24 ++++++++++++------------
 _posts/2019-05-03-pulsar-flink.md              |  4 ++--
 _posts/2019-05-14-temporal-tables.md           |  8 ++++----
 _posts/2019-05-17-state-ttl.md                 |  4 ++--
 _posts/2019-06-05-flink-network-stack.md       | 26 +++++++++++++-------------
 _posts/2019-06-26-broadcast-state.md           |  2 +-
 _posts/2019-07-02-release-1.8.1.md             |  2 +-
 content/2019/05/03/pulsar-flink.html           |  4 ++--
 content/2019/05/14/temporal-tables.html        |  2 +-
 content/2019/06/05/flink-network-stack.html    |  2 +-
 content/blog/feed.xml                          | 22 +++++++++++-----------
 content/news/2019/02/15/release-1.7.2.html     |  2 +-
 content/news/2019/02/25/release-1.6.4.html     |  2 +-
 content/news/2019/04/09/release-1.8.0.html     |  8 ++++----
 content/news/2019/07/02/release-1.8.1.html     |  2 +-
 20 files changed, 80 insertions(+), 80 deletions(-)

diff --git a/_posts/2019-02-15-release-1.7.2.md b/_posts/2019-02-15-release-1.7.2.md
index be5b0f6..a2df9ca 100644
--- a/_posts/2019-02-15-release-1.7.2.md
+++ b/_posts/2019-02-15-release-1.7.2.md
@@ -33,7 +33,7 @@ Updated Maven dependencies:
 </dependency>
 ```
 
-You can find the binaries on the updated [Downloads page](http://flink.apache.org/downloads.html).
+You can find the binaries on the updated [Downloads page]({{ site.baseurl }}/downloads.html).
 
 List of resolved issues:
 
diff --git a/_posts/2019-02-21-monitoring-best-practices.md b/_posts/2019-02-21-monitoring-best-practices.md
index 153625f..f7b4ad2 100644
--- a/_posts/2019-02-21-monitoring-best-practices.md
+++ b/_posts/2019-02-21-monitoring-best-practices.md
@@ -40,7 +40,7 @@ any given point in time.
 ## Flink’s Metrics System
 
 The foundation for monitoring Flink jobs is its [metrics
-system](<https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html>)
+system](<{{ site.DOCS_BASE_URL }}flink-docs-release-1.7/monitoring/metrics.html>)
 which consists of two components; `Metrics` and `MetricsReporters`.
 
 ### Metrics
@@ -61,7 +61,7 @@ the number of records temporarily buffered in managed state. Besides counters,
 Flink offers additional metrics types like gauges and histograms. For
 instructions on how to register your own metrics with Flink’s metrics system
 please check out [Flink’s
-documentation](<https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html#registering-metrics>).
+documentation](<{{ site.DOCS_BASE_URL }}flink-docs-release-1.7/monitoring/metrics.html#registering-metrics>).
 In this blog post, we will focus on how to get the most out of Flink’s built-in
 metrics.
 
@@ -72,7 +72,7 @@ MetricsReporters to send the metrics to external systems. Apache Flink provides
 reporters to the most common monitoring tools out-of-the-box including JMX,
 Prometheus, Datadog, Graphite and InfluxDB. For information about how to
 configure a reporter check out Flink’s [MetricsReporter
-documentation](<https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html#reporter>).
+documentation](<{{ site.DOCS_BASE_URL }}flink-docs-release-1.7/monitoring/metrics.html#reporter>).
 
 In the remaining part of this blog post, we will go over some of the most
 important metrics to monitor your Apache Flink application.
@@ -132,7 +132,7 @@ keeping up with the upstream systems.
 
 Flink provides multiple metrics to measure the throughput of our application.
 For each operator or task (remember: a task can contain multiple [chained
-tasks](<https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/operators/#task-chaining-and-resource-groups>)
+tasks](<{{ site.DOCS_BASE_URL }}flink-docs-release-1.7/dev/stream/operators/#task-chaining-and-resource-groups>)
 Flink counts the number of records and bytes going in and out. Out of those
 metrics, the rate of outgoing records per operator is often the most intuitive
 and easiest to reason about.
@@ -261,7 +261,7 @@ inside the Flink topology and cannot be attributed to transactional sinks or
 events being buffered for functional reasons (4.).
 
 To this end, Flink comes with a feature called [Latency
-Tracking](<https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html#latency-tracking>).
+Tracking](<{{ site.DOCS_BASE_URL }}flink-docs-release-1.7/monitoring/metrics.html#latency-tracking>).
 When enabled, Flink will insert so-called latency markers periodically at all
 sources. For each sub-task, a latency distribution from each source to this
 operator will be reported. The granularity of these histograms can be further
@@ -309,7 +309,7 @@ metric to watch. This is especially true when using Flink’s filesystem
 statebackend as it keeps all state objects on the JVM Heap. If the size of
 long-living objects on the Heap increases significantly, this can usually be
 attributed to the size of your application state (check the 
-[checkpointing metrics](<https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html#checkpointing>)
+[checkpointing metrics](<{{ site.DOCS_BASE_URL }}flink-docs-release-1.7/monitoring/metrics.html#checkpointing>)
 for an estimated size of the on-heap state). The possible reasons for growing
 state are very application-specific. Typically, an increasing number of keys, a
 large event-time skew between different input streams or simply missing state
@@ -322,7 +322,7 @@ to 250 megabyte by default.
 
 * The biggest driver of Direct memory is by far the
 number of Flink’s network buffers, which can be
-[configured](<https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/config.html#configuring-the-network-buffers>).
+[configured](<{{ site.DOCS_BASE_URL }}flink-docs-release-1.7/ops/config.html#configuring-the-network-buffers>).
 
 * Mapped memory is usually close to zero as Flink does not use memory-mapped files.
 
@@ -414,7 +414,7 @@ system to gather insights about system resources, i.e. memory, CPU &
 network-related metrics for the whole machine as opposed to the Flink processes
 alone. System resource monitoring is disabled by default and requires additional
 dependencies on the classpath. Please check out the 
-[Flink system resource metrics documentation](<https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html#system-resources>) for
+[Flink system resource metrics documentation](<{{ site.DOCS_BASE_URL }}flink-docs-release-1.7/monitoring/metrics.html#system-resources>) for
 additional guidance and details. System resource monitoring in Flink can be very
 helpful in setups without existing host monitoring capabilities.
 
@@ -432,5 +432,5 @@ Flink’s internals early on.
 
 Last but not least, this post only scratches the surface of the overall metrics
 and monitoring capabilities of Apache Flink. I highly recommend going over
-[Flink’s metrics documentation](<https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html>)
+[Flink’s metrics documentation](<{{ site.DOCS_BASE_URL }}flink-docs-release-1.7/monitoring/metrics.html>)
 for a full reference of Flink’s metrics system.
\ No newline at end of file
diff --git a/_posts/2019-02-25-release-1.6.4.md b/_posts/2019-02-25-release-1.6.4.md
index 5fc54b7..dd46e29 100644
--- a/_posts/2019-02-25-release-1.6.4.md
+++ b/_posts/2019-02-25-release-1.6.4.md
@@ -31,7 +31,7 @@ Updated Maven dependencies:
 </dependency>
 ```
 
-You can find the binaries on the updated [Downloads page](http://flink.apache.org/downloads.html).
+You can find the binaries on the updated [Downloads page]({{ site.baseurl }}/downloads.html).
 
 List of resolved issues:
 
diff --git a/_posts/2019-03-11-prometheus-monitoring.md b/_posts/2019-03-11-prometheus-monitoring.md
index dc47eb9..20f554f 100644
--- a/_posts/2019-03-11-prometheus-monitoring.md
+++ b/_posts/2019-03-11-prometheus-monitoring.md
@@ -10,7 +10,7 @@ category: features
 excerpt: This blog post describes how developers can leverage Apache Flink's built-in metrics system together with Prometheus to observe and monitor streaming applications in an effective way.
 ---
 
-This blog post describes how developers can leverage Apache Flink's built-in [metrics system](https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html) together with [Prometheus](https://prometheus.io/) to observe and monitor streaming applications in an effective way. This is a follow-up post from my [Flink Forward](https://flink-forward.org/) Berlin 2018 talk ([slides](https://www.slideshare.net/MaximilianBode1/monitoring-flink-with-prometheus), [video](https [...]
+This blog post describes how developers can leverage Apache Flink's built-in [metrics system]({{ site.DOCS_BASE_URL }}flink-docs-release-1.7/monitoring/metrics.html) together with [Prometheus](https://prometheus.io/) to observe and monitor streaming applications in an effective way. This is a follow-up post from my [Flink Forward](https://flink-forward.org/) Berlin 2018 talk ([slides](https://www.slideshare.net/MaximilianBode1/monitoring-flink-with-prometheus), [video](https://www.verver [...]
 
 ## Why Prometheus?
 
@@ -24,7 +24,7 @@ Prometheus is a metrics-based monitoring system that was originally created in 2
 
 * **PromQL** is Prometheus' [query language](https://prometheus.io/docs/prometheus/latest/querying/basics/). It can be used for both building dashboards and setting up alert rules that will trigger when specific conditions are met.
 
-When considering metrics and monitoring systems for your Flink jobs, there are many [options](https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html). Flink offers native support for exposing data to Prometheus via the `PrometheusReporter` configuration. Setting up this integration is very easy.
+When considering metrics and monitoring systems for your Flink jobs, there are many [options]({{ site.DOCS_BASE_URL }}flink-docs-release-1.7/monitoring/metrics.html). Flink offers native support for exposing data to Prometheus via the `PrometheusReporter` configuration. Setting up this integration is very easy.
 
 Prometheus is a great choice as usually Flink jobs are not running in isolation but in a greater context of microservices. For making metrics available to Prometheus from other parts of a larger system, there are two options: There exist [libraries for all major languages](https://prometheus.io/docs/instrumenting/clientlibs/) to instrument other applications. Additionally, there is a wide variety of [exporters](https://prometheus.io/docs/instrumenting/exporters/), which are tools that ex [...]
 
@@ -36,7 +36,7 @@ We have provided a [GitHub repository](https://github.com/mbode/flink-prometheus
 ./gradlew composeUp
 ```
 
-This builds a Flink job using the build tool [Gradle](https://gradle.org/) and starts up a local environment based on [Docker Compose](https://docs.docker.com/compose/) running the job in a [Flink job cluster](https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/deployment/docker.html#flink-job-cluster) (reachable at [http://localhost:8081](http://localhost:8081/)) as well as a Prometheus instance ([http://localhost:9090](http://localhost:9090/)).
+This builds a Flink job using the build tool [Gradle](https://gradle.org/) and starts up a local environment based on [Docker Compose](https://docs.docker.com/compose/) running the job in a [Flink job cluster]({{ site.DOCS_BASE_URL }}flink-docs-release-1.7/ops/deployment/docker.html#flink-job-cluster) (reachable at [http://localhost:8081](http://localhost:8081/)) as well as a Prometheus instance ([http://localhost:9090](http://localhost:9090/)).
 
 <center>
 <img src="{{ site.baseurl }}/img/blog/2019-03-11-prometheus-monitoring/prometheusexamplejob.png" width="600px" alt="PrometheusExampleJob in Flink Web UI"/>
@@ -73,7 +73,7 @@ To start monitoring Flink with Prometheus, the following steps are necessary:
 
         cp /opt/flink/opt/flink-metrics-prometheus-1.7.2.jar /opt/flink/lib
 
-2. [Configure the reporter](https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html#reporter) in Flink's _flink-conf.yaml_. All job managers and task managers will expose the metrics on the configured port.
+2. [Configure the reporter]({{ site.DOCS_BASE_URL }}flink-docs-release-1.7/monitoring/metrics.html#reporter) in Flink's _flink-conf.yaml_. All job managers and task managers will expose the metrics on the configured port.
 
         metrics.reporters: prom
         metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
@@ -105,7 +105,7 @@ To test Prometheus' alerting feature, kill one of the Flink task managers via
 docker kill taskmanager1
 ```
 
-Our Flink job can recover from this partial failure via the mechanism of [Checkpointing](https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/state/checkpointing.html). Nevertheless, after roughly one minute (as configured in the alert rule) the following alert will fire:
+Our Flink job can recover from this partial failure via the mechanism of [Checkpointing]({{ site.DOCS_BASE_URL }}flink-docs-release-1.7/dev/stream/state/checkpointing.html). Nevertheless, after roughly one minute (as configured in the alert rule) the following alert will fire:
 
 <center>
 <img src="{{ site.baseurl }}/img/blog/2019-03-11-prometheus-monitoring/prometheusalerts.png" width="600px" alt="Prometheus web UI with example alert"/>
diff --git a/_posts/2019-04-09-release-1.8.0.md b/_posts/2019-04-09-release-1.8.0.md
index 6ff8107..b072bf6 100644
--- a/_posts/2019-04-09-release-1.8.0.md
+++ b/_posts/2019-04-09-release-1.8.0.md
@@ -17,15 +17,15 @@ for more details.
 
 Flink 1.8.0 is API-compatible with previous 1.x.y releases for APIs annotated
 with the `@Public` annotation.  The release is available now and we encourage
-everyone to [download the release](http://flink.apache.org/downloads.html) and
+everyone to [download the release]({{ site.baseurl }}/downloads.html) and
 check out the updated
-[documentation](https://ci.apache.org/projects/flink/flink-docs-release-1.8/).
+[documentation]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/).
 Feedback through the Flink [mailing
-lists](http://flink.apache.org/community.html#mailing-lists) or
+lists]({{ site.baseurl }}/community.html#mailing-lists) or
 [JIRA](https://issues.apache.org/jira/projects/FLINK/summary) is, as always,
 very much appreciated!
 
-You can find the binaries on the updated [Downloads page](http://flink.apache.org/downloads.html) on the Flink project site.
+You can find the binaries on the updated [Downloads page]({{ site.baseurl }}/downloads.html) on the Flink project site.
 
 {% toc %}
 
@@ -43,7 +43,7 @@ addition of the Blink enhancements
 Nevertheless, this release includes some important new features and bug fixes.
 The most interesting of those are highlighted below. Please consult the
 [complete changelog](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344274)
-and the [release notes](https://ci.apache.org/projects/flink/flink-docs-release-1.8/release-notes/flink-1.8.html)
+and the [release notes]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/release-notes/flink-1.8.html)
 for more details.
 
 
@@ -190,7 +190,7 @@ for more details.
   If a deployment relies on `flink-shaded-hadoop2` being included in
   `flink-dist`, then you must manually download a pre-packaged Hadoop
   jar from the optional components section of the [download
-  page](https://flink.apache.org/downloads.html) and copy it into the
+  page]({{ site.baseurl }}/downloads.html) and copy it into the
   `/lib` directory.  Alternatively, a Flink distribution that includes
   hadoop can be built by packaging `flink-dist` and activating the
   `include-hadoop` maven profile.
@@ -239,7 +239,7 @@ for more details.
 ## Release Notes
 
 Please review the [release
-notes](https://ci.apache.org/projects/flink/flink-docs-release-1.8/release-notes/flink-1.8.html)
+notes]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/release-notes/flink-1.8.html)
 for a more detailed list of changes and new features if you plan to upgrade
 your Flink setup to Flink 1.8.
 
diff --git a/_posts/2019-04-17-sod.md b/_posts/2019-04-17-sod.md
index 301d211..5b81ba6 100644
--- a/_posts/2019-04-17-sod.md
+++ b/_posts/2019-04-17-sod.md
@@ -9,7 +9,7 @@ authors:
   twitter: "snntrable"
 ---
 
-The Apache Flink community is happy to announce its application to the first edition of [Season of Docs](https://developers.google.com/season-of-docs/) by Google. The program is bringing together Open Source projects and technical writers to raise awareness for and improve documentation of Open Source projects. While the community is continuously looking for new contributors to collaborate on our documentation, we would like to take this chance to work with one or two technical writers t [...]
+The Apache Flink community is happy to announce its application to the first edition of [Season of Docs](https://developers.google.com/season-of-docs/) by Google. The program is bringing together Open Source projects and technical writers to raise awareness for and improve documentation of Open Source projects. While the community is continuously looking for new contributors to collaborate on our documentation, we would like to take this chance to work with one or two technical writers t [...]
 
 The community has discussed this opportunity on the [dev mailinglist](https://lists.apache.org/thread.html/3c789b6187da23ad158df59bbc598543b652e3cfc1010a14e294e16a@%3Cdev.flink.apache.org%3E) and agreed on three project ideas to submit to the program. We have a great team of mentors (Stephan, Fabian, David, Jark & Konstantin) lined up and are very much looking forward to the first proposals by potential technical writers (given we are admitted to the program ;)). In case of questions fee [...]
 
@@ -24,11 +24,11 @@ In this project, we would like to restructure, consolidate and extend the concep
 
 **Related material:**
 
-1. [https://ci.apache.org/projects/flink/flink-docs-release-1.8/](https://ci.apache.org/projects/flink/flink-docs-release-1.8/)
-2. [https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev](https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev)
-3. [https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops](https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops)
-4. [https://ci.apache.org/projects/flink/flink-docs-release-1.8/concepts/programming-model.html#time](https://ci.apache.org/projects/flink/flink-docs-release-1.8/concepts/programming-model.html#time)
-5. [https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/event_time.html](https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/event_time.html)
+1. [{{ site.DOCS_BASE_URL }}flink-docs-release-1.8/]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/)
+2. [{{ site.DOCS_BASE_URL }}flink-docs-release-1.8/dev]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/dev)
+3. [{{ site.DOCS_BASE_URL }}flink-docs-release-1.8/ops]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/ops)
+4. [{{ site.DOCS_BASE_URL }}flink-docs-release-1.8/concepts/programming-model.html#time]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/concepts/programming-model.html#time)
+5. [{{ site.DOCS_BASE_URL }}flink-docs-release-1.8/dev/event_time.html]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/dev/event_time.html)
 
 ### Project 2: Improve Documentation of Flink Deployments & Operations
 
@@ -39,8 +39,8 @@ In this project, we would like to restructure this part of the documentation and
 
 **Related material:**
 
-1. [https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops](https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/)
-2. [https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring](https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring)
+1. [{{ site.DOCS_BASE_URL }}flink-docs-release-1.8/ops]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/ops/)
+2. [{{ site.DOCS_BASE_URL }}flink-docs-release-1.8/monitoring]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/monitoring)
 
 ### Project 3: Improve Documentation for Relational APIs (Table API & SQL)
 
@@ -51,8 +51,8 @@ The existing documentation could be reorganized to prepare for covering the new
 
 **Related material:**
 
-1. [Table API & SQL docs main page](https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table)
-2. [Built-in functions](https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/functions.html)
-3. [Concepts](https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/common.html)
-4. [Streaming Concepts](https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/streaming/)
+1. [Table API & SQL docs main page]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/dev/table)
+2. [Built-in functions]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/dev/table/functions.html)
+3. [Concepts]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/dev/table/common.html)
+4. [Streaming Concepts]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/dev/table/streaming/)
 
diff --git a/_posts/2019-05-03-pulsar-flink.md b/_posts/2019-05-03-pulsar-flink.md
index b9059e9..14e0365 100644
--- a/_posts/2019-05-03-pulsar-flink.md
+++ b/_posts/2019-05-03-pulsar-flink.md
@@ -41,7 +41,7 @@ Finally, Pulsar’s flexible messaging framework unifies the streaming and queui
 
 ## Pulsar’s view on data: Segmented data streams
 
-Apache Flink is a streaming-first computation framework that perceives [batch processing as a special case of streaming](https://flink.apache.org/news/2019/02/13/unified-batch-streaming-blink.html). Flink’s view on data streams distinguishes batch and stream processing between bounded and unbounded data streams, assuming that for batch workloads the data stream is finite, with a beginning and an end.
+Apache Flink is a streaming-first computation framework that perceives [batch processing as a special case of streaming]({{ site.baseurl }}/news/2019/02/13/unified-batch-streaming-blink.html). Flink’s view on data streams distinguishes batch and stream processing between bounded and unbounded data streams, assuming that for batch workloads the data stream is finite, with a beginning and an end.
 
 Apache Pulsar has a similar perspective to that of Apache Flink with regards to the data layer. The framework also uses streams as a unified view on all data, while its layered architecture allows traditional pub-sub messaging for streaming workloads and continuous data processing or usage of *Segmented Streams* and bounded data stream for batch and static workloads. 
 
@@ -155,4 +155,4 @@ wc.output(pulsarOutputFormat);
 
 ## Conclusion
 
-Both Pulsar and Flink share a similar view on how the data and the computation level of an application can be *“streaming-first”* with batch as a special case streaming. With Pulsar’s Segmented Streams approach and Flink’s steps to unify batch and stream processing workloads under one framework, there are numerous ways of integrating the two technologies together to provide elastic data processing at massive scale. Subscribe to the [Apache Flink](https://flink.apache.org/community.html#m [...]
+Both Pulsar and Flink share a similar view on how the data and the computation level of an application can be *“streaming-first”* with batch as a special case streaming. With Pulsar’s Segmented Streams approach and Flink’s steps to unify batch and stream processing workloads under one framework, there are numerous ways of integrating the two technologies together to provide elastic data processing at massive scale. Subscribe to the [Apache Flink]({{ site.baseurl }}/community.html#mailing [...]
diff --git a/_posts/2019-05-14-temporal-tables.md b/_posts/2019-05-14-temporal-tables.md
index d630807..432765a 100644
--- a/_posts/2019-05-14-temporal-tables.md
+++ b/_posts/2019-05-14-temporal-tables.md
@@ -27,7 +27,7 @@ In the 1.7 release, Flink has introduced the concept of **temporal tables** into
 
 * Exposing the stream as a **temporal table function** that maps each point in time to a static relation.
 
-Going back to our example use case, a temporal table is just what we need to model the conversion rate data such as to make it useful for point-in-time querying. Temporal table functions are implemented as an extension of Flink’s generic [table function](https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/udfs.html#table-functions) class and can be defined in the same straightforward way to be used with the Table API or SQL parser.
+Going back to our example use case, a temporal table is just what we need to model the conversion rate data such as to make it useful for point-in-time querying. Temporal table functions are implemented as an extension of Flink’s generic [table function]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/dev/table/udfs.html#table-functions) class and can be defined in the same straightforward way to be used with the Table API or SQL parser.
 
 ```java
 import org.apache.flink.table.functions.TemporalTableFunction;
@@ -97,10 +97,10 @@ Each record from the append-only table on the probe side (```Taxi Fare```) is jo
 </center>
 <br>
 
-Temporal table joins support both [processing](https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/streaming/joins.html#processing-time-temporal-joins) and [event time](https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/streaming/joins.html#event-time-temporal-joins) semantics and effectively limit the amount of data kept in state while also allowing records on the build side to be arbitrarily old, as opposed to time-windowed joins. Probe-side records [...]
+Temporal table joins support both [processing]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/dev/table/streaming/joins.html#processing-time-temporal-joins) and [event time]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/dev/table/streaming/joins.html#event-time-temporal-joins) semantics and effectively limit the amount of data kept in state while also allowing records on the build side to be arbitrarily old, as opposed to time-windowed joins. Probe-side records only need to be kept in s [...]
 
 * Narrowing the **scope** of the join: only the time-matching version of ```ratesHistory``` is visible for a given ```taxiFare.time```;
-* Pruning **unneeded records** from state: for cases using event time, records between current time and the [watermark](https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/event_time.html#event-time-and-watermarks) delay are persisted for both the probe and build side. These are discarded as soon as the watermark arrives and the results are emitted — allowing the join operation to move forward in time and the build table to “refresh” its version in state.
+* Pruning **unneeded records** from state: for cases using event time, records between current time and the [watermark]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/dev/event_time.html#event-time-and-watermarks) delay are persisted for both the probe and build side. These are discarded as soon as the watermark arrives and the results are emitted — allowing the join operation to move forward in time and the build table to “refresh” its version in state.
 
 ## Conclusion
 
@@ -108,4 +108,4 @@ All this means it is now possible to express continuous stream enrichment in rel
 
 If you'd like to get some **hands-on practice in joining streams with Flink SQL** (and Flink SQL in general), check out this [free training for Flink SQL](https://github.com/ververica/sql-training/wiki). The training environment is based on Docker and can be set up in just a few minutes.
 
-Subscribe to the [Apache Flink mailing lists](https://flink.apache.org/community.html#mailing-lists) to stay up-to-date with the latest developments in this space.
+Subscribe to the [Apache Flink mailing lists]({{ site.baseurl }}/community.html#mailing-lists) to stay up-to-date with the latest developments in this space.
diff --git a/_posts/2019-05-17-state-ttl.md b/_posts/2019-05-17-state-ttl.md
index 592f180..a53e7d4 100644
--- a/_posts/2019-05-17-state-ttl.md
+++ b/_posts/2019-05-17-state-ttl.md
@@ -33,7 +33,7 @@ Both requirements can be addressed by a feature that periodically, yet continuou
 
 The 1.6.0 release of Apache Flink introduced the State TTL feature. It enabled developers of stream processing applications to configure the state of operators to expire and be cleaned up after a defined timeout (time-to-live). In Flink 1.8.0 the feature was extended with continuous cleanup of old entries for both the RocksDB and the heap state backends (FSStateBackend and MemoryStateBackend), according to the TTL setting.
 
-In Flink’s DataStream API, application state is defined by a [state descriptor](https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/state/state.html#using-managed-keyed-state). State TTL is configured by passing a `StateTtlConfiguration` object to a state descriptor. The following Java example shows how to create a state TTL configuration and provide it to the state descriptor that holds the last login time of a user as a `Long` value:
+In Flink’s DataStream API, application state is defined by a [state descriptor]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/dev/stream/state/state.html#using-managed-keyed-state). State TTL is configured by passing a `StateTtlConfig` object to a state descriptor. The following Java example shows how to create a state TTL configuration and provide it to the state descriptor that holds the last login time of a user as a `Long` value:
 
 ```java
 import org.apache.flink.api.common.state.StateTtlConfig;
@@ -63,7 +63,7 @@ State TTL employs a lazy strategy to clean up expired state. This can lead to th
 * **Which time semantics are used for the Time-to-Live timers?** 
 With Flink 1.8.0, users can only define a state TTL in terms of processing time. The support for event time is planned for future Apache Flink releases.
 
-You can read more about how to use state TTL in the [Apache Flink documentation](https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#state-time-to-live-ttl).
+You can read more about how to use state TTL in the [Apache Flink documentation]({{ site.DOCS_BASE_URL }}flink-docs-stable/dev/stream/state/state.html#state-time-to-live-ttl).
 
 Internally, the State TTL feature is implemented by storing an additional timestamp of the last relevant state access, along with the actual state value. While this approach adds some storage overhead, it allows Flink to check for expired state during state access, checkpointing, recovery, or dedicated storage cleanup procedures.
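
The principle can be sketched as a minimal illustration (pairing a value with its last-access timestamp; the `TtlEntry` class below is hypothetical, not Flink's actual state-backend code):

```java
// Minimal sketch of the TTL idea: each stored value carries the timestamp of
// its last relevant access, and expiry is checked relative to "now" whenever
// the entry is accessed or cleaned up.
public class TtlEntry<T> {
    final T value;
    final long lastAccessTimestamp;

    TtlEntry(T value, long lastAccessTimestamp) {
        this.value = value;
        this.lastAccessTimestamp = lastAccessTimestamp;
    }

    boolean isExpired(long ttlMillis, long nowMillis) {
        return nowMillis - lastAccessTimestamp >= ttlMillis;
    }
}
```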
 
diff --git a/_posts/2019-06-05-flink-network-stack.md b/_posts/2019-06-05-flink-network-stack.md
index 0888a15..6e26fbd 100644
--- a/_posts/2019-06-05-flink-network-stack.md
+++ b/_posts/2019-06-05-flink-network-stack.md
@@ -97,17 +97,17 @@ The following table summarises the valid combinations:
 
 
 <sup>1</sup> Currently not used by Flink. <br>
-<sup>2</sup> This may become applicable to streaming jobs once the [Batch/Streaming unification](https://flink.apache.org/roadmap.html#batch-and-streaming-unification) is done.
+<sup>2</sup> This may become applicable to streaming jobs once the [Batch/Streaming unification]({{ site.baseurl }}/roadmap.html#batch-and-streaming-unification) is done.
 
 
 <br>
-Additionally, for subtasks with more than one input, scheduling start in two ways: after *all* or after *any* input producers to have produced a record/their complete dataset. For tuning the output types and scheduling decisions in batch jobs, please have a look at [ExecutionConfig#setExecutionMode()](https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/api/common/ExecutionConfig.html#setExecutionMode-org.apache.flink.api.common.ExecutionMode-) - and [Exe [...]
+Additionally, for subtasks with more than one input, scheduling start in two ways: after *all* or after *any* input producers to have produced a record/their complete dataset. For tuning the output types and scheduling decisions in batch jobs, please have a look at [ExecutionConfig#setExecutionMode()]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/api/java/org/apache/flink/api/common/ExecutionConfig.html#setExecutionMode-org.apache.flink.api.common.ExecutionMode-) - and [ExecutionMode]({ [...]
 
 <br>
 
 # Physical Transport
 
-In order to understand the physical data connections, please recall that, in Flink, different tasks may share the same slot via [slot sharing groups](https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/operators/#task-chaining-and-resource-groups). TaskManagers may also provide more than one slot to allow multiple subtasks of the same task to be scheduled onto the same TaskManager.
+In order to understand the physical data connections, please recall that, in Flink, different tasks may share the same slot via [slot sharing groups]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/dev/stream/operators/#task-chaining-and-resource-groups). TaskManagers may also provide more than one slot to allow multiple subtasks of the same task to be scheduled onto the same TaskManager.
 
 For the example pictured below, we will assume a parallelism of 4 and a deployment with two task managers offering 2 slots each. TaskManager 1 executes subtasks A.1, A.2, B.1, and B.2 and TaskManager 2 executes subtasks A.3, A.4, B.3, and B.4. In a shuffle-type connection between task A and task B, for example from a `keyBy()`, there are 2x4 logical connections to handle on each TaskManager, some of which are local, some remote:
 <br>
@@ -158,11 +158,11 @@ Each (remote) network connection between different tasks will get its own TCP ch
 </center>
 <br>
 
-The results of each subtask are called [ResultPartition](https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/partition/ResultPartition.html), each split into separate [ResultSubpartitions](https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/partition/ResultSubpartition.html) — one for each logical channel. At this point in the stack, Flink is not dealing with individual records anymo [...]
+The results of each subtask are called [ResultPartition]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/partition/ResultPartition.html), each split into separate [ResultSubpartitions]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/partition/ResultSubpartition.html) — one for each logical channel. At this point in the stack, Flink is not dealing with individual records anymore but instead with a grou [...]
 
     #channels * buffers-per-channel + floating-buffers-per-gate
 
-The total number of buffers on a single TaskManager usually does not need configuration. See the [Configuring the Network Buffers](https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#configuring-the-network-buffers) documentation for details on how to do so if needed.
+The total number of buffers on a single TaskManager usually does not need configuration. See the [Configuring the Network Buffers]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/ops/config.html#configuring-the-network-buffers) documentation for details on how to do so if needed.
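
As a quick worked example of the formula above (the concrete numbers are illustrative; both `buffers-per-channel` and `floating-buffers-per-gate` are configurable):

```java
// Per-gate buffer demand, following the formula from the text:
//   #channels * buffers-per-channel + floating-buffers-per-gate
public class BufferCount {
    static int buffersPerGate(int channels, int buffersPerChannel, int floatingPerGate) {
        return channels * buffersPerChannel + floatingPerGate;
    }

    public static void main(String[] args) {
        // e.g. a gate with 4 input channels, 2 exclusive buffers each,
        // plus 8 floating buffers shared across the gate:
        System.out.println(buffersPerGate(4, 2, 8)); // 4 * 2 + 8 = 16
    }
}
```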
 
 ## Inflicting Backpressure (1)
 
@@ -191,7 +191,7 @@ Receivers will announce the availability of buffers as **credits** to the sender
 </center>
 <br>
 
-Credit-based flow control will use [buffers-per-channel](https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#taskmanager-network-memory-buffers-per-channel) to specify how many buffers are exclusive (mandatory) and [floating-buffers-per-gate](https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#taskmanager-network-memory-floating-buffers-per-gate) for the local buffer pool (optional<sup>3</sup>) thus achieving the same buffer limit as withou [...]
+Credit-based flow control will use [buffers-per-channel]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/ops/config.html#taskmanager-network-memory-buffers-per-channel) to specify how many buffers are exclusive (mandatory) and [floating-buffers-per-gate]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/ops/config.html#taskmanager-network-memory-floating-buffers-per-gate) for the local buffer pool (optional<sup>3</sup>) thus achieving the same buffer limit as without flow control. The defaul [...]
 <br>
 
 <sup>3</sup> If there are not enough buffers available, each buffer pool will get the same share of the globally available ones (± 1).
@@ -204,11 +204,11 @@ As opposed to the receiver's backpressure mechanisms without flow control, credi
 
 <img align="right" src="{{ site.baseurl }}/img/blog/2019-06-05-network-stack/flink-network-stack5.png" width="300" height="200" alt="Physical-transport-credit-flow-checkpoints-Flink's Network Stack"/>
 
-Since, with flow control, a channel in a multiplex cannot block another of its logical channels, the overall resource utilisation should increase. In addition, by having full control over how much data is “on the wire”, we are also able to improve [checkpoint alignments](https://ci.apache.org/projects/flink/flink-docs-release-1.8/internals/stream_checkpointing.html#checkpointing): without flow control, it would take a while for the channel to fill the network stack’s internal buffers and [...]
+Since, with flow control, a channel in a multiplex cannot block another of its logical channels, the overall resource utilisation should increase. In addition, by having full control over how much data is “on the wire”, we are also able to improve [checkpoint alignments]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/internals/stream_checkpointing.html#checkpointing): without flow control, it would take a while for the channel to fill the network stack’s internal buffers and propagate th [...]
 
-However, the additional announce messages from the receiver may come at some additional costs, especially in setup using SSL-encrypted channels. Also, a single input channel cannot make use of all buffers in the buffer pool because exclusive buffers are not shared. It can also not start right away with sending as much data as is available so that during ramp-up (if you are producing data faster than announcing credits in return) it may take longer to send data through. While this may aff [...]
+However, the additional announce messages from the receiver may come at some additional costs, especially in setup using SSL-encrypted channels. Also, a single input channel cannot make use of all buffers in the buffer pool because exclusive buffers are not shared. It can also not start right away with sending as much data as is available so that during ramp-up (if you are producing data faster than announcing credits in return) it may take longer to send data through. While this may aff [...]
 
-There is one more thing you may notice when using credit-based flow control: since we buffer less data between the sender and receiver, you may experience backpressure earlier. This is, however, desired and you do not really get any advantage by buffering more data. If you want to buffer more but keep flow control, you could consider increasing the number of floating buffers via [floating-buffers-per-gate](https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#taskma [...]
+There is one more thing you may notice when using credit-based flow control: since we buffer less data between the sender and receiver, you may experience backpressure earlier. This is, however, desired and you do not really get any advantage by buffering more data. If you want to buffer more but keep flow control, you could consider increasing the number of floating buffers via [floating-buffers-per-gate]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/ops/config.html#taskmanager-network [...]
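
The credit mechanism itself reduces to a small model (a toy sketch, not Flink's network-stack code): the sender spends one credit per buffer sent and blocks when it runs out, until the receiver announces more.

```java
// Toy model of credit-based flow control: the receiver announces credits
// (one per free buffer); the sender may only send a buffer while it holds
// at least one credit, which naturally propagates backpressure upstream.
public class CreditChannel {
    private int credits;

    void announceCredits(int n) { // receiver side: grant capacity
        credits += n;
    }

    boolean trySend() {           // sender side: spend a credit per buffer
        if (credits > 0) {
            credits--;
            return true;
        }
        return false; // no credit: backpressure, wait for the receiver
    }
}
```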
 
 <br>
 
@@ -252,9 +252,9 @@ The following picture extends the slightly more high-level view from above with
 </center>
 <br>
 
-After creating a record and passing it along, for example via `Collector#collect()`, it is given to the [RecordWriter](https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/api/writer/RecordWriter.html) which serialises the record from a Java object into a sequence of bytes which eventually ends up in a network buffer that is handed along as described above. The RecordWriter first serialises the record to a flexible on-heap byte array us [...]
+After creating a record and passing it along, for example via `Collector#collect()`, it is given to the [RecordWriter]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/api/writer/RecordWriter.html) which serialises the record from a Java object into a sequence of bytes which eventually ends up in a network buffer that is handed along as described above. The RecordWriter first serialises the record to a flexible on-heap byte array using the [Span [...]
 
-On the receiver’s side, the lower network stack (netty) is writing received buffers into the appropriate input channels. The (stream) tasks’s thread eventually reads from these queues and tries to deserialise the accumulated bytes into Java objects with the help of the [RecordReader](https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/api/reader/RecordReader.html) and going through the [SpillingAdaptiveSpanningRecordDeserializer](https [...]
+On the receiver’s side, the lower network stack (netty) is writing received buffers into the appropriate input channels. The (stream) tasks’s thread eventually reads from these queues and tries to deserialise the accumulated bytes into Java objects with the help of the [RecordReader]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/api/reader/RecordReader.html) and going through the [SpillingAdaptiveSpanningRecordDeserializer]({{ site.DOCS_BASE_ [...]
 <br>
 
 ## Flushing Buffers to Netty
@@ -283,7 +283,7 @@ The RecordWriter works with a local serialisation buffer for the current record
 
 ### Flush after Buffer Timeout
 
-In order to support low-latency use cases, we cannot only rely on buffers being full in order to send data downstream. There may be cases where a certain communication channel does not have too many records flowing through and unnecessarily increase the latency of the few records you actually have. Therefore, a periodic process will flush whatever data is available down the stack: the output flusher. The periodic interval can be configured via [StreamExecutionEnvironment#setBufferTimeout [...]
+In order to support low-latency use cases, we cannot only rely on buffers being full in order to send data downstream. There may be cases where a certain communication channel does not have too many records flowing through and unnecessarily increase the latency of the few records you actually have. Therefore, a periodic process will flush whatever data is available down the stack: the output flusher. The periodic interval can be configured via [StreamExecutionEnvironment#setBufferTimeout [...]
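
The combined "flush when full or after a timeout" behaviour can be sketched as a simple predicate (a simplified model only; in a real job you just tune the timeout via `StreamExecutionEnvironment#setBufferTimeout()`):

```java
// Simplified model of the output flusher's decision: a buffer goes downstream
// either because it filled up or because the configured buffer timeout elapsed
// while it held at least some data.
public class FlushDecision {
    static boolean shouldFlush(int bytesInBuffer, int bufferCapacity,
                               long millisSinceFirstWrite, long bufferTimeoutMillis) {
        boolean full = bytesInBuffer >= bufferCapacity;
        boolean timedOut = bytesInBuffer > 0 && millisSinceFirstWrite >= bufferTimeoutMillis;
        return full || timedOut;
    }
}
```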
 <br>
 
 <center>
@@ -312,7 +312,7 @@ However, you may notice an increased CPU use and TCP packet rate during low load
 
 ## Buffer Builder & Buffer Consumer
 
-If you want to dig deeper into how the producer-consumer mechanics are implemented in Flink, please take a closer look at the [BufferBuilder](https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/buffer/BufferBuilder.html) and [BufferConsumer](https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/buffer/BufferConsumer.html) classes which have been introduced in Flink 1.5. While reading i [...]
+If you want to dig deeper into how the producer-consumer mechanics are implemented in Flink, please take a closer look at the [BufferBuilder]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/buffer/BufferBuilder.html) and [BufferConsumer]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/buffer/BufferConsumer.html) classes which have been introduced in Flink 1.5. While reading is potentially only *per bu [...]
 
 <br>
 
diff --git a/_posts/2019-06-26-broadcast-state.md b/_posts/2019-06-26-broadcast-state.md
index 7c34421..5616d11 100644
--- a/_posts/2019-06-26-broadcast-state.md
+++ b/_posts/2019-06-26-broadcast-state.md
@@ -208,4 +208,4 @@ The `KeyedBroadcastProcessFunction` has full access to Flink state and time feat
 
 In this blog post, we walked you through an example application to explain what Apache Flink’s broadcast state is and how it can be used to evaluate dynamic patterns on event streams. We’ve also discussed the API and shown the source code of our example application.
 
-We invite you to check the [documentation](https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/broadcast_state.html) of this feature and provide feedback or suggestions for further improvements through our [mailing list](http://mail-archives.apache.org/mod_mbox/flink-community/).
+We invite you to check the [documentation]({{ site.DOCS_BASE_URL }}flink-docs-stable/dev/stream/state/broadcast_state.html) of this feature and provide feedback or suggestions for further improvements through our [mailing list](http://mail-archives.apache.org/mod_mbox/flink-community/).
diff --git a/_posts/2019-07-02-release-1.8.1.md b/_posts/2019-07-02-release-1.8.1.md
index 2d70fef..acd02ce 100644
--- a/_posts/2019-07-02-release-1.8.1.md
+++ b/_posts/2019-07-02-release-1.8.1.md
@@ -35,7 +35,7 @@ Updated Maven dependencies:
 </dependency>
 ```
 
-You can find the binaries on the updated [Downloads page](http://flink.apache.org/downloads.html).
+You can find the binaries on the updated [Downloads page]({{ site.baseurl }}/downloads.html).
 
 List of resolved issues:
     
diff --git a/content/2019/05/03/pulsar-flink.html b/content/2019/05/03/pulsar-flink.html
index b42b22e..98a9eab 100644
--- a/content/2019/05/03/pulsar-flink.html
+++ b/content/2019/05/03/pulsar-flink.html
@@ -192,7 +192,7 @@
 
 <h2 id="pulsars-view-on-data-segmented-data-streams">Pulsar’s view on data: Segmented data streams</h2>
 
-<p>Apache Flink is a streaming-first computation framework that perceives <a href="https://flink.apache.org/news/2019/02/13/unified-batch-streaming-blink.html">batch processing as a special case of streaming</a>. Flink’s view on data streams distinguishes batch and stream processing between bounded and unbounded data streams, assuming that for batch workloads the data stream is finite, with a beginning and an end.</p>
+<p>Apache Flink is a streaming-first computation framework that perceives <a href="/news/2019/02/13/unified-batch-streaming-blink.html">batch processing as a special case of streaming</a>. Flink’s view on data streams distinguishes between bounded and unbounded data streams, assuming that for batch workloads the data stream is finite, with a beginning and an end.</p>
 
 <p>Apache Pulsar has a similar perspective to that of Apache Flink with regard to the data layer. The framework also uses streams as a unified view on all data, while its layered architecture allows traditional pub-sub messaging for streaming workloads and continuous data processing, or usage of <em>Segmented Streams</em> and bounded data streams for batch and static workloads.</p>
 
@@ -298,7 +298,7 @@
 
 <h2 id="conclusion">Conclusion</h2>
 
-<p>Both Pulsar and Flink share a similar view on how the data and the computation level of an application can be <em>“streaming-first”</em> with batch as a special case streaming. With Pulsar’s Segmented Streams approach and Flink’s steps to unify batch and stream processing workloads under one framework, there are numerous ways of integrating the two technologies together to provide elastic data processing at massive scale. Subscribe to the <a href="https://flink.apache.org/community.ht [...]
+<p>Both Pulsar and Flink share a similar view on how the data and the computation level of an application can be <em>“streaming-first”</em> with batch as a special case streaming. With Pulsar’s Segmented Streams approach and Flink’s steps to unify batch and stream processing workloads under one framework, there are numerous ways of integrating the two technologies together to provide elastic data processing at massive scale. Subscribe to the <a href="/community.html#mailing-lists">Apache [...]
 
       </article>
     </div>
diff --git a/content/2019/05/14/temporal-tables.html b/content/2019/05/14/temporal-tables.html
index ae56dc7..009966f 100644
--- a/content/2019/05/14/temporal-tables.html
+++ b/content/2019/05/14/temporal-tables.html
@@ -264,7 +264,7 @@
 
 <p>If you’d like to get some <strong>hands-on practice in joining streams with Flink SQL</strong> (and Flink SQL in general), check out this <a href="https://github.com/ververica/sql-training/wiki">free training for Flink SQL</a>. The training environment is based on Docker and can be set up in just a few minutes.</p>
 
-<p>Subscribe to the <a href="https://flink.apache.org/community.html#mailing-lists">Apache Flink mailing lists</a> to stay up-to-date with the latest developments in this space.</p>
+<p>Subscribe to the <a href="/community.html#mailing-lists">Apache Flink mailing lists</a> to stay up-to-date with the latest developments in this space.</p>
 
       </article>
     </div>
diff --git a/content/2019/06/05/flink-network-stack.html b/content/2019/06/05/flink-network-stack.html
index 2c67a21..2445ec5 100644
--- a/content/2019/06/05/flink-network-stack.html
+++ b/content/2019/06/05/flink-network-stack.html
@@ -268,7 +268,7 @@
 <p><br /></p>
 
 <p><sup>1</sup> Currently not used by Flink. <br />
-<sup>2</sup> This may become applicable to streaming jobs once the <a href="https://flink.apache.org/roadmap.html#batch-and-streaming-unification">Batch/Streaming unification</a> is done.</p>
+<sup>2</sup> This may become applicable to streaming jobs once the <a href="/roadmap.html#batch-and-streaming-unification">Batch/Streaming unification</a> is done.</p>
 
 <p><br />
 Additionally, for subtasks with more than one input, scheduling start in two ways: after <em>all</em> or after <em>any</em> input producers to have produced a record/their complete dataset. For tuning the output types and scheduling decisions in batch jobs, please have a look at <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/api/common/ExecutionConfig.html#setExecutionMode-org.apache.flink.api.common.ExecutionMode-">ExecutionConfig#setExecu [...]
diff --git a/content/blog/feed.xml b/content/blog/feed.xml
index 1c51569..d4d033b 100644
--- a/content/blog/feed.xml
+++ b/content/blog/feed.xml
@@ -32,7 +32,7 @@
   &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.8.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
 &lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
 
-&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
+&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
 
 &lt;p&gt;List of resolved issues:&lt;/p&gt;
 
@@ -454,7 +454,7 @@ The website implements a streaming application that detects a pattern on the str
 &lt;p&gt;&lt;br /&gt;&lt;/p&gt;
 
 &lt;p&gt;&lt;sup&gt;1&lt;/sup&gt; Currently not used by Flink. &lt;br /&gt;
-&lt;sup&gt;2&lt;/sup&gt; This may become applicable to streaming jobs once the &lt;a href=&quot;https://flink.apache.org/roadmap.html#batch-and-streaming-unification&quot;&gt;Batch/Streaming unification&lt;/a&gt; is done.&lt;/p&gt;
+&lt;sup&gt;2&lt;/sup&gt; This may become applicable to streaming jobs once the &lt;a href=&quot;/roadmap.html#batch-and-streaming-unification&quot;&gt;Batch/Streaming unification&lt;/a&gt; is done.&lt;/p&gt;
 
 &lt;p&gt;&lt;br /&gt;
 Additionally, for subtasks with more than one input, scheduling start in two ways: after &lt;em&gt;all&lt;/em&gt; or after &lt;em&gt;any&lt;/em&gt; input producers to have produced a record/their complete dataset. For tuning the output types and scheduling decisions in batch jobs, please have a look at &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/api/common/ExecutionConfig.html#setExecutionMode-org.apache.flink.api.common.Executio [...]
@@ -939,7 +939,7 @@ With Flink 1.8.0, users can only define a state TTL in terms of processing time.
 
 &lt;p&gt;If you’d like to get some &lt;strong&gt;hands-on practice in joining streams with Flink SQL&lt;/strong&gt; (and Flink SQL in general), check out this &lt;a href=&quot;https://github.com/ververica/sql-training/wiki&quot;&gt;free training for Flink SQL&lt;/a&gt;. The training environment is based on Docker and can be set up in just a few minutes.&lt;/p&gt;
 
-&lt;p&gt;Subscribe to the &lt;a href=&quot;https://flink.apache.org/community.html#mailing-lists&quot;&gt;Apache Flink mailing lists&lt;/a&gt; to stay up-to-date with the latest developments in this space.&lt;/p&gt;
+&lt;p&gt;Subscribe to the &lt;a href=&quot;/community.html#mailing-lists&quot;&gt;Apache Flink mailing lists&lt;/a&gt; to stay up-to-date with the latest developments in this space.&lt;/p&gt;
 </description>
 <pubDate>Tue, 14 May 2019 14:00:00 +0200</pubDate>
 <link>https://flink.apache.org/2019/05/14/temporal-tables.html</link>
@@ -979,7 +979,7 @@ With Flink 1.8.0, users can only define a state TTL in terms of processing time.
 
 &lt;h2 id=&quot;pulsars-view-on-data-segmented-data-streams&quot;&gt;Pulsar’s view on data: Segmented data streams&lt;/h2&gt;
 
-&lt;p&gt;Apache Flink is a streaming-first computation framework that perceives &lt;a href=&quot;https://flink.apache.org/news/2019/02/13/unified-batch-streaming-blink.html&quot;&gt;batch processing as a special case of streaming&lt;/a&gt;. Flink’s view on data streams distinguishes batch and stream processing between bounded and unbounded data streams, assuming that for batch workloads the data stream is finite, with a beginning and an end.&lt;/p&gt;
+&lt;p&gt;Apache Flink is a streaming-first computation framework that perceives &lt;a href=&quot;/news/2019/02/13/unified-batch-streaming-blink.html&quot;&gt;batch processing as a special case of streaming&lt;/a&gt;. Flink’s view on data streams distinguishes between bounded and unbounded data streams, assuming that for batch workloads the data stream is finite, with a beginning and an end.&lt;/p&gt;
 
 &lt;p&gt;Apache Pulsar has a similar perspective to that of Apache Flink with regard to the data layer. The framework also uses streams as a unified view on all data, while its layered architecture allows traditional pub-sub messaging for streaming workloads and continuous data processing, or usage of &lt;em&gt;Segmented Streams&lt;/em&gt; and bounded data streams for batch and static workloads.&lt;/p&gt;
 
@@ -1085,7 +1085,7 @@ With Flink 1.8.0, users can only define a state TTL in terms of processing time.
 
 &lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
 
-&lt;p&gt;Both Pulsar and Flink share a similar view on how the data and the computation level of an application can be &lt;em&gt;“streaming-first”&lt;/em&gt; with batch as a special case streaming. With Pulsar’s Segmented Streams approach and Flink’s steps to unify batch and stream processing workloads under one framework, there are numerous ways of integrating the two technologies together to provide elastic data processing at massive scale. Subscribe to the &lt;a href=&quot;https://fli [...]
+&lt;p&gt;Both Pulsar and Flink share a similar view on how the data and the computation level of an application can be &lt;em&gt;“streaming-first”&lt;/em&gt; with batch as a special case streaming. With Pulsar’s Segmented Streams approach and Flink’s steps to unify batch and stream processing workloads under one framework, there are numerous ways of integrating the two technologies together to provide elastic data processing at massive scale. Subscribe to the &lt;a href=&quot;/community. [...]
 </description>
 <pubDate>Fri, 03 May 2019 14:00:00 +0200</pubDate>
 <link>https://flink.apache.org/2019/05/03/pulsar-flink.html</link>
@@ -1163,15 +1163,15 @@ for more details.&lt;/p&gt;
 
 &lt;p&gt;Flink 1.8.0 is API-compatible with previous 1.x.y releases for APIs annotated
 with the &lt;code&gt;@Public&lt;/code&gt; annotation.  The release is available now and we encourage
-everyone to &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;download the release&lt;/a&gt; and
+everyone to &lt;a href=&quot;/downloads.html&quot;&gt;download the release&lt;/a&gt; and
 check out the updated
 &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/&quot;&gt;documentation&lt;/a&gt;.
-Feedback through the Flink &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;mailing
+Feedback through the Flink &lt;a href=&quot;/community.html#mailing-lists&quot;&gt;mailing
 lists&lt;/a&gt; or
 &lt;a href=&quot;https://issues.apache.org/jira/projects/FLINK/summary&quot;&gt;JIRA&lt;/a&gt; is, as always,
 very much appreciated!&lt;/p&gt;
 
-&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt; on the Flink project site.&lt;/p&gt;
+&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt; on the Flink project site.&lt;/p&gt;
 
 &lt;div class=&quot;page-toc&quot;&gt;
 &lt;ul id=&quot;markdown-toc&quot;&gt;
@@ -1360,7 +1360,7 @@ Convenience binaries that include hadoop are no longer released.&lt;/p&gt;
 
     &lt;p&gt;If a deployment relies on &lt;code&gt;flink-shaded-hadoop2&lt;/code&gt; being included in
 &lt;code&gt;flink-dist&lt;/code&gt;, then you must manually download a pre-packaged Hadoop
-jar from the optional components section of the &lt;a href=&quot;https://flink.apache.org/downloads.html&quot;&gt;download
+jar from the optional components section of the &lt;a href=&quot;/downloads.html&quot;&gt;download
 page&lt;/a&gt; and copy it into the
 &lt;code&gt;/lib&lt;/code&gt; directory.  Alternatively, a Flink distribution that includes
 hadoop can be built by packaging &lt;code&gt;flink-dist&lt;/code&gt; and activating the
@@ -2235,7 +2235,7 @@ for a full reference of Flink’s metrics system.&lt;/p&gt;
   &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.6.4&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
 &lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
 
-&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
+&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
 
 &lt;p&gt;List of resolved issues:&lt;/p&gt;
 
@@ -2334,7 +2334,7 @@ We highly recommend all users to upgrade to Flink 1.7.2.&lt;/p&gt;
   &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.7.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
 &lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
 
-&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
+&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
 
 &lt;p&gt;List of resolved issues:&lt;/p&gt;
 
diff --git a/content/news/2019/02/15/release-1.7.2.html b/content/news/2019/02/15/release-1.7.2.html
index e079b6d..5e47199 100644
--- a/content/news/2019/02/15/release-1.7.2.html
+++ b/content/news/2019/02/15/release-1.7.2.html
@@ -188,7 +188,7 @@ We highly recommend all users to upgrade to Flink 1.7.2.</p>
   <span class="nt">&lt;version&gt;</span>1.7.2<span class="nt">&lt;/version&gt;</span>
 <span class="nt">&lt;/dependency&gt;</span></code></pre></div>
 
-<p>You can find the binaries on the updated <a href="http://flink.apache.org/downloads.html">Downloads page</a>.</p>
+<p>You can find the binaries on the updated <a href="/downloads.html">Downloads page</a>.</p>
 
 <p>List of resolved issues:</p>
 
diff --git a/content/news/2019/02/25/release-1.6.4.html b/content/news/2019/02/25/release-1.6.4.html
index 07aff42..8c54e82 100644
--- a/content/news/2019/02/25/release-1.6.4.html
+++ b/content/news/2019/02/25/release-1.6.4.html
@@ -186,7 +186,7 @@
   <span class="nt">&lt;version&gt;</span>1.6.4<span class="nt">&lt;/version&gt;</span>
 <span class="nt">&lt;/dependency&gt;</span></code></pre></div>
 
-<p>You can find the binaries on the updated <a href="http://flink.apache.org/downloads.html">Downloads page</a>.</p>
+<p>You can find the binaries on the updated <a href="/downloads.html">Downloads page</a>.</p>
 
 <p>List of resolved issues:</p>
 
diff --git a/content/news/2019/04/09/release-1.8.0.html b/content/news/2019/04/09/release-1.8.0.html
index 619808a..de67886 100644
--- a/content/news/2019/04/09/release-1.8.0.html
+++ b/content/news/2019/04/09/release-1.8.0.html
@@ -170,15 +170,15 @@ for more details.</p>
 
 <p>Flink 1.8.0 is API-compatible with previous 1.x.y releases for APIs annotated
 with the <code>@Public</code> annotation.  The release is available now and we encourage
-everyone to <a href="http://flink.apache.org/downloads.html">download the release</a> and
+everyone to <a href="/downloads.html">download the release</a> and
 check out the updated
 <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.8/">documentation</a>.
-Feedback through the Flink <a href="http://flink.apache.org/community.html#mailing-lists">mailing
+Feedback through the Flink <a href="/community.html#mailing-lists">mailing
 lists</a> or
 <a href="https://issues.apache.org/jira/projects/FLINK/summary">JIRA</a> is, as always,
 very much appreciated!</p>
 
-<p>You can find the binaries on the updated <a href="http://flink.apache.org/downloads.html">Downloads page</a> on the Flink project site.</p>
+<p>You can find the binaries on the updated <a href="/downloads.html">Downloads page</a> on the Flink project site.</p>
 
 <div class="page-toc">
 <ul id="markdown-toc">
@@ -367,7 +367,7 @@ Convenience binaries that include hadoop are no longer released.</p>
 
     <p>If a deployment relies on <code>flink-shaded-hadoop2</code> being included in
 <code>flink-dist</code>, then you must manually download a pre-packaged Hadoop
-jar from the optional components section of the <a href="https://flink.apache.org/downloads.html">download
+jar from the optional components section of the <a href="/downloads.html">download
 page</a> and copy it into the
 <code>/lib</code> directory.  Alternatively, a Flink distribution that includes
 hadoop can be built by packaging <code>flink-dist</code> and activating the
diff --git a/content/news/2019/07/02/release-1.8.1.html b/content/news/2019/07/02/release-1.8.1.html
index 328d4a7..03d8701 100644
--- a/content/news/2019/07/02/release-1.8.1.html
+++ b/content/news/2019/07/02/release-1.8.1.html
@@ -186,7 +186,7 @@
   <span class="nt">&lt;version&gt;</span>1.8.1<span class="nt">&lt;/version&gt;</span>
 <span class="nt">&lt;/dependency&gt;</span></code></pre></div>
 
-<p>You can find the binaries on the updated <a href="http://flink.apache.org/downloads.html">Downloads page</a>.</p>
+<p>You can find the binaries on the updated <a href="/downloads.html">Downloads page</a>.</p>
 
 <p>List of resolved issues:</p>
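The link fixes in the hunks above all follow one pattern: hard-coded `http://flink.apache.org/...` URLs in the generated HTML become site-relative paths, which in the markdown sources corresponds to using Jekyll's site variables instead of manual URLs. A minimal sketch of the source-side convention (link texts are illustrative; the variables and targets appear in the posts themselves):

```liquid
[Downloads page]({{ site.baseurl }}/downloads.html)
[network buffer docs]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/ops/config.html#configuring-the-network-buffers)
```

With this convention, moving the site to a different base URL only requires changing `_config.yml`, not every post.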
 


[flink-web] 02/05: [hotfix] remove incremental build option

Posted by nk...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

nkruber pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/flink-web.git

commit 0645f5444c6396137a9beb8a0ce7f99d9702acdb
Author: Nico Kruber <ni...@ververica.com>
AuthorDate: Wed Jul 17 18:07:50 2019 +0200

    [hotfix] remove incremental build option
    
    This doesn't even work if added to getopts - our (very old) Jekyll is
    complaining.
---
 build.sh | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/build.sh b/build.sh
index e3f25ad..d824d43 100755
--- a/build.sh
+++ b/build.sh
@@ -55,10 +55,6 @@ while getopts ":p" opt; do
     case $opt in
         p)
         JEKYLL_CMD="serve --baseurl= --watch --trace --incremental"
-        ;;
-        i)
-        [[ `${RUBY} -v` =~ 'ruby 1' ]] && echo "Error: building the docs with the incremental option requires at least ruby 2.0" && exit 1
-        JEKYLL_CMD="liveserve --baseurl= --watch --incremental"
     esac
 done
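For context, a `getopts` loop only dispatches on option letters listed in its optstring; the removed `i)` branch's letter was never in `":p"`, so that branch was unreachable dead code. A minimal, illustrative sketch of the pattern (the function name `parse_build_opts` is hypothetical; the command strings are taken from build.sh):

```shell
# Illustrative sketch of build.sh's option handling; parse_build_opts is a
# hypothetical helper, not part of the actual script.
parse_build_opts() {
    local OPTIND opt
    JEKYLL_CMD="build"
    while getopts ":p" opt; do
        case $opt in
            p)
                # preview mode, as in build.sh
                JEKYLL_CMD="serve --baseurl= --watch --trace --incremental"
                ;;
            i)
                # Unreachable: "i" is not in the optstring ":p", so getopts
                # never yields it -- which is why the removed branch could
                # not work without also extending the optstring.
                JEKYLL_CMD="liveserve --baseurl= --watch --incremental"
                ;;
        esac
    done
}

parse_build_opts -p
echo "$JEKYLL_CMD"    # serve --baseurl= --watch --trace --incremental
```

Passing `-i` to the sketch leaves `JEKYLL_CMD` at its `build` default, confirming the branch could never fire.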
 


[flink-web] 03/05: [Blog] style-tuning for Network Stack Vol. 1


commit 9dba6e8756560ed0ad00c482bc06932a70e63a17
Author: Nico Kruber <ni...@ververica.com>
AuthorDate: Wed Jul 17 15:00:35 2019 +0200

    [Blog] style-tuning for Network Stack Vol. 1
---
 _posts/2019-06-05-flink-network-stack.md    | 151 ++++++++++++------------
 content/2019/06/05/flink-network-stack.html | 172 +++++++++++++++------------
 content/blog/feed.xml                       | 174 +++++++++++++++-------------
 3 files changed, 266 insertions(+), 231 deletions(-)

diff --git a/_posts/2019-06-05-flink-network-stack.md b/_posts/2019-06-05-flink-network-stack.md
index 6e26fbd..f7c36d9 100644
--- a/_posts/2019-06-05-flink-network-stack.md
+++ b/_posts/2019-06-05-flink-network-stack.md
@@ -5,16 +5,27 @@ date: 2019-06-05T08:45:00.000Z
 authors:
 - Nico:
   name: "Nico Kruber"
-  
+
 
 excerpt: Flink’s network stack is one of the core components that make up Apache Flink's runtime module sitting at the core of every Flink job. In this post, which is the first in a series of posts about the network stack, we look at the abstractions exposed to the stream operators and detail their physical implementation and various optimisations in Apache Flink.
 ---
 
+<style type="text/css">
+.tg  {border-collapse:collapse;border-spacing:0;}
+.tg td{padding:10px 20px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
+.tg th{padding:10px 20px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;background-color:#eff0f1;}
+.tg .tg-wide{padding:10px 30px;}
+.tg .tg-top{vertical-align:top}
+.tg .tg-center{text-align:center;vertical-align:center}
+</style>
+
 Flink’s network stack is one of the core components that make up the `flink-runtime` module and sit at the heart of every Flink job. It connects individual work units (subtasks) from all TaskManagers. This is where your streamed-in data flows through and it is therefore crucial to the performance of your Flink job for both the throughput as well as latency you observe. In contrast to the coordination channels between TaskManagers and JobManagers which are using RPCs via Akka, the network [...]
 
 This blog post is the first in a series of posts about the network stack. In the sections below, we will first have a high-level look at what abstractions are exposed to the stream operators and then go into detail on the physical implementation and various optimisations Flink did. We will briefly present the result of these optimisations and Flink’s trade-off between throughput and latency. Future blog posts in this series will elaborate more on monitoring and metrics, tuning parameters [...]
 
-# Logical View
+{% toc %}
+
+## Logical View
 
 Flink’s network stack provides the following logical view to the subtasks when communicating with each other, for example during a network shuffle as required by a `keyBy()`.
 
@@ -54,42 +65,34 @@ Batch jobs may also produce results in a blocking fashion, depending on the oper
 The following table summarises the valid combinations:
 <br>
 <center>
-<style type="text/css">
-.tg  {border-collapse:collapse;border-spacing:0;}
-.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 30px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
-.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 30px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
-.tg .tg-wwp9{font-size:15px;background-color:#9b9b9b;border-color:#343434;text-align:left}
-.tg .tg-sogj{font-size:15px;text-align:left}
-.tg .tg-cbs6{font-size:15px;text-align:left;vertical-align:top}
-</style>
 <table class="tg">
   <tr>
-    <th class="tg-wwp9">Output Type</th>
-    <th class="tg-wwp9">Scheduling Type</th>
-    <th class="tg-wwp9">Applies to…</th>
+    <th>Output Type</th>
+    <th>Scheduling Type</th>
+    <th>Applies to…</th>
   </tr>
   <tr>
-    <td class="tg-sogj" rowspan="2">pipelined, unbounded</td>
-    <td class="tg-sogj">all at once</td>
-    <td class="tg-sogj">Streaming jobs</td>
+    <td rowspan="2">pipelined, unbounded</td>
+    <td>all at once</td>
+    <td>Streaming jobs</td>
   </tr>
   <tr>
-    <td class="tg-sogj">next stage on first output</td>
-    <td class="tg-sogj">n/a¹</td>
+    <td>next stage on first output</td>
+    <td>n/a¹</td>
   </tr>
   <tr>
-    <td class="tg-sogj" rowspan="2">pipelined, bounded</td>
-    <td class="tg-sogj">all at once</td>
-    <td class="tg-sogj">n/a²</td>
+    <td rowspan="2">pipelined, bounded</td>
+    <td>all at once</td>
+    <td>n/a²</td>
   </tr>
   <tr>
-    <td class="tg-sogj">next stage on first output</td>
-    <td class="tg-sogj">Batch jobs</td>
+    <td>next stage on first output</td>
+    <td>Batch jobs</td>
   </tr>
   <tr>
-    <td class="tg-cbs6">blocking</td>
-    <td class="tg-cbs6">next stage on complete output</td>
-    <td class="tg-cbs6">Batch jobs</td>
+    <td>blocking</td>
+    <td>next stage on complete output</td>
+    <td>Batch jobs</td>
   </tr>
 </table>
 </center>
@@ -105,7 +108,7 @@ Additionally, for subtasks with more than one input, scheduling start in two way
 
 <br>
 
-# Physical Transport
+## Physical Transport
 
 In order to understand the physical data connections, please recall that, in Flink, different tasks may share the same slot via [slot sharing groups]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/dev/stream/operators/#task-chaining-and-resource-groups). TaskManagers may also provide more than one slot to allow multiple subtasks of the same task to be scheduled onto the same TaskManager.
 
@@ -113,37 +116,29 @@ For the example pictured below, we will assume a parallelism of 4 and a deployme
 <br>
 
 <center>
-<style type="text/css">
-.tg  {border-collapse:collapse;border-spacing:10;}
-.tg td{font-family:Arial, sans-serif;font-size:15px;padding:10px 80px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
-.tg th{font-family:Arial, sans-serif;font-size:15px;font-weight:normal;padding:10px 80px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
-.tg .tg-266k{background-color:#9b9b9b;border-color:inherit;text-align:left;vertical-align:center}
-.tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:center}
-.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:center}
-</style>
 <table class="tg">
   <tr>
-    <th class="tg-266k"></th>
-    <th class="tg-266k">B.1</th>
-    <th class="tg-266k">B.2</th>
-    <th class="tg-266k">B.3</th>
-    <th class="tg-266k">B.4</th>
+    <th></th>
+    <th class="tg-wide">B.1</th>
+    <th class="tg-wide">B.2</th>
+    <th class="tg-wide">B.3</th>
+    <th class="tg-wide">B.4</th>
   </tr>
   <tr>
-    <td class="tg-0pky">A.1</td>
-    <td class="tg-c3ow" colspan="2" rowspan="2">local</td>
-    <td class="tg-c3ow" colspan="2" rowspan="2">remote</td>
+    <th class="tg-wide">A.1</th>
+    <td class="tg-center" colspan="2" rowspan="2">local</td>
+    <td class="tg-center" colspan="2" rowspan="2">remote</td>
   </tr>
   <tr>
-    <td class="tg-0pky">A.2</td>
+    <th class="tg-wide">A.2</th>
   </tr>
   <tr>
-    <td class="tg-0pky">A.3</td>
-    <td class="tg-c3ow" colspan="2" rowspan="2">remote</td>
-    <td class="tg-c3ow" colspan="2" rowspan="2">local</td>
+    <th class="tg-wide">A.3</th>
+    <td class="tg-center" colspan="2" rowspan="2">remote</td>
+    <td class="tg-center" colspan="2" rowspan="2">local</td>
   </tr>
   <tr>
-    <td class="tg-0pky">A.4</td>
+    <th class="tg-wide">A.4</th>
   </tr>
 </table>
 </center>
@@ -164,7 +159,7 @@ The results of each subtask are called [ResultPartition]({{ site.DOCS_BASE_URL }
 
 The total number of buffers on a single TaskManager usually does not need configuration. See the [Configuring the Network Buffers]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/ops/config.html#configuring-the-network-buffers) documentation for details on how to do so if needed.
 
-## Inflicting Backpressure (1)
+### Inflicting Backpressure (1)
 
 Whenever a subtask’s sending buffer pool is exhausted — buffers reside in either a result subpartition's buffer queue or inside the lower, Netty-backed network stack — the producer is blocked, cannot continue, and experiences backpressure. The receiver works in a similar fashion: any incoming Netty buffer in the lower network stack needs to be made available to Flink via a network buffer. If there is no network buffer available in the appropriate subtask's buffer pool, Flink will stop re [...]
 
@@ -179,7 +174,7 @@ To prevent this situation from even happening, Flink 1.5 introduced its own flow
 
 <br>
 
-# Credit-based Flow Control
+## Credit-based Flow Control
 
 Credit-based flow control makes sure that whatever is “on the wire” will have capacity at the receiver to handle. It is based on the availability of network buffers as a natural extension of the mechanisms Flink had before. Instead of only having a shared local buffer pool, each remote input channel now has its own set of **exclusive buffers**. Conversely, buffers in the local buffer pool are called **floating buffers** as they will float around and are available to every input channel.
 
@@ -196,11 +191,11 @@ Credit-based flow control will use [buffers-per-channel]({{ site.DOCS_BASE_URL }
 
 <sup>3</sup>If there are not enough buffers available, each buffer pool will get the same share of the globally available ones (± 1).
 
-## Inflicting Backpressure (2)
+### Inflicting Backpressure (2)
 
 As opposed to the receiver's backpressure mechanisms without flow control, credits provide a more direct control: If a receiver cannot keep up, its available credits will eventually hit 0 and stop the sender from forwarding buffers to the lower network stack. There is backpressure on this logical channel only and there is no need to block reading from a multiplexed TCP channel. Other receivers are therefore not affected in processing available buffers.
 
-## What do we Gain? Where is the Catch?
+### What do we Gain? Where is the Catch?
 
 <img align="right" src="{{ site.baseurl }}/img/blog/2019-06-05-network-stack/flink-network-stack5.png" width="300" height="200" alt="Physical-transport-credit-flow-checkpoints-Flink's Network Stack"/>
 
@@ -213,36 +208,40 @@ There is one more thing you may notice when using credit-based flow control: sin
 <br>
 
 <center>
-<style type="text/css">
-.tg  {border-collapse:collapse;border-spacing:0;}
-.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 30px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
-.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 30px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
-.tg .tg-0vnf{font-size:15px;text-align:center}
-.tg .tg-rc1r{font-size:15px;background-color:#9b9b9b;text-align:left}
-.tg .tg-sogj{font-size:15px;text-align:left}
-</style>
 <table class="tg">
   <tr>
-    <th class="tg-rc1r">Advantages</th>
-    <th class="tg-rc1r">Disadvantages</th>
+    <th>Advantages</th>
+    <th>Disadvantages</th>
   </tr>
   <tr>
-    <td class="tg-sogj">• better resource utilisation with data skew in multiplexed connections <br><br>• improved checkpoint alignment<br><br>• reduced memory use (less data in lower network layers)</td>
-    <td class="tg-sogj">• additional credit-announce messages<br><br>• additional backlog-announce messages (piggy-backed with buffer messages, almost no overhead)<br><br>• potential round-trip latency</td>
+    <td class="tg-top">
+    • better resource utilisation with data skew in multiplexed connections <br><br>
+    • improved checkpoint alignment<br><br>
+    • reduced memory use (less data in lower network layers)</td>
+    <td class="tg-top">
+    • additional credit-announce messages<br><br>
+    • additional backlog-announce messages (piggy-backed with buffer messages, almost no overhead)<br><br>
+    • potential round-trip latency</td>
   </tr>
   <tr>
-    <td class="tg-0vnf" colspan="2">• backpressure appears earlier</td>
+    <td class="tg-center" colspan="2">• backpressure appears earlier</td>
   </tr>
 </table>
 </center>
 <br>
 
-> _NOTE:_ If you need to turn off credit-based flow control, you can add this to your `flink-conf.yaml`: `taskmanager.network.credit-model: false`. 
-> This parameter, however, is deprecated and will eventually be removed along with the non-credit-based flow control code.
+<div class="alert alert-info" markdown="1">
+<span class="label label-info" style="display: inline-block"><span class="glyphicon glyphicon-info-sign" aria-hidden="true"></span> Note</span>
+If you need to turn off credit-based flow control, you can add this to your `flink-conf.yaml`:
+
+`taskmanager.network.credit-model: false`
+
+This parameter, however, is deprecated and will eventually be removed along with the non-credit-based flow control code.
+</div>
 
 <br>
 
-# Writing Records into Network Buffers and Reading them again
+## Writing Records into Network Buffers and Reading them again
 
 The following picture extends the slightly more high-level view from above with further details of the network stack and its surrounding components, from the collection of a record in your sending operator to the receiving operator getting it:
 <br>
@@ -257,7 +256,7 @@ After creating a record and passing it along, for example via `Collector#collect
 On the receiver’s side, the lower network stack (netty) is writing received buffers into the appropriate input channels. The (stream) task’s thread eventually reads from these queues and tries to deserialise the accumulated bytes into Java objects with the help of the [RecordReader]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/api/reader/RecordReader.html) and going through the [SpillingAdaptiveSpanningRecordDeserializer]({{ site.DOCS_BASE_ [...]
 <br>
 
-## Flushing Buffers to Netty
+### Flushing Buffers to Netty
 
 In the picture above, the credit-based flow control mechanics actually sit inside the “Netty Server” (and “Netty Client”) components and the buffer the RecordWriter is writing to is always added to the result subpartition in an empty state and then gradually filled with (serialised) records. But when does Netty actually get the buffer? Obviously, it cannot take bytes whenever they become available since that would not only add substantial costs due to cross-thread communication and synch [...]
 
@@ -268,7 +267,7 @@ In Flink, there are three situations that make a buffer available for consumptio
 * a special event such as a checkpoint barrier is sent.<br>
 <br>
 
-### Flush after Buffer Full
+#### Flush after Buffer Full
 
 The RecordWriter works with a local serialisation buffer for the current record and will gradually write these bytes to one or more network buffers sitting at the appropriate result subpartition queue. Although a RecordWriter can work on multiple subpartitions, each subpartition has only one RecordWriter writing data to it. The Netty server, on the other hand, is reading from multiple result subpartitions and multiplexing the appropriate ones into a single channel as described above. Thi [...]
 <br>
@@ -281,7 +280,7 @@ The RecordWriter works with a local serialisation buffer for the current record
 <sup>4</sup>We can assume it already got the notification if there are more finished buffers in the queue.
 <br>
 
-### Flush after Buffer Timeout
+#### Flush after Buffer Timeout
 
 In order to support low-latency use cases, we cannot rely only on buffers being full in order to send data downstream. There may be cases where a certain communication channel has only a few records flowing through it, which would unnecessarily increase the latency of the few records you actually have. Therefore, a periodic process will flush whatever data is available down the stack: the output flusher. The periodic interval can be configured via [StreamExecutionEnvironment#setBufferTimeout [...]
 <br>
@@ -294,12 +293,12 @@ In order to support low-latency use cases, we cannot only rely on buffers being
 <sup>5</sup>Strictly speaking, the output flusher does not give any guarantees: it only sends a notification to Netty, which picks it up at will and as capacity permits. This also means that the output flusher has no effect if the channel is backpressured.
 <br>
 
-### Flush after special event
+#### Flush after special event
 
 Some special events also trigger immediate flushes if being sent through the RecordWriter. The most important ones are checkpoint barriers or end-of-partition events which obviously should go quickly and not wait for the output flusher to kick in.
 <br>
 
-### Further remarks
+#### Further remarks
 
 In contrast to Flink < 1.5, please note that (a) network buffers are now placed in the subpartition queues directly and (b) we are not closing the buffer on each flush. This gives us a few advantages:
 
@@ -310,13 +309,13 @@ In contrast to Flink < 1.5, please note that (a) network buffers are now placed
 However, you may notice an increased CPU use and TCP packet rate during low load scenarios. This is because, with the changes, Flink will use any *available* CPU cycles to try to maintain the desired latency. Once the load increases, this will self-adjust by buffers filling up more. High load scenarios are not affected and even get a better throughput because of the reduced synchronisation overhead.
 <br>
 
-## Buffer Builder & Buffer Consumer
+### Buffer Builder & Buffer Consumer
 
 If you want to dig deeper into how the producer-consumer mechanics are implemented in Flink, please take a closer look at the [BufferBuilder]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/buffer/BufferBuilder.html) and [BufferConsumer]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/buffer/BufferConsumer.html) classes which have been introduced in Flink 1.5. While reading is potentially only *per bu [...]
 
 <br>
 
-# Latency vs. Throughput
+## Latency vs. Throughput
 
 Network buffers were introduced to get higher resource utilisation and higher throughput at the cost of having some records wait in buffers a little longer. Although an upper limit to this wait time can be given via the buffer timeout, you may be curious to find out more about the trade-off between these two dimensions: latency and throughput, as, obviously, you cannot get both. The following plot shows various values for the buffer timeout starting at 0 (flush with every record) to 100m [...]
 <br>
@@ -330,7 +329,7 @@ As you can see, with Flink 1.5+, even very low buffer timeouts such as 1ms (for
 
 <br>
 
-# Conclusion
+## Conclusion
 
 Now you know about result partitions, the different network connections and scheduling types for both batch and streaming. You also know about credit-based flow control and how the network stack works internally, in order to reason about network-related tuning parameters and about certain job behaviours. Future blog posts in this series will build upon this knowledge and go into more operational details including relevant metrics to look at, further network stack tuning, and common antip [...]
 
diff --git a/content/2019/06/05/flink-network-stack.html b/content/2019/06/05/flink-network-stack.html
index 2445ec5..c1e0fcd 100644
--- a/content/2019/06/05/flink-network-stack.html
+++ b/content/2019/06/05/flink-network-stack.html
@@ -173,11 +173,43 @@
       <article>
         <p>05 Jun 2019 Nico Kruber </p>
 
+<style type="text/css">
+.tg  {border-collapse:collapse;border-spacing:0;}
+.tg td{padding:10px 20px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
+.tg th{padding:10px 20px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;background-color:#eff0f1;}
+.tg .tg-wide{padding:10px 30px;}
+.tg .tg-top{vertical-align:top}
+.tg .tg-center{text-align:center;vertical-align:center}
+</style>
+
 <p>Flink’s network stack is one of the core components that make up the <code>flink-runtime</code> module and sit at the heart of every Flink job. It connects individual work units (subtasks) from all TaskManagers. This is where your streamed-in data flows through and it is therefore crucial to the performance of your Flink job for both the throughput as well as latency you observe. In contrast to the coordination channels between TaskManagers and JobManagers which are using RPCs via Akk [...]
 
 <p>This blog post is the first in a series of posts about the network stack. In the sections below, we will first have a high-level look at what abstractions are exposed to the stream operators and then go into detail on the physical implementation and various optimisations Flink did. We will briefly present the result of these optimisations and Flink’s trade-off between throughput and latency. Future blog posts in this series will elaborate more on monitoring and metrics, tuning paramet [...]
 
-<h1 id="logical-view">Logical View</h1>
+<div class="page-toc">
+<ul id="markdown-toc">
+  <li><a href="#logical-view" id="markdown-toc-logical-view">Logical View</a></li>
+  <li><a href="#physical-transport" id="markdown-toc-physical-transport">Physical Transport</a>    <ul>
+      <li><a href="#inflicting-backpressure-1" id="markdown-toc-inflicting-backpressure-1">Inflicting Backpressure (1)</a></li>
+    </ul>
+  </li>
+  <li><a href="#credit-based-flow-control" id="markdown-toc-credit-based-flow-control">Credit-based Flow Control</a>    <ul>
+      <li><a href="#inflicting-backpressure-2" id="markdown-toc-inflicting-backpressure-2">Inflicting Backpressure (2)</a></li>
+      <li><a href="#what-do-we-gain-where-is-the-catch" id="markdown-toc-what-do-we-gain-where-is-the-catch">What do we Gain? Where is the Catch?</a></li>
+    </ul>
+  </li>
+  <li><a href="#writing-records-into-network-buffers-and-reading-them-again" id="markdown-toc-writing-records-into-network-buffers-and-reading-them-again">Writing Records into Network Buffers and Reading them again</a>    <ul>
+      <li><a href="#flushing-buffers-to-netty" id="markdown-toc-flushing-buffers-to-netty">Flushing Buffers to Netty</a></li>
+      <li><a href="#buffer-builder--buffer-consumer" id="markdown-toc-buffer-builder--buffer-consumer">Buffer Builder &amp; Buffer Consumer</a></li>
+    </ul>
+  </li>
+  <li><a href="#latency-vs-throughput" id="markdown-toc-latency-vs-throughput">Latency vs. Throughput</a></li>
+  <li><a href="#conclusion" id="markdown-toc-conclusion">Conclusion</a></li>
+</ul>
+
+</div>
+
+<h2 id="logical-view">Logical View</h2>
 
 <p>Flink’s network stack provides the following logical view to the subtasks when communicating with each other, for example during a network shuffle as required by a <code>keyBy()</code>.</p>
 
@@ -226,42 +258,34 @@
 <p>The following table summarises the valid combinations:
 <br /></p>
 <center>
-<style type="text/css">
-.tg  {border-collapse:collapse;border-spacing:0;}
-.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 30px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
-.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 30px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
-.tg .tg-wwp9{font-size:15px;background-color:#9b9b9b;border-color:#343434;text-align:left}
-.tg .tg-sogj{font-size:15px;text-align:left}
-.tg .tg-cbs6{font-size:15px;text-align:left;vertical-align:top}
-</style>
 <table class="tg">
   <tr>
-    <th class="tg-wwp9">Output Type</th>
-    <th class="tg-wwp9">Scheduling Type</th>
-    <th class="tg-wwp9">Applies to…</th>
+    <th>Output Type</th>
+    <th>Scheduling Type</th>
+    <th>Applies to…</th>
   </tr>
   <tr>
-    <td class="tg-sogj" rowspan="2">pipelined, unbounded</td>
-    <td class="tg-sogj">all at once</td>
-    <td class="tg-sogj">Streaming jobs</td>
+    <td rowspan="2">pipelined, unbounded</td>
+    <td>all at once</td>
+    <td>Streaming jobs</td>
   </tr>
   <tr>
-    <td class="tg-sogj">next stage on first output</td>
-    <td class="tg-sogj">n/a¹</td>
+    <td>next stage on first output</td>
+    <td>n/a¹</td>
   </tr>
   <tr>
-    <td class="tg-sogj" rowspan="2">pipelined, bounded</td>
-    <td class="tg-sogj">all at once</td>
-    <td class="tg-sogj">n/a²</td>
+    <td rowspan="2">pipelined, bounded</td>
+    <td>all at once</td>
+    <td>n/a²</td>
   </tr>
   <tr>
-    <td class="tg-sogj">next stage on first output</td>
-    <td class="tg-sogj">Batch jobs</td>
+    <td>next stage on first output</td>
+    <td>Batch jobs</td>
   </tr>
   <tr>
-    <td class="tg-cbs6">blocking</td>
-    <td class="tg-cbs6">next stage on complete output</td>
-    <td class="tg-cbs6">Batch jobs</td>
+    <td>blocking</td>
+    <td>next stage on complete output</td>
+    <td>Batch jobs</td>
   </tr>
 </table>
 </center>
@@ -275,7 +299,7 @@ Additionally, for subtasks with more than one input, scheduling start in two way
 
 <p><br /></p>
 
-<h1 id="physical-transport">Physical Transport</h1>
+<h2 id="physical-transport">Physical Transport</h2>
 
 <p>In order to understand the physical data connections, please recall that, in Flink, different tasks may share the same slot via <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/operators/#task-chaining-and-resource-groups">slot sharing groups</a>. TaskManagers may also provide more than one slot to allow multiple subtasks of the same task to be scheduled onto the same TaskManager.</p>
 
@@ -283,37 +307,29 @@ Additionally, for subtasks with more than one input, scheduling start in two way
 <br /></p>
 
 <center>
-<style type="text/css">
-.tg  {border-collapse:collapse;border-spacing:10;}
-.tg td{font-family:Arial, sans-serif;font-size:15px;padding:10px 80px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
-.tg th{font-family:Arial, sans-serif;font-size:15px;font-weight:normal;padding:10px 80px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
-.tg .tg-266k{background-color:#9b9b9b;border-color:inherit;text-align:left;vertical-align:center}
-.tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:center}
-.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:center}
-</style>
 <table class="tg">
   <tr>
-    <th class="tg-266k"></th>
-    <th class="tg-266k">B.1</th>
-    <th class="tg-266k">B.2</th>
-    <th class="tg-266k">B.3</th>
-    <th class="tg-266k">B.4</th>
+    <th></th>
+    <th class="tg-wide">B.1</th>
+    <th class="tg-wide">B.2</th>
+    <th class="tg-wide">B.3</th>
+    <th class="tg-wide">B.4</th>
   </tr>
   <tr>
-    <td class="tg-0pky">A.1</td>
-    <td class="tg-c3ow" colspan="2" rowspan="2">local</td>
-    <td class="tg-c3ow" colspan="2" rowspan="2">remote</td>
+    <th class="tg-wide">A.1</th>
+    <td class="tg-center" colspan="2" rowspan="2">local</td>
+    <td class="tg-center" colspan="2" rowspan="2">remote</td>
   </tr>
   <tr>
-    <td class="tg-0pky">A.2</td>
+    <th class="tg-wide">A.2</th>
   </tr>
   <tr>
-    <td class="tg-0pky">A.3</td>
-    <td class="tg-c3ow" colspan="2" rowspan="2">remote</td>
-    <td class="tg-c3ow" colspan="2" rowspan="2">local</td>
+    <th class="tg-wide">A.3</th>
+    <td class="tg-center" colspan="2" rowspan="2">remote</td>
+    <td class="tg-center" colspan="2" rowspan="2">local</td>
   </tr>
   <tr>
-    <td class="tg-0pky">A.4</td>
+    <th class="tg-wide">A.4</th>
   </tr>
 </table>
 </center>
@@ -335,7 +351,7 @@ Additionally, for subtasks with more than one input, scheduling start in two way
 
 <p>The total number of buffers on a single TaskManager usually does not need configuration. See the <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#configuring-the-network-buffers">Configuring the Network Buffers</a> documentation for details on how to do so if needed.</p>
 
-<h2 id="inflicting-backpressure-1">Inflicting Backpressure (1)</h2>
+<h3 id="inflicting-backpressure-1">Inflicting Backpressure (1)</h3>
 
 <p>Whenever a subtask’s sending buffer pool is exhausted — buffers reside in either a result subpartition’s buffer queue or inside the lower, Netty-backed network stack — the producer is blocked, cannot continue, and experiences backpressure. The receiver works in a similar fashion: any incoming Netty buffer in the lower network stack needs to be made available to Flink via a network buffer. If there is no network buffer available in the appropriate subtask’s buffer pool, Flink will stop [...]
 
@@ -350,7 +366,7 @@ Additionally, for subtasks with more than one input, scheduling start in two way
 
 <p><br /></p>
 
-<h1 id="credit-based-flow-control">Credit-based Flow Control</h1>
+<h2 id="credit-based-flow-control">Credit-based Flow Control</h2>
 
 <p>Credit-based flow control makes sure that whatever is “on the wire” will have capacity at the receiver to handle. It is based on the availability of network buffers as a natural extension of the mechanisms Flink had before. Instead of only having a shared local buffer pool, each remote input channel now has its own set of <strong>exclusive buffers</strong>. Conversely, buffers in the local buffer pool are called <strong>floating buffers</strong> as they will float around and are avail [...]
 
@@ -367,11 +383,11 @@ Additionally, for subtasks with more than one input, scheduling start in two way
 
 <p><sup>3</sup>If there are not enough buffers available, each buffer pool will get the same share of the globally available ones (± 1).</p>
 
-<h2 id="inflicting-backpressure-2">Inflicting Backpressure (2)</h2>
+<h3 id="inflicting-backpressure-2">Inflicting Backpressure (2)</h3>
 
 <p>As opposed to the receiver’s backpressure mechanisms without flow control, credits provide more direct control: if a receiver cannot keep up, its available credits will eventually hit 0 and stop the sender from forwarding buffers to the lower network stack. Backpressure occurs on this logical channel only, and there is no need to block reading from a multiplexed TCP channel. Other receivers are therefore not affected in processing available buffers.</p>
 
-<h2 id="what-do-we-gain-where-is-the-catch">What do we Gain? Where is the Catch?</h2>
+<h3 id="what-do-we-gain-where-is-the-catch">What do we Gain? Where is the Catch?</h3>
 
 <p><img align="right" src="/img/blog/2019-06-05-network-stack/flink-network-stack5.png" width="300" height="200" alt="Physical-transport-credit-flow-checkpoints-Flink's Network Stack" /></p>
 
@@ -384,38 +400,40 @@ Additionally, for subtasks with more than one input, scheduling start in two way
 <p><br /></p>
 
 <center>
-<style type="text/css">
-.tg  {border-collapse:collapse;border-spacing:0;}
-.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 30px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
-.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 30px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
-.tg .tg-0vnf{font-size:15px;text-align:center}
-.tg .tg-rc1r{font-size:15px;background-color:#9b9b9b;text-align:left}
-.tg .tg-sogj{font-size:15px;text-align:left}
-</style>
 <table class="tg">
   <tr>
-    <th class="tg-rc1r">Advantages</th>
-    <th class="tg-rc1r">Disadvantages</th>
+    <th>Advantages</th>
+    <th>Disadvantages</th>
   </tr>
   <tr>
-    <td class="tg-sogj">• better resource utilisation with data skew in multiplexed connections <br /><br />• improved checkpoint alignment<br /><br />• reduced memory use (less data in lower network layers)</td>
-    <td class="tg-sogj">• additional credit-announce messages<br /><br />• additional backlog-announce messages (piggy-backed with buffer messages, almost no overhead)<br /><br />• potential round-trip latency</td>
+    <td class="tg-top">
+    • better resource utilisation with data skew in multiplexed connections <br /><br />
+    • improved checkpoint alignment<br /><br />
+    • reduced memory use (less data in lower network layers)</td>
+    <td class="tg-top">
+    • additional credit-announce messages<br /><br />
+    • additional backlog-announce messages (piggy-backed with buffer messages, almost no overhead)<br /><br />
+    • potential round-trip latency</td>
   </tr>
   <tr>
-    <td class="tg-0vnf" colspan="2">• backpressure appears earlier</td>
+    <td class="tg-center" colspan="2">• backpressure appears earlier</td>
   </tr>
 </table>
 </center>
 <p><br /></p>
 
-<blockquote>
-  <p><em>NOTE:</em> If you need to turn off credit-based flow control, you can add this to your <code>flink-conf.yaml</code>: <code>taskmanager.network.credit-model: false</code>. 
-This parameter, however, is deprecated and will eventually be removed along with the non-credit-based flow control code.</p>
-</blockquote>
+<div class="alert alert-info">
+  <p><span class="label label-info" style="display: inline-block"><span class="glyphicon glyphicon-info-sign" aria-hidden="true"></span> Note</span>
+If you need to turn off credit-based flow control, you can add this to your <code>flink-conf.yaml</code>:</p>
+
+  <p><code>taskmanager.network.credit-model: false</code></p>
+
+  <p>This parameter, however, is deprecated and will eventually be removed along with the non-credit-based flow control code.</p>
+</div>
 
 <p><br /></p>
 
-<h1 id="writing-records-into-network-buffers-and-reading-them-again">Writing Records into Network Buffers and Reading them again</h1>
+<h2 id="writing-records-into-network-buffers-and-reading-them-again">Writing Records into Network Buffers and Reading them again</h2>
 
 <p>The following picture extends the slightly more high-level view from above with further details of the network stack and its surrounding components, from the collection of a record in your sending operator to the receiving operator getting it:
 <br /></p>
@@ -430,7 +448,7 @@ This parameter, however, is deprecated and will eventually be removed along with
 <p>On the receiver’s side, the lower network stack (Netty) is writing received buffers into the appropriate input channels. The (stream) task’s thread eventually reads from these queues and tries to deserialise the accumulated bytes into Java objects with the help of the <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/api/reader/RecordReader.html">RecordReader</a>, going through the <a href="https://ci.apache.org/proje [...]
 <br /></p>
 
-<h2 id="flushing-buffers-to-netty">Flushing Buffers to Netty</h2>
+<h3 id="flushing-buffers-to-netty">Flushing Buffers to Netty</h3>
 
 <p>In the picture above, the credit-based flow control mechanics actually sit inside the “Netty Server” (and “Netty Client”) components and the buffer the RecordWriter is writing to is always added to the result subpartition in an empty state and then gradually filled with (serialised) records. But when does Netty actually get the buffer? Obviously, it cannot take bytes whenever they become available since that would not only add substantial costs due to cross-thread communication and sy [...]
 
@@ -443,7 +461,7 @@ This parameter, however, is deprecated and will eventually be removed along with
 <br /></li>
 </ul>
 
-<h3 id="flush-after-buffer-full">Flush after Buffer Full</h3>
+<h4 id="flush-after-buffer-full">Flush after Buffer Full</h4>
 
 <p>The RecordWriter works with a local serialisation buffer for the current record and will gradually write these bytes to one or more network buffers sitting at the appropriate result subpartition queue. Although a RecordWriter can work on multiple subpartitions, each subpartition has only one RecordWriter writing data to it. The Netty server, on the other hand, is reading from multiple result subpartitions and multiplexing the appropriate ones into a single channel as described above.  [...]
 <br /></p>
@@ -456,7 +474,7 @@ This parameter, however, is deprecated and will eventually be removed along with
 <p><sup>4</sup>We can assume it already got the notification if there are more finished buffers in the queue.
 <br /></p>
 
-<h3 id="flush-after-buffer-timeout">Flush after Buffer Timeout</h3>
+<h4 id="flush-after-buffer-timeout">Flush after Buffer Timeout</h4>
 
 <p>In order to support low-latency use cases, we cannot rely only on buffers being full in order to send data downstream. There may be cases where a certain communication channel does not have many records flowing through it, which would unnecessarily increase the latency of the few records you actually have. Therefore, a periodic process will flush whatever data is available down the stack: the output flusher. The periodic interval can be configured via <a href="https://ci.apache.org/projects/f [...]
 <br /></p>
@@ -469,12 +487,12 @@ This parameter, however, is deprecated and will eventually be removed along with
 <p><sup>5</sup>Strictly speaking, the output flusher does not give any guarantees: it only sends a notification to Netty, which can pick it up at will and as capacity allows. This also means that the output flusher has no effect if the channel is backpressured.
 <br /></p>
 
-<h3 id="flush-after-special-event">Flush after special event</h3>
+<h4 id="flush-after-special-event">Flush after special event</h4>
 
 <p>Some special events also trigger immediate flushes when sent through the RecordWriter. The most important ones are checkpoint barriers and end-of-partition events, which obviously should go out quickly rather than wait for the output flusher to kick in.
 <br /></p>
 
-<h3 id="further-remarks">Further remarks</h3>
+<h4 id="further-remarks">Further remarks</h4>
 
 <p>In contrast to Flink &lt; 1.5, please note that (a) network buffers are now placed in the subpartition queues directly and (b) we are not closing the buffer on each flush. This gives us a few advantages:</p>
 
@@ -487,13 +505,13 @@ This parameter, however, is deprecated and will eventually be removed along with
 <p>However, you may notice an increased CPU use and TCP packet rate during low load scenarios. This is because, with the changes, Flink will use any <em>available</em> CPU cycles to try to maintain the desired latency. Once the load increases, this will self-adjust by buffers filling up more. High load scenarios are not affected and even get a better throughput because of the reduced synchronisation overhead.
 <br /></p>
 
-<h2 id="buffer-builder--buffer-consumer">Buffer Builder &amp; Buffer Consumer</h2>
+<h3 id="buffer-builder--buffer-consumer">Buffer Builder &amp; Buffer Consumer</h3>
 
 <p>If you want to dig deeper into how the producer-consumer mechanics are implemented in Flink, please take a closer look at the <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/buffer/BufferBuilder.html">BufferBuilder</a> and <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/buffer/BufferConsumer.html">BufferConsumer</a> classes which have been introduced in F [...]
 
 <p><br /></p>
 
-<h1 id="latency-vs-throughput">Latency vs. Throughput</h1>
+<h2 id="latency-vs-throughput">Latency vs. Throughput</h2>
 
 <p>Network buffers were introduced to get higher resource utilisation and higher throughput at the cost of having some records wait in buffers a little longer. Although an upper limit to this wait time can be given via the buffer timeout, you may be curious to find out more about the trade-off between these two dimensions: latency and throughput, since, obviously, you cannot maximise both. The following plot shows various values for the buffer timeout ranging from 0 (flush with every record) to 1 [...]
 <br /></p>
@@ -507,7 +525,7 @@ This parameter, however, is deprecated and will eventually be removed along with
 
 <p><br /></p>
 
-<h1 id="conclusion">Conclusion</h1>
+<h2 id="conclusion">Conclusion</h2>
 
 <p>Now you know about result partitions, the different network connections and scheduling types for both batch and streaming. You also know about credit-based flow control and how the network stack works internally, in order to reason about network-related tuning parameters and about certain job behaviours. Future blog posts in this series will build upon this knowledge and go into more operational details including relevant metrics to look at, further network stack tuning, and common an [...]
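<p>As a practical pointer to the knobs referenced in the post, both the network-buffer sizing and the (deprecated) flow-control switch live in <code>flink-conf.yaml</code>. A minimal sketch for Flink 1.8 follows; the memory values are illustrative placeholders, not recommendations:</p>

```yaml
# Network buffer pool sizing (Flink 1.8). The values below are illustrative;
# see the "Configuring the Network Buffers" documentation linked above for guidance.
taskmanager.network.memory.fraction: 0.1
taskmanager.network.memory.min: 64mb
taskmanager.network.memory.max: 1gb

# Deprecated switch mentioned in the post: fall back to non-credit-based flow control.
taskmanager.network.credit-model: false
```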
 
diff --git a/content/blog/feed.xml b/content/blog/feed.xml
index d4d033b..ebfe803 100644
--- a/content/blog/feed.xml
+++ b/content/blog/feed.xml
@@ -359,11 +359,43 @@ The website implements a streaming application that detects a pattern on the str
 
 <item>
 <title>A Deep-Dive into Flink&#39;s Network Stack</title>
-<description>&lt;p&gt;Flink’s network stack is one of the core components that make up the &lt;code&gt;flink-runtime&lt;/code&gt; module and sit at the heart of every Flink job. It connects individual work units (subtasks) from all TaskManagers. This is where your streamed-in data flows through and it is therefore crucial to the performance of your Flink job for both the throughput as well as latency you observe. In contrast to the coordination channels between TaskManagers and JobManage [...]
+<description>&lt;style type=&quot;text/css&quot;&gt;
+.tg  {border-collapse:collapse;border-spacing:0;}
+.tg td{padding:10px 20px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
+.tg th{padding:10px 20px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;background-color:#eff0f1;}
+.tg .tg-wide{padding:10px 30px;}
+.tg .tg-top{vertical-align:top}
+.tg .tg-center{text-align:center;vertical-align:center}
+&lt;/style&gt;
+
+&lt;p&gt;Flink’s network stack is one of the core components that make up the &lt;code&gt;flink-runtime&lt;/code&gt; module and sit at the heart of every Flink job. It connects individual work units (subtasks) from all TaskManagers. This is where your streamed-in data flows through and it is therefore crucial to the performance of your Flink job for both the throughput as well as latency you observe. In contrast to the coordination channels between TaskManagers and JobManagers which are  [...]
 
 &lt;p&gt;This blog post is the first in a series of posts about the network stack. In the sections below, we will first have a high-level look at what abstractions are exposed to the stream operators and then go into detail on the physical implementation and various optimisations Flink did. We will briefly present the result of these optimisations and Flink’s trade-off between throughput and latency. Future blog posts in this series will elaborate more on monitoring and metrics, tuning p [...]
 
-&lt;h1 id=&quot;logical-view&quot;&gt;Logical View&lt;/h1&gt;
+&lt;div class=&quot;page-toc&quot;&gt;
+&lt;ul id=&quot;markdown-toc&quot;&gt;
+  &lt;li&gt;&lt;a href=&quot;#logical-view&quot; id=&quot;markdown-toc-logical-view&quot;&gt;Logical View&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;a href=&quot;#physical-transport&quot; id=&quot;markdown-toc-physical-transport&quot;&gt;Physical Transport&lt;/a&gt;    &lt;ul&gt;
+      &lt;li&gt;&lt;a href=&quot;#inflicting-backpressure-1&quot; id=&quot;markdown-toc-inflicting-backpressure-1&quot;&gt;Inflicting Backpressure (1)&lt;/a&gt;&lt;/li&gt;
+    &lt;/ul&gt;
+  &lt;/li&gt;
+  &lt;li&gt;&lt;a href=&quot;#credit-based-flow-control&quot; id=&quot;markdown-toc-credit-based-flow-control&quot;&gt;Credit-based Flow Control&lt;/a&gt;    &lt;ul&gt;
+      &lt;li&gt;&lt;a href=&quot;#inflicting-backpressure-2&quot; id=&quot;markdown-toc-inflicting-backpressure-2&quot;&gt;Inflicting Backpressure (2)&lt;/a&gt;&lt;/li&gt;
+      &lt;li&gt;&lt;a href=&quot;#what-do-we-gain-where-is-the-catch&quot; id=&quot;markdown-toc-what-do-we-gain-where-is-the-catch&quot;&gt;What do we Gain? Where is the Catch?&lt;/a&gt;&lt;/li&gt;
+    &lt;/ul&gt;
+  &lt;/li&gt;
+  &lt;li&gt;&lt;a href=&quot;#writing-records-into-network-buffers-and-reading-them-again&quot; id=&quot;markdown-toc-writing-records-into-network-buffers-and-reading-them-again&quot;&gt;Writing Records into Network Buffers and Reading them again&lt;/a&gt;    &lt;ul&gt;
+      &lt;li&gt;&lt;a href=&quot;#flushing-buffers-to-netty&quot; id=&quot;markdown-toc-flushing-buffers-to-netty&quot;&gt;Flushing Buffers to Netty&lt;/a&gt;&lt;/li&gt;
+      &lt;li&gt;&lt;a href=&quot;#buffer-builder--buffer-consumer&quot; id=&quot;markdown-toc-buffer-builder--buffer-consumer&quot;&gt;Buffer Builder &amp;amp; Buffer Consumer&lt;/a&gt;&lt;/li&gt;
+    &lt;/ul&gt;
+  &lt;/li&gt;
+  &lt;li&gt;&lt;a href=&quot;#latency-vs-throughput&quot; id=&quot;markdown-toc-latency-vs-throughput&quot;&gt;Latency vs. Throughput&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;a href=&quot;#conclusion&quot; id=&quot;markdown-toc-conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;/div&gt;
+
+&lt;h2 id=&quot;logical-view&quot;&gt;Logical View&lt;/h2&gt;
 
 &lt;p&gt;Flink’s network stack provides the following logical view to the subtasks when communicating with each other, for example during a network shuffle as required by a &lt;code&gt;keyBy()&lt;/code&gt;.&lt;/p&gt;
 
@@ -412,42 +444,34 @@ The website implements a streaming application that detects a pattern on the str
 &lt;p&gt;The following table summarises the valid combinations:
 &lt;br /&gt;&lt;/p&gt;
 &lt;center&gt;
-&lt;style type=&quot;text/css&quot;&gt;
-.tg  {border-collapse:collapse;border-spacing:0;}
-.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 30px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
-.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 30px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
-.tg .tg-wwp9{font-size:15px;background-color:#9b9b9b;border-color:#343434;text-align:left}
-.tg .tg-sogj{font-size:15px;text-align:left}
-.tg .tg-cbs6{font-size:15px;text-align:left;vertical-align:top}
-&lt;/style&gt;
 &lt;table class=&quot;tg&quot;&gt;
   &lt;tr&gt;
-    &lt;th class=&quot;tg-wwp9&quot;&gt;Output Type&lt;/th&gt;
-    &lt;th class=&quot;tg-wwp9&quot;&gt;Scheduling Type&lt;/th&gt;
-    &lt;th class=&quot;tg-wwp9&quot;&gt;Applies to…&lt;/th&gt;
+    &lt;th&gt;Output Type&lt;/th&gt;
+    &lt;th&gt;Scheduling Type&lt;/th&gt;
+    &lt;th&gt;Applies to…&lt;/th&gt;
   &lt;/tr&gt;
   &lt;tr&gt;
-    &lt;td class=&quot;tg-sogj&quot; rowspan=&quot;2&quot;&gt;pipelined, unbounded&lt;/td&gt;
-    &lt;td class=&quot;tg-sogj&quot;&gt;all at once&lt;/td&gt;
-    &lt;td class=&quot;tg-sogj&quot;&gt;Streaming jobs&lt;/td&gt;
+    &lt;td rowspan=&quot;2&quot;&gt;pipelined, unbounded&lt;/td&gt;
+    &lt;td&gt;all at once&lt;/td&gt;
+    &lt;td&gt;Streaming jobs&lt;/td&gt;
   &lt;/tr&gt;
   &lt;tr&gt;
-    &lt;td class=&quot;tg-sogj&quot;&gt;next stage on first output&lt;/td&gt;
-    &lt;td class=&quot;tg-sogj&quot;&gt;n/a¹&lt;/td&gt;
+    &lt;td&gt;next stage on first output&lt;/td&gt;
+    &lt;td&gt;n/a¹&lt;/td&gt;
   &lt;/tr&gt;
   &lt;tr&gt;
-    &lt;td class=&quot;tg-sogj&quot; rowspan=&quot;2&quot;&gt;pipelined, bounded&lt;/td&gt;
-    &lt;td class=&quot;tg-sogj&quot;&gt;all at once&lt;/td&gt;
-    &lt;td class=&quot;tg-sogj&quot;&gt;n/a²&lt;/td&gt;
+    &lt;td rowspan=&quot;2&quot;&gt;pipelined, bounded&lt;/td&gt;
+    &lt;td&gt;all at once&lt;/td&gt;
+    &lt;td&gt;n/a²&lt;/td&gt;
   &lt;/tr&gt;
   &lt;tr&gt;
-    &lt;td class=&quot;tg-sogj&quot;&gt;next stage on first output&lt;/td&gt;
-    &lt;td class=&quot;tg-sogj&quot;&gt;Batch jobs&lt;/td&gt;
+    &lt;td&gt;next stage on first output&lt;/td&gt;
+    &lt;td&gt;Batch jobs&lt;/td&gt;
   &lt;/tr&gt;
   &lt;tr&gt;
-    &lt;td class=&quot;tg-cbs6&quot;&gt;blocking&lt;/td&gt;
-    &lt;td class=&quot;tg-cbs6&quot;&gt;next stage on complete output&lt;/td&gt;
-    &lt;td class=&quot;tg-cbs6&quot;&gt;Batch jobs&lt;/td&gt;
+    &lt;td&gt;blocking&lt;/td&gt;
+    &lt;td&gt;next stage on complete output&lt;/td&gt;
+    &lt;td&gt;Batch jobs&lt;/td&gt;
   &lt;/tr&gt;
 &lt;/table&gt;
 &lt;/center&gt;
@@ -461,7 +485,7 @@ Additionally, for subtasks with more than one input, scheduling start in two way
 
 &lt;p&gt;&lt;br /&gt;&lt;/p&gt;
 
-&lt;h1 id=&quot;physical-transport&quot;&gt;Physical Transport&lt;/h1&gt;
+&lt;h2 id=&quot;physical-transport&quot;&gt;Physical Transport&lt;/h2&gt;
 
 &lt;p&gt;In order to understand the physical data connections, please recall that, in Flink, different tasks may share the same slot via &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/operators/#task-chaining-and-resource-groups&quot;&gt;slot sharing groups&lt;/a&gt;. TaskManagers may also provide more than one slot to allow multiple subtasks of the same task to be scheduled onto the same TaskManager.&lt;/p&gt;
 
@@ -469,37 +493,29 @@ Additionally, for subtasks with more than one input, scheduling start in two way
 &lt;br /&gt;&lt;/p&gt;
 
 &lt;center&gt;
-&lt;style type=&quot;text/css&quot;&gt;
-.tg  {border-collapse:collapse;border-spacing:10;}
-.tg td{font-family:Arial, sans-serif;font-size:15px;padding:10px 80px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
-.tg th{font-family:Arial, sans-serif;font-size:15px;font-weight:normal;padding:10px 80px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
-.tg .tg-266k{background-color:#9b9b9b;border-color:inherit;text-align:left;vertical-align:center}
-.tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:center}
-.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:center}
-&lt;/style&gt;
 &lt;table class=&quot;tg&quot;&gt;
   &lt;tr&gt;
-    &lt;th class=&quot;tg-266k&quot;&gt;&lt;/th&gt;
-    &lt;th class=&quot;tg-266k&quot;&gt;B.1&lt;/th&gt;
-    &lt;th class=&quot;tg-266k&quot;&gt;B.2&lt;/th&gt;
-    &lt;th class=&quot;tg-266k&quot;&gt;B.3&lt;/th&gt;
-    &lt;th class=&quot;tg-266k&quot;&gt;B.4&lt;/th&gt;
+    &lt;th&gt;&lt;/th&gt;
+    &lt;th class=&quot;tg-wide&quot;&gt;B.1&lt;/th&gt;
+    &lt;th class=&quot;tg-wide&quot;&gt;B.2&lt;/th&gt;
+    &lt;th class=&quot;tg-wide&quot;&gt;B.3&lt;/th&gt;
+    &lt;th class=&quot;tg-wide&quot;&gt;B.4&lt;/th&gt;
   &lt;/tr&gt;
   &lt;tr&gt;
-    &lt;td class=&quot;tg-0pky&quot;&gt;A.1&lt;/td&gt;
-    &lt;td class=&quot;tg-c3ow&quot; colspan=&quot;2&quot; rowspan=&quot;2&quot;&gt;local&lt;/td&gt;
-    &lt;td class=&quot;tg-c3ow&quot; colspan=&quot;2&quot; rowspan=&quot;2&quot;&gt;remote&lt;/td&gt;
+    &lt;th class=&quot;tg-wide&quot;&gt;A.1&lt;/th&gt;
+    &lt;td class=&quot;tg-center&quot; colspan=&quot;2&quot; rowspan=&quot;2&quot;&gt;local&lt;/td&gt;
+    &lt;td class=&quot;tg-center&quot; colspan=&quot;2&quot; rowspan=&quot;2&quot;&gt;remote&lt;/td&gt;
   &lt;/tr&gt;
   &lt;tr&gt;
-    &lt;td class=&quot;tg-0pky&quot;&gt;A.2&lt;/td&gt;
+    &lt;th class=&quot;tg-wide&quot;&gt;A.2&lt;/th&gt;
   &lt;/tr&gt;
   &lt;tr&gt;
-    &lt;td class=&quot;tg-0pky&quot;&gt;A.3&lt;/td&gt;
-    &lt;td class=&quot;tg-c3ow&quot; colspan=&quot;2&quot; rowspan=&quot;2&quot;&gt;remote&lt;/td&gt;
-    &lt;td class=&quot;tg-c3ow&quot; colspan=&quot;2&quot; rowspan=&quot;2&quot;&gt;local&lt;/td&gt;
+    &lt;th class=&quot;tg-wide&quot;&gt;A.3&lt;/th&gt;
+    &lt;td class=&quot;tg-center&quot; colspan=&quot;2&quot; rowspan=&quot;2&quot;&gt;remote&lt;/td&gt;
+    &lt;td class=&quot;tg-center&quot; colspan=&quot;2&quot; rowspan=&quot;2&quot;&gt;local&lt;/td&gt;
   &lt;/tr&gt;
   &lt;tr&gt;
-    &lt;td class=&quot;tg-0pky&quot;&gt;A.4&lt;/td&gt;
+    &lt;th class=&quot;tg-wide&quot;&gt;A.4&lt;/th&gt;
   &lt;/tr&gt;
 &lt;/table&gt;
 &lt;/center&gt;
@@ -521,7 +537,7 @@ Additionally, for subtasks with more than one input, scheduling start in two way
 
 &lt;p&gt;The total number of buffers on a single TaskManager usually does not need configuration. See the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#configuring-the-network-buffers&quot;&gt;Configuring the Network Buffers&lt;/a&gt; documentation for details on how to do so if needed.&lt;/p&gt;
 
-&lt;h2 id=&quot;inflicting-backpressure-1&quot;&gt;Inflicting Backpressure (1)&lt;/h2&gt;
+&lt;h3 id=&quot;inflicting-backpressure-1&quot;&gt;Inflicting Backpressure (1)&lt;/h3&gt;
 
 &lt;p&gt;Whenever a subtask’s sending buffer pool is exhausted — buffers reside in either a result subpartition’s buffer queue or inside the lower, Netty-backed network stack — the producer is blocked, cannot continue, and experiences backpressure. The receiver works in a similar fashion: any incoming Netty buffer in the lower network stack needs to be made available to Flink via a network buffer. If there is no network buffer available in the appropriate subtask’s buffer pool, Flink wil [...]
 
@@ -536,7 +552,7 @@ Additionally, for subtasks with more than one input, scheduling start in two way
 
 &lt;p&gt;&lt;br /&gt;&lt;/p&gt;
 
-&lt;h1 id=&quot;credit-based-flow-control&quot;&gt;Credit-based Flow Control&lt;/h1&gt;
+&lt;h2 id=&quot;credit-based-flow-control&quot;&gt;Credit-based Flow Control&lt;/h2&gt;
 
 &lt;p&gt;Credit-based flow control makes sure that whatever is “on the wire” will have capacity at the receiver to handle. It is based on the availability of network buffers as a natural extension of the mechanisms Flink had before. Instead of only having a shared local buffer pool, each remote input channel now has its own set of &lt;strong&gt;exclusive buffers&lt;/strong&gt;. Conversely, buffers in the local buffer pool are called &lt;strong&gt;floating buffers&lt;/strong&gt; as they w [...]
 
@@ -553,11 +569,11 @@ Additionally, for subtasks with more than one input, scheduling start in two way
 
 &lt;p&gt;&lt;sup&gt;3&lt;/sup&gt;If there are not enough buffers available, each buffer pool will get the same share of the globally available ones (± 1).&lt;/p&gt;
 
-&lt;h2 id=&quot;inflicting-backpressure-2&quot;&gt;Inflicting Backpressure (2)&lt;/h2&gt;
+&lt;h3 id=&quot;inflicting-backpressure-2&quot;&gt;Inflicting Backpressure (2)&lt;/h3&gt;
 
 &lt;p&gt;As opposed to the receiver’s backpressure mechanisms without flow control, credits provide more direct control: if a receiver cannot keep up, its available credits will eventually hit 0 and stop the sender from forwarding buffers to the lower network stack. Backpressure occurs on this logical channel only, and there is no need to block reading from a multiplexed TCP channel. Other receivers are therefore not affected in processing available buffers.&lt;/p&gt;
 
-&lt;h2 id=&quot;what-do-we-gain-where-is-the-catch&quot;&gt;What do we Gain? Where is the Catch?&lt;/h2&gt;
+&lt;h3 id=&quot;what-do-we-gain-where-is-the-catch&quot;&gt;What do we Gain? Where is the Catch?&lt;/h3&gt;
 
 &lt;p&gt;&lt;img align=&quot;right&quot; src=&quot;/img/blog/2019-06-05-network-stack/flink-network-stack5.png&quot; width=&quot;300&quot; height=&quot;200&quot; alt=&quot;Physical-transport-credit-flow-checkpoints-Flink&#39;s Network Stack&quot; /&gt;&lt;/p&gt;
 
@@ -570,38 +586,40 @@ Additionally, for subtasks with more than one input, scheduling start in two way
 &lt;p&gt;&lt;br /&gt;&lt;/p&gt;
 
 &lt;center&gt;
-&lt;style type=&quot;text/css&quot;&gt;
-.tg  {border-collapse:collapse;border-spacing:0;}
-.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 30px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
-.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 30px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
-.tg .tg-0vnf{font-size:15px;text-align:center}
-.tg .tg-rc1r{font-size:15px;background-color:#9b9b9b;text-align:left}
-.tg .tg-sogj{font-size:15px;text-align:left}
-&lt;/style&gt;
 &lt;table class=&quot;tg&quot;&gt;
   &lt;tr&gt;
-    &lt;th class=&quot;tg-rc1r&quot;&gt;Advantages&lt;/th&gt;
-    &lt;th class=&quot;tg-rc1r&quot;&gt;Disadvantages&lt;/th&gt;
+    &lt;th&gt;Advantages&lt;/th&gt;
+    &lt;th&gt;Disadvantages&lt;/th&gt;
   &lt;/tr&gt;
   &lt;tr&gt;
-    &lt;td class=&quot;tg-sogj&quot;&gt;• better resource utilisation with data skew in multiplexed connections &lt;br /&gt;&lt;br /&gt;• improved checkpoint alignment&lt;br /&gt;&lt;br /&gt;• reduced memory use (less data in lower network layers)&lt;/td&gt;
-    &lt;td class=&quot;tg-sogj&quot;&gt;• additional credit-announce messages&lt;br /&gt;&lt;br /&gt;• additional backlog-announce messages (piggy-backed with buffer messages, almost no overhead)&lt;br /&gt;&lt;br /&gt;• potential round-trip latency&lt;/td&gt;
+    &lt;td class=&quot;tg-top&quot;&gt;
+    • better resource utilisation with data skew in multiplexed connections &lt;br /&gt;&lt;br /&gt;
+    • improved checkpoint alignment&lt;br /&gt;&lt;br /&gt;
+    • reduced memory use (less data in lower network layers)&lt;/td&gt;
+    &lt;td class=&quot;tg-top&quot;&gt;
+    • additional credit-announce messages&lt;br /&gt;&lt;br /&gt;
+    • additional backlog-announce messages (piggy-backed with buffer messages, almost no overhead)&lt;br /&gt;&lt;br /&gt;
+    • potential round-trip latency&lt;/td&gt;
   &lt;/tr&gt;
   &lt;tr&gt;
-    &lt;td class=&quot;tg-0vnf&quot; colspan=&quot;2&quot;&gt;• backpressure appears earlier&lt;/td&gt;
+    &lt;td class=&quot;tg-center&quot; colspan=&quot;2&quot;&gt;• backpressure appears earlier&lt;/td&gt;
   &lt;/tr&gt;
 &lt;/table&gt;
 &lt;/center&gt;
 &lt;p&gt;&lt;br /&gt;&lt;/p&gt;
 
-&lt;blockquote&gt;
-  &lt;p&gt;&lt;em&gt;NOTE:&lt;/em&gt; If you need to turn off credit-based flow control, you can add this to your &lt;code&gt;flink-conf.yaml&lt;/code&gt;: &lt;code&gt;taskmanager.network.credit-model: false&lt;/code&gt;. 
-This parameter, however, is deprecated and will eventually be removed along with the non-credit-based flow control code.&lt;/p&gt;
-&lt;/blockquote&gt;
+&lt;div class=&quot;alert alert-info&quot;&gt;
+  &lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt;
+If you need to turn off credit-based flow control, you can add this to your &lt;code&gt;flink-conf.yaml&lt;/code&gt;:&lt;/p&gt;
+
+  &lt;p&gt;&lt;code&gt;taskmanager.network.credit-model: false&lt;/code&gt;&lt;/p&gt;
+
+  &lt;p&gt;This parameter, however, is deprecated and will eventually be removed along with the non-credit-based flow control code.&lt;/p&gt;
+&lt;/div&gt;
 
 &lt;p&gt;&lt;br /&gt;&lt;/p&gt;
 
-&lt;h1 id=&quot;writing-records-into-network-buffers-and-reading-them-again&quot;&gt;Writing Records into Network Buffers and Reading them again&lt;/h1&gt;
+&lt;h2 id=&quot;writing-records-into-network-buffers-and-reading-them-again&quot;&gt;Writing Records into Network Buffers and Reading them again&lt;/h2&gt;
 
 &lt;p&gt;The following picture extends the slightly more high-level view from above with further details of the network stack and its surrounding components, from the collection of a record in your sending operator to the receiving operator getting it:
 &lt;br /&gt;&lt;/p&gt;
@@ -616,7 +634,7 @@ This parameter, however, is deprecated and will eventually be removed along with
 &lt;p&gt;On the receiver’s side, the lower network stack (netty) is writing received buffers into the appropriate input channels. The (stream) task’s thread eventually reads from these queues and tries to deserialise the accumulated bytes into Java objects with the help of the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/api/reader/RecordReader.html&quot;&gt;RecordReader&lt;/a&gt; and going through the &lt;a hr [...]
 &lt;br /&gt;&lt;/p&gt;
 
-&lt;h2 id=&quot;flushing-buffers-to-netty&quot;&gt;Flushing Buffers to Netty&lt;/h2&gt;
+&lt;h3 id=&quot;flushing-buffers-to-netty&quot;&gt;Flushing Buffers to Netty&lt;/h3&gt;
 
 &lt;p&gt;In the picture above, the credit-based flow control mechanics actually sit inside the “Netty Server” (and “Netty Client”) components and the buffer the RecordWriter is writing to is always added to the result subpartition in an empty state and then gradually filled with (serialised) records. But when does Netty actually get the buffer? Obviously, it cannot take bytes whenever they become available since that would not only add substantial costs due to cross-thread communication  [...]
 
@@ -629,7 +647,7 @@ This parameter, however, is deprecated and will eventually be removed along with
 &lt;br /&gt;&lt;/li&gt;
 &lt;/ul&gt;
 
-&lt;h3 id=&quot;flush-after-buffer-full&quot;&gt;Flush after Buffer Full&lt;/h3&gt;
+&lt;h4 id=&quot;flush-after-buffer-full&quot;&gt;Flush after Buffer Full&lt;/h4&gt;
 
 &lt;p&gt;The RecordWriter works with a local serialisation buffer for the current record and will gradually write these bytes to one or more network buffers sitting at the appropriate result subpartition queue. Although a RecordWriter can work on multiple subpartitions, each subpartition has only one RecordWriter writing data to it. The Netty server, on the other hand, is reading from multiple result subpartitions and multiplexing the appropriate ones into a single channel as described a [...]
 &lt;br /&gt;&lt;/p&gt;
@@ -642,7 +660,7 @@ This parameter, however, is deprecated and will eventually be removed along with
 &lt;p&gt;&lt;sup&gt;4&lt;/sup&gt;We can assume it already got the notification if there are more finished buffers in the queue.
 &lt;br /&gt;&lt;/p&gt;
 
-&lt;h3 id=&quot;flush-after-buffer-timeout&quot;&gt;Flush after Buffer Timeout&lt;/h3&gt;
+&lt;h4 id=&quot;flush-after-buffer-timeout&quot;&gt;Flush after Buffer Timeout&lt;/h4&gt;
 
 &lt;p&gt;In order to support low-latency use cases, we cannot rely only on buffers being full to send data downstream. There may be cases where a certain communication channel does not have many records flowing through it, which would unnecessarily increase the latency of the few records you actually have. Therefore, a periodic process will flush whatever data is available down the stack: the output flusher. The periodic interval can be configured via &lt;a href=&quot;https://ci.apache. [...]
 &lt;br /&gt;&lt;/p&gt;
@@ -655,12 +673,12 @@ This parameter, however, is deprecated and will eventually be removed along with
 &lt;p&gt;&lt;sup&gt;5&lt;/sup&gt;Strictly speaking, the output flusher does not give any guarantees: it only sends a notification to Netty, which can pick it up at will and as capacity permits. This also means that the output flusher has no effect if the channel is backpressured.
 &lt;br /&gt;&lt;/p&gt;
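
The two flush triggers discussed so far (flush after buffer full, flush after buffer timeout) can be illustrated with a deliberately simplified Python sketch. All names and the structure here are hypothetical toy models, not Flink's actual implementation: a subpartition queue flushes when its buffer fills up, and a separate flusher thread periodically flushes whatever is available.

```python
import threading
import time

class SubpartitionQueue:
    """Toy model of a result subpartition: records accumulate in a
    buffer until it is full or the output flusher kicks in."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []    # records not yet visible to the consumer
        self.flushed = []   # records the consumer ("Netty") picked up
        self.lock = threading.Lock()

    def add_record(self, record):
        with self.lock:
            self.buffer.append(record)
            if len(self.buffer) >= self.capacity:
                self._flush_locked()   # flush-after-buffer-full

    def flush(self):
        with self.lock:
            self._flush_locked()       # timeout or special-event flush

    def _flush_locked(self):
        if self.buffer:
            self.flushed.extend(self.buffer)
            self.buffer.clear()

def output_flusher(queue, interval, stop_event):
    """Best-effort periodic flush, analogous in spirit to the
    output flusher: it only makes data available; it cannot force
    a backpressured consumer to take it."""
    while not stop_event.wait(interval):
        queue.flush()
```

Note how a record below the capacity threshold only becomes visible once the flusher thread runs, mirroring how the output flusher bounds latency for low-traffic channels.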
 
-&lt;h3 id=&quot;flush-after-special-event&quot;&gt;Flush after special event&lt;/h3&gt;
+&lt;h4 id=&quot;flush-after-special-event&quot;&gt;Flush after special event&lt;/h4&gt;
 
 &lt;p&gt;Some special events also trigger immediate flushes when they are sent through the RecordWriter. The most important ones are checkpoint barriers and end-of-partition events, which obviously should go quickly and not wait for the output flusher to kick in.
 &lt;br /&gt;&lt;/p&gt;
 
-&lt;h3 id=&quot;further-remarks&quot;&gt;Further remarks&lt;/h3&gt;
+&lt;h4 id=&quot;further-remarks&quot;&gt;Further remarks&lt;/h4&gt;
 
 &lt;p&gt;In contrast to Flink &amp;lt; 1.5, please note that (a) network buffers are now placed in the subpartition queues directly and (b) we are not closing the buffer on each flush. This gives us a few advantages:&lt;/p&gt;
 
@@ -673,13 +691,13 @@ This parameter, however, is deprecated and will eventually be removed along with
 &lt;p&gt;However, you may notice an increased CPU use and TCP packet rate during low load scenarios. This is because, with the changes, Flink will use any &lt;em&gt;available&lt;/em&gt; CPU cycles to try to maintain the desired latency. Once the load increases, this will self-adjust by buffers filling up more. High load scenarios are not affected and even get a better throughput because of the reduced synchronisation overhead.
 &lt;br /&gt;&lt;/p&gt;
 
-&lt;h2 id=&quot;buffer-builder--buffer-consumer&quot;&gt;Buffer Builder &amp;amp; Buffer Consumer&lt;/h2&gt;
+&lt;h3 id=&quot;buffer-builder--buffer-consumer&quot;&gt;Buffer Builder &amp;amp; Buffer Consumer&lt;/h3&gt;
 
 &lt;p&gt;If you want to dig deeper into how the producer-consumer mechanics are implemented in Flink, please take a closer look at the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/buffer/BufferBuilder.html&quot;&gt;BufferBuilder&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/buffer/BufferConsumer.html&quot;&gt;BufferConsumer [...]
 
 &lt;p&gt;&lt;br /&gt;&lt;/p&gt;
 
-&lt;h1 id=&quot;latency-vs-throughput&quot;&gt;Latency vs. Throughput&lt;/h1&gt;
+&lt;h2 id=&quot;latency-vs-throughput&quot;&gt;Latency vs. Throughput&lt;/h2&gt;
 
 &lt;p&gt;Network buffers were introduced to get higher resource utilisation and higher throughput at the cost of having some records wait in buffers a little longer. Although an upper limit to this wait time can be given via the buffer timeout, you may be curious to find out more about the trade-off between these two dimensions: latency and throughput, as, obviously, you cannot get both. The following plot shows various values for the buffer timeout starting at 0 (flush with every record [...]
 &lt;br /&gt;&lt;/p&gt;
@@ -693,7 +711,7 @@ This parameter, however, is deprecated and will eventually be removed along with
 
 &lt;p&gt;&lt;br /&gt;&lt;/p&gt;
 
-&lt;h1 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h1&gt;
+&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
 
 &lt;p&gt;Now you know about result partitions, the different network connections and scheduling types for both batch and streaming. You also know about credit-based flow control and how the network stack works internally, in order to reason about network-related tuning parameters and about certain job behaviours. Future blog posts in this series will build upon this knowledge and go into more operational details including relevant metrics to look at, further network stack tuning, and com [...]
 


[flink-web] 05/05: Rebuild website

Posted by nk...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

nkruber pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/flink-web.git

commit 606b5df34df5a3cc2ba216368de810a0c37b8247
Author: Nico Kruber <ni...@ververica.com>
AuthorDate: Tue Jul 23 17:47:00 2019 +0200

    Rebuild website
---
 content/2019/07/23/flink-network-stack-2.html      | 579 +++++++++++++++++++++
 content/blog/feed.xml                              | 360 +++++++++++++
 content/blog/index.html                            |  36 +-
 content/blog/page2/index.html                      |  38 +-
 content/blog/page3/index.html                      |  38 +-
 content/blog/page4/index.html                      |  38 +-
 content/blog/page5/index.html                      |  40 +-
 content/blog/page6/index.html                      |  40 +-
 content/blog/page7/index.html                      |  40 +-
 content/blog/page8/index.html                      |  40 +-
 content/blog/page9/index.html                      |  25 +
 content/css/flink.css                              |   5 +
 .../back_pressure_sampling_high.png                | Bin 0 -> 77546 bytes
 content/index.html                                 |   6 +-
 content/roadmap.html                               |   4 +-
 content/zh/community.html                          |   6 +
 content/zh/index.html                              |   6 +-
 17 files changed, 1177 insertions(+), 124 deletions(-)

diff --git a/content/2019/07/23/flink-network-stack-2.html b/content/2019/07/23/flink-network-stack-2.html
new file mode 100644
index 0000000..3601198
--- /dev/null
+++ b/content/2019/07/23/flink-network-stack-2.html
@@ -0,0 +1,579 @@
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
+    <title>Apache Flink: Flink Network Stack Vol. 2: Monitoring, Metrics, and that Backpressure Thing</title>
+    <link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+    <link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+    <!-- Bootstrap -->
+    <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.1/css/bootstrap.min.css">
+    <link rel="stylesheet" href="/css/flink.css">
+    <link rel="stylesheet" href="/css/syntax.css">
+
+    <!-- Blog RSS feed -->
+    <link href="/blog/feed.xml" rel="alternate" type="application/rss+xml" title="Apache Flink Blog: RSS feed" />
+
+    <!-- jQuery (necessary for Bootstrap's JavaScript plugins) -->
+    <!-- We need to load Jquery in the header for custom google analytics event tracking-->
+    <script src="/js/jquery.min.js"></script>
+
+    <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
+    <!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
+    <!--[if lt IE 9]>
+      <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
+      <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
+    <![endif]-->
+  </head>
+  <body>  
+    
+
+    <!-- Main content. -->
+    <div class="container">
+    <div class="row">
+
+      
+     <div id="sidebar" class="col-sm-3">
+        
+
+<!-- Top navbar. -->
+    <nav class="navbar navbar-default">
+        <!-- The logo. -->
+        <div class="navbar-header">
+          <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1">
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+          </button>
+          <div class="navbar-logo">
+            <a href="/">
+              <img alt="Apache Flink" src="/img/flink-header-logo.svg" width="147px" height="73px">
+            </a>
+          </div>
+        </div><!-- /.navbar-header -->
+
+        <!-- The navigation links. -->
+        <div class="collapse navbar-collapse" id="bs-example-navbar-collapse-1">
+          <ul class="nav navbar-nav navbar-main">
+
+            <!-- First menu section explains visitors what Flink is -->
+
+            <!-- What is Stream Processing? -->
+            <!--
+            <li><a href="/streamprocessing1.html">What is Stream Processing?</a></li>
+            -->
+
+            <!-- What is Flink? -->
+            <li><a href="/flink-architecture.html">What is Apache Flink?</a></li>
+
+            
+            <ul class="nav navbar-nav navbar-subnav">
+              <li >
+                  <a href="/flink-architecture.html">Architecture</a>
+              </li>
+              <li >
+                  <a href="/flink-applications.html">Applications</a>
+              </li>
+              <li >
+                  <a href="/flink-operations.html">Operations</a>
+              </li>
+            </ul>
+            
+
+            <!-- Use cases -->
+            <li><a href="/usecases.html">Use Cases</a></li>
+
+            <!-- Powered by -->
+            <li><a href="/poweredby.html">Powered By</a></li>
+
+            <!-- FAQ -->
+            <li><a href="/faq.html">FAQ</a></li>
+
+            &nbsp;
+            <!-- Second menu section aims to support Flink users -->
+
+            <!-- Downloads -->
+            <li><a href="/downloads.html">Downloads</a></li>
+
+            <!-- Quickstart -->
+            <li>
+              <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.8/quickstart/setup_quickstart.html" target="_blank">Tutorials <small><span class="glyphicon glyphicon-new-window"></span></small></a>
+            </li>
+
+            <!-- Documentation -->
+            <li class="dropdown">
+              <a class="dropdown-toggle" data-toggle="dropdown" href="#">Documentation<span class="caret"></span></a>
+              <ul class="dropdown-menu">
+                <li><a href="https://ci.apache.org/projects/flink/flink-docs-release-1.8" target="_blank">1.8 (Latest stable release) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li>
+                <li><a href="https://ci.apache.org/projects/flink/flink-docs-master" target="_blank">1.9 (Snapshot) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li>
+              </ul>
+            </li>
+
+            <!-- getting help -->
+            <li><a href="/gettinghelp.html">Getting Help</a></li>
+
+            <!-- Blog -->
+            <li><a href="/blog/"><b>Flink Blog</b></a></li>
+
+            &nbsp;
+
+            <!-- Third menu section aim to support community and contributors -->
+
+            <!-- Community -->
+            <li><a href="/community.html">Community &amp; Project Info</a></li>
+
+            <!-- Roadmap -->
+            <li><a href="/roadmap.html">Roadmap</a></li>
+
+            <!-- Contribute -->
+            <li><a href="/contributing/how-to-contribute.html">How to Contribute</a></li>
+            
+
+            <!-- GitHub -->
+            <li>
+              <a href="https://github.com/apache/flink" target="_blank">Flink on GitHub <small><span class="glyphicon glyphicon-new-window"></span></small></a>
+            </li>
+
+            &nbsp;
+
+            <!-- Language Switcher -->
+            <li>
+              
+                
+                  <a href="/zh/2019/07/23/flink-network-stack-2.html">中文版</a>
+                
+              
+            </li>
+
+          </ul>
+
+          <ul class="nav navbar-nav navbar-bottom">
+          <hr />
+
+            <!-- Twitter -->
+            <li><a href="https://twitter.com/apacheflink" target="_blank">@ApacheFlink <small><span class="glyphicon glyphicon-new-window"></span></small></a></li>
+
+            <!-- Visualizer -->
+            <li class=" hidden-md hidden-sm"><a href="/visualizer/" target="_blank">Plan Visualizer <small><span class="glyphicon glyphicon-new-window"></span></small></a></li>
+
+          </ul>
+        </div><!-- /.navbar-collapse -->
+    </nav>
+
+      </div>
+      <div class="col-sm-9">
+      <div class="row-fluid">
+  <div class="col-sm-12">
+    <div class="row">
+      <h1>Flink Network Stack Vol. 2: Monitoring, Metrics, and that Backpressure Thing</h1>
+
+      <article>
+        <p>23 Jul 2019 Nico Kruber  &amp; Piotr Nowojski </p>
+
+<style type="text/css">
+.tg  {border-collapse:collapse;border-spacing:0;}
+.tg td{padding:10px 10px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
+.tg th{padding:10px 10px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;background-color:#eff0f1;}
+.tg .tg-wide{padding:10px 30px;}
+.tg .tg-top{vertical-align:top}
+.tg .tg-topcenter{text-align:center;vertical-align:top}
+.tg .tg-center{text-align:center;vertical-align:center}
+</style>
+
+<p>In a <a href="/2019/06/05/flink-network-stack.html">previous blog post</a>, we presented how Flink’s network stack works from the high-level abstractions to the low-level details. This second blog post in the series of network stack posts extends on this knowledge and discusses monitoring network-related metrics to identify effects such as backpressure or bottlenecks in throughput and latency. Although this post briefly covers what to do with backpressure, the topic of tuning the netw [...]
+
+<div class="page-toc">
+<ul id="markdown-toc">
+  <li><a href="#monitoring" id="markdown-toc-monitoring">Monitoring</a>    <ul>
+      <li><a href="#backpressure-monitor" id="markdown-toc-backpressure-monitor">Backpressure Monitor</a></li>
+    </ul>
+  </li>
+  <li><a href="#network-metrics" id="markdown-toc-network-metrics">Network Metrics</a>    <ul>
+      <li><a href="#backpressure" id="markdown-toc-backpressure">Backpressure</a></li>
+      <li><a href="#resource-usage--throughput" id="markdown-toc-resource-usage--throughput">Resource Usage / Throughput</a></li>
+      <li><a href="#latency-tracking" id="markdown-toc-latency-tracking">Latency Tracking</a></li>
+    </ul>
+  </li>
+  <li><a href="#conclusion" id="markdown-toc-conclusion">Conclusion</a></li>
+</ul>
+
+</div>
+
+<h2 id="monitoring">Monitoring</h2>
+
+<p>Probably the most important part of network monitoring is <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring/back_pressure.html">monitoring backpressure</a>, a situation where a system is receiving data at a higher rate than it can process¹. Such behaviour will result in the sender being backpressured and may be caused by two things:</p>
+
+<ul>
+  <li>
+    <p>The receiver is slow.<br />
+This can happen because the receiver is backpressured itself, is unable to keep processing at the same rate as the sender, or is temporarily blocked by garbage collection, lack of system resources, or I/O.</p>
+  </li>
+  <li>
+    <p>The network channel is slow.<br />
+  Even though in such a case the receiver is not (directly) involved, we call the sender backpressured due to a potential oversubscription on network bandwidth shared by all subtasks running on the same machine. Beware that, in addition to Flink’s network stack, there may be more network users, such as sources and sinks, distributed file systems (checkpointing, network-attached storage), logging, and metrics. A previous <a href="https://www.ververica.com/blog/how-to-size-your-apache-flink- [...]
+  </li>
+</ul>
+
+<p><sup>1</sup> In case you are unfamiliar with backpressure and how it interacts with Flink, we recommend reading through <a href="https://www.ververica.com/blog/how-flink-handles-backpressure">this blog post on backpressure</a> from 2015.</p>
+
+<p><br />
+If backpressure occurs, it will bubble upstream and eventually reach your sources and slow them down. This is not a bad thing per se and merely states that you lack resources for the current load. However, you may want to improve your job so that it can cope with higher loads without using more resources. In order to do so, you need to find (1) where (at which task/operator) the bottleneck is and (2) what is causing it. Flink offers two mechanisms for identifying where the bottleneck is:</p>
+
+<ul>
+  <li>directly via Flink’s web UI and its backpressure monitor, or</li>
+  <li>indirectly through some of the network metrics.</li>
+</ul>
+
+<p>Flink’s web UI is likely the first entry point for a quick troubleshooting but has some disadvantages that we will explain below. On the other hand, Flink’s network metrics are better suited for continuous monitoring and reasoning about the exact nature of the bottleneck causing backpressure. We will cover both in the sections below. In both cases, you need to identify the origin of backpressure from the sources to the sinks. Your starting point for the current and future investigatio [...]
+
+<h3 id="backpressure-monitor">Backpressure Monitor</h3>
+
+<p>The <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring/back_pressure.html">backpressure monitor</a> is only exposed via Flink’s web UI². Since it’s an active component that is only triggered on request, it is currently not available via metrics. The backpressure monitor samples the running tasks’ threads on all TaskManagers via <code>Thread.getStackTrace()</code> and computes the number of samples where tasks were blocked on a buffer request. These tasks w [...]
+
+<ul>
+  <li><span style="color:green">OK</span> for <code>ratio ≤ 0.10</code>,</li>
+  <li><span style="color:orange">LOW</span> for <code>0.10 &lt; ratio ≤ 0.5</code>, and</li>
+  <li><span style="color:red">HIGH</span> for <code>0.5 &lt; ratio ≤ 1</code>.</li>
+</ul>
+
+<p>Although you can tune things like the refresh-interval, the number of samples, or the delay between samples, normally, you would not need to touch these since the defaults already give good-enough results.</p>
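
As a minimal illustration, the ratio-to-status mapping listed above can be written down as a small Python helper. This is only a sketch of the documented thresholds, not Flink code:

```python
def classify_backpressure(ratio):
    """Map a backpressure sample ratio (samples blocked on a buffer
    request / total samples) to the status shown in Flink's web UI."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be within [0, 1]")
    if ratio <= 0.10:
        return "OK"
    elif ratio <= 0.5:
        return "LOW"
    else:
        return "HIGH"
```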
+
+<center>
+<img src="/img/blog/2019-07-23-network-stack-2/back_pressure_sampling_high.png" width="600px" alt="Backpressure sampling:high" />
+</center>
+
+<p><sup>2</sup> You may also access the backpressure monitor via the REST API: <code>/jobs/:jobid/vertices/:vertexid/backpressure</code></p>
+
+<p><br />
+The backpressure monitor can help you find where (at which task/operator) backpressure originates. However, it does not support you in reasoning further about its causes. Additionally, for larger jobs or higher parallelism, the backpressure monitor becomes too crowded to use and may also take some time to gather all information from all TaskManagers. Please also note that sampling may affect your running job’s performance.</p>
+
+<h2 id="network-metrics">Network Metrics</h2>
+
+<p><a href="https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring/metrics.html#network">Network</a> and <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring/metrics.html#io">task I/O</a> metrics are more lightweight than the backpressure monitor and are continuously published for each running job. We can leverage those and get even more insights, not only for backpressure monitoring. The most relevant metrics for users are:</p>
+
+<ul>
+  <li>
+    <p><strong><span style="color:orange">up to Flink 1.8:</span></strong> <code>outPoolUsage</code>, <code>inPoolUsage</code><br />
+An estimate of the ratio of buffers used vs. buffers available in the respective local buffer pools.
+While interpreting <code>inPoolUsage</code> in Flink 1.5 - 1.8 with credit-based flow control, please note that this only relates to floating buffers (exclusive buffers are not part of the pool).</p>
+  </li>
+  <li>
+    <p><strong><span style="color:green">Flink 1.9 and above:</span></strong> <code>outPoolUsage</code>, <code>inPoolUsage</code>, <code>floatingBuffersUsage</code>, <code>exclusiveBuffersUsage</code><br />
+An estimate of the ratio of buffers used vs. buffers available in the respective local buffer pools.
+Starting with Flink 1.9, <code>inPoolUsage</code> is the sum of <code>floatingBuffersUsage</code> and <code>exclusiveBuffersUsage</code>.</p>
+  </li>
+  <li>
+    <p><code>numRecordsOut</code>, <code>numRecordsIn</code><br />
+Each metric comes with two scopes: one scoped to the operator and one scoped to the subtask. For network monitoring, the subtask-scoped metric is relevant and shows the total number of records it has sent/received. You may need to further look into these figures to extract the number of records within a certain time span or use the equivalent <code>…PerSecond</code> metrics.</p>
+  </li>
+  <li>
+    <p><code>numBytesOut</code>, <code>numBytesInLocal</code>, <code>numBytesInRemote</code><br />
+The total number of bytes this subtask has emitted or read from a local/remote source. These are also available as meters via <code>…PerSecond</code> metrics.</p>
+  </li>
+  <li>
+    <p><code>numBuffersOut</code>, <code>numBuffersInLocal</code>, <code>numBuffersInRemote</code><br />
+Similar to <code>numBytes…</code> but counting the number of network buffers.</p>
+  </li>
+</ul>
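
Since the <code>numRecordsOut</code>/<code>numBytesOut</code> family are cumulative counters, extracting "the number of records within a certain time span" boils down to differencing periodic samples, which is what the <code>…PerSecond</code> meters do for you. A hedged Python sketch of that calculation (the sampling layer and names are hypothetical; in practice your metrics reporter handles this):

```python
def per_second_rate(samples):
    """Derive per-second rates from cumulative counter samples.

    `samples` is a list of (timestamp_seconds, counter_value) tuples,
    e.g. periodic readings of a subtask's numRecordsOut.
    Returns one rate per consecutive pair of samples.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("timestamps must be strictly increasing")
        rates.append((v1 - v0) / (t1 - t0))
    return rates
```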
+
+<div class="alert alert-warning">
+  <p><span class="label label-warning" style="display: inline-block"><span class="glyphicon glyphicon-warning-sign" aria-hidden="true"></span> Warning</span>
+For the sake of completeness and since they have been used in the past, we will briefly look at the <code>outputQueueLength</code> and <code>inputQueueLength</code> metrics. These are somewhat similar to the <code>[out,in]PoolUsage</code> metrics but show the number of buffers sitting in a sender subtask’s output queues and in a receiver subtask’s input queues, respectively. Reasoning about absolute numbers of buffers, however, is difficult and there is also a special subtlety with local [...]
+
+  <p>Overall, <strong>we discourage the use of</strong> <code>outputQueueLength</code> <strong>and</strong> <code>inputQueueLength</code> because their interpretation highly depends on the current parallelism of the operator and the configured numbers of exclusive and floating buffers. Instead, we recommend using the various <code>*PoolUsage</code> metrics which even reveal more detailed insight.</p>
+</div>
+
+<div class="alert alert-info">
+  <p><span class="label label-info" style="display: inline-block"><span class="glyphicon glyphicon-info-sign" aria-hidden="true"></span> Note</span>
+ If you reason about buffer usage, please keep the following in mind:</p>
+
+  <ul>
+    <li>Any outgoing channel which has been used at least once will always occupy one buffer (since Flink 1.5).
+      <ul>
+        <li><strong><span style="color:orange">up to Flink 1.8:</span></strong> This buffer (even if empty!) was always counted as a backlog of 1 and thus receivers tried to reserve a floating buffer for it.</li>
+        <li><strong><span style="color:green">Flink 1.9 and above:</span></strong> A buffer is only counted in the backlog if it is ready for consumption, i.e. it is full or was flushed (see FLINK-11082)</li>
+      </ul>
+    </li>
+    <li>The receiver will only release a received buffer after deserialising the last record in it.</li>
+  </ul>
+</div>
+
+<p>The following sections make use of and combine these metrics to reason about backpressure and resource usage / efficiency with respect to throughput. A separate section will detail latency related metrics.</p>
+
+<h3 id="backpressure">Backpressure</h3>
+
+<p>Backpressure may be indicated by two different sets of metrics: (local) buffer pool usages as well as input/output queue lengths. They provide a different level of granularity but, unfortunately, none of these is exhaustive and there is room for interpretation. Because of the inherent problems with interpreting these queue lengths, we will focus on the usage of input and output pools below, which also provides more detail.</p>
+
+<ul>
+  <li>
+    <p><strong>If a subtask’s</strong> <code>outPoolUsage</code> <strong>is 100%</strong>, it is backpressured. Whether the subtask is already blocking or still writing records into network buffers depends on the fill level of the buffers that the <code>RecordWriters</code> are currently writing into.<br />
+<span class="glyphicon glyphicon-warning-sign" aria-hidden="true" style="color:orange;"></span> This is different to what the backpressure monitor is showing!</p>
+  </li>
+  <li>
+    <p>An <code>inPoolUsage</code> of 100% means that all floating buffers are assigned to channels and eventually backpressure will be exercised upstream. These floating buffers are in either of the following conditions: they are reserved for future use on a channel due to an exclusive buffer being utilised (remote input channels always try to maintain <code>#exclusive buffers</code> credits), they are reserved for a sender’s backlog and wait for data, they may contain data and are enqu [...]
+  </li>
+  <li>
+    <p><strong><span style="color:orange">up to Flink 1.8:</span></strong> Due to <a href="https://issues.apache.org/jira/browse/FLINK-11082">FLINK-11082</a>, an <code>inPoolUsage</code> of 100% is quite common even in normal situations.</p>
+  </li>
+  <li>
+    <p><strong><span style="color:green">Flink 1.9 and above:</span></strong> If <code>inPoolUsage</code> is constantly around 100%, this is a strong indicator for exercising backpressure upstream.</p>
+  </li>
+</ul>
+
+<p>The following table summarises all combinations and their interpretation. Bear in mind, though, that backpressure may be minor or temporary (no need to look into it), on particular channels only, or caused by other JVM processes on a particular TaskManager, such as GC, synchronisation, I/O, or resource shortage, rather than by a specific subtask.</p>
+
+<center>
+<table class="tg">
+  <tr>
+    <th></th>
+    <th class="tg-center"><code>outPoolUsage</code> low</th>
+    <th class="tg-center"><code>outPoolUsage</code> high</th>
+  </tr>
+  <tr>
+    <th class="tg-top"><code>inPoolUsage</code> low</th>
+    <td class="tg-topcenter">
+      <span class="glyphicon glyphicon-ok-sign" aria-hidden="true" style="color:green;font-size:1.5em;"></span></td>
+    <td class="tg-topcenter">
+      <span class="glyphicon glyphicon-warning-sign" aria-hidden="true" style="color:orange;font-size:1.5em;"></span><br />
+      (backpressured, temporary situation: upstream is not backpressured yet or not anymore)</td>
+  </tr>
+  <tr>
+    <th class="tg-top" rowspan="2">
+      <code>inPoolUsage</code> high<br />
+      (<strong><span style="color:green">Flink 1.9+</span></strong>)</th>
+    <td class="tg-topcenter">
+      if all upstream tasks’ <code>outPoolUsage</code> are low: <span class="glyphicon glyphicon-warning-sign" aria-hidden="true" style="color:orange;font-size:1.5em;"></span><br />
+      (may eventually cause backpressure)</td>
+    <td class="tg-topcenter" rowspan="2">
+      <span class="glyphicon glyphicon-remove-sign" aria-hidden="true" style="color:red;font-size:1.5em;"></span><br />
+      (backpressured by downstream task(s) or network, probably forwarding backpressure upstream)</td>
+  </tr>
+  <tr>
+    <td class="tg-topcenter">if any upstream task’s<code>outPoolUsage</code> is high: <span class="glyphicon glyphicon-remove-sign" aria-hidden="true" style="color:red;font-size:1.5em;"></span><br />
+      (may exercise backpressure upstream and may be the source of backpressure)</td>
+  </tr>
+</table>
+</center>
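<p>The decision table above can be encoded as a small post-processing helper, e.g. for automated alerting on collected metrics. This is a sketch, not part of Flink; the <code>high</code> threshold of 0.9 is purely illustrative and assumes Flink 1.9+ <code>inPoolUsage</code> semantics:</p>

```python
def classify_backpressure(in_pool, out_pool, upstream_out_pools, high=0.9):
    """Interpret a subtask's in/outPoolUsage following the table above.

    All usages are ratios in [0, 1]; upstream_out_pools holds the
    outPoolUsage values of all upstream subtasks feeding this one.
    """
    in_high = in_pool >= high
    out_high = out_pool >= high
    if not in_high and not out_high:
        return "ok"
    if not in_high:  # only outPoolUsage is high
        return "warn: backpressured, temporary (upstream not yet/anymore backpressured)"
    if out_high:  # both are high
        return "critical: backpressured by downstream task(s) or network"
    # only inPoolUsage is high
    if any(u >= high for u in upstream_out_pools):
        return "critical: may exercise backpressure upstream, possible source"
    return "warn: may eventually cause backpressure"
```

<p>Such a helper only mirrors the table; it cannot distinguish, for instance, temporary spikes from sustained backpressure, so it complements rather than replaces looking at the metrics over time.</p>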
+
+<p><br />
+We can reason further about the cause of backpressure by looking at the network metrics of the subtasks of two consecutive tasks:</p>
+
+<ul>
+  <li>If all subtasks of the receiver task have low <code>inPoolUsage</code> values and any upstream subtask’s <code>outPoolUsage</code> is high, then there may be a network bottleneck causing backpressure.
+Since the network is a shared resource among all subtasks of a TaskManager, the bottleneck may not directly originate from this subtask, but rather from various concurrent operations, e.g. checkpoints, other streams, external connections, or other TaskManagers/processes on the same machine.</li>
+</ul>
+
+<p>Backpressure can also be caused by all parallel instances of a task or by a single task instance. The former usually happens because the task is performing some time-consuming operation that applies to all input partitions. The latter is usually the result of some kind of skew, either data skew or resource availability/allocation skew. In either case, you can find some hints on how to handle such situations in the <a href="#span-classlabel-label-info-styledisplay-inline-blockspan-class [...]
+
+<div class="alert alert-info">
+  <h3 class="no_toc" id="span-classglyphicon-glyphicon-info-sign-aria-hiddentruespan-flink-19-and-above"><span class="glyphicon glyphicon-info-sign" aria-hidden="true"></span> Flink 1.9 and above</h3>
+
+  <ul>
+    <li>If <code>floatingBuffersUsage</code> is not 100%, it is unlikely that there is backpressure. If it is 100% and any upstream task is backpressured, it suggests that this input is exercising backpressure on a single input channel, on some of them, or on all of them. To differentiate between these three situations, you can use <code>exclusiveBuffersUsage</code>:
+      <ul>
+        <li>Assuming that <code>floatingBuffersUsage</code> is around 100%, the higher the <code>exclusiveBuffersUsage</code> the more input channels are backpressured. In an extreme case of <code>exclusiveBuffersUsage</code> being close to 100%, it means that all channels are backpressured.</li>
+      </ul>
+    </li>
+  </ul>
+
+  <p><br />
+The relation between <code>exclusiveBuffersUsage</code>, <code>floatingBuffersUsage</code>, and the upstream tasks’ <code>outPoolUsage</code> is summarised in the following table and extends on the table above with <code>inPoolUsage = floatingBuffersUsage + exclusiveBuffersUsage</code>:</p>
+
+  <center>
+<table class="tg">
+  <tr>
+    <th></th>
+    <th><code>exclusiveBuffersUsage</code> low</th>
+    <th><code>exclusiveBuffersUsage</code> high</th>
+  </tr>
+  <tr>
+    <th class="tg-top" style="min-width:33%;">
+      <code>floatingBuffersUsage</code> low +<br />
+      <em>all</em> upstream <code>outPoolUsage</code> low</th>
+    <td class="tg-center"><span class="glyphicon glyphicon-ok-sign" aria-hidden="true" style="color:green;font-size:1.5em;"></span></td>
+    <td class="tg-center">-<sup>3</sup></td>
+  </tr>
+  <tr>
+    <th class="tg-top" style="min-width:33%;">
+      <code>floatingBuffersUsage</code> low +<br />
+      <em>any</em> upstream <code>outPoolUsage</code> high</th>
+    <td class="tg-center">
+      <span class="glyphicon glyphicon-remove-sign" aria-hidden="true" style="color:red;font-size:1.5em;"></span><br />
+      (potential network bottleneck)</td>
+    <td class="tg-center">-<sup>3</sup></td>
+  </tr>
+  <tr>
+    <th class="tg-top" style="min-width:33%;">
+      <code>floatingBuffersUsage</code> high +<br />
+      <em>all</em> upstream <code>outPoolUsage</code> low</th>
+    <td class="tg-center">
+      <span class="glyphicon glyphicon-warning-sign" aria-hidden="true" style="color:orange;font-size:1.5em;"></span><br />
+      (backpressure eventually appears on only some of the input channels)</td>
+    <td class="tg-center">
+      <span class="glyphicon glyphicon-warning-sign" aria-hidden="true" style="color:orange;font-size:1.5em;"></span><br />
+      (backpressure eventually appears on most or all of the input channels)</td>
+  </tr>
+  <tr>
+    <th class="tg-top" style="min-width:33%;">
+      <code>floatingBuffersUsage</code> high +<br />
+      any upstream <code>outPoolUsage</code> high</th>
+    <td class="tg-center">
+      <span class="glyphicon glyphicon-remove-sign" aria-hidden="true" style="color:red;font-size:1.5em;"></span><br />
+      (backpressure on only some of the input channels)</td>
+    <td class="tg-center">
+      <span class="glyphicon glyphicon-remove-sign" aria-hidden="true" style="color:red;font-size:1.5em;"></span><br />
+      (backpressure on most or all of the input channels)</td>
+  </tr>
+</table>
+</center>
+
+  <p><sup>3</sup> This should not happen.</p>
+
+</div>
+
+<h3 id="resource-usage--throughput">Resource Usage / Throughput</h3>
+
+<p>Besides the obvious use of each individual metric mentioned above, there are also a few combinations providing useful insight into what is happening in the network stack:</p>
+
+<ul>
+  <li>
+    <p>Low throughput with frequent <code>outPoolUsage</code> values around 100% but low <code>inPoolUsage</code> on all receivers is an indicator that the round-trip-time of our credit-notification (depends on your network’s latency) is too high for the default number of exclusive buffers to make use of your bandwidth. Consider increasing the <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#taskmanager-network-memory-buffers-per-channel">buffers-per-c [...]
+  </li>
+  <li>
+    <p>Combining <code>numRecordsOut</code> and <code>numBytesOut</code> helps identify the average serialised record size, which supports capacity planning for peak scenarios.</p>
+  </li>
+  <li>
+    <p>If you want to reason about buffer fill rates and the influence of the output flusher, you may combine <code>numBytesInRemote</code> with <code>numBuffersInRemote</code>. When tuning for throughput (and not latency!), low buffer fill rates may indicate reduced network efficiency. In such cases, consider increasing the buffer timeout.
+Please note that, as of Flink 1.8 and 1.9, <code>numBuffersOut</code> only increases for buffers getting full or for an event cutting off a buffer (e.g. a checkpoint barrier) and may lag behind. Please also note that reasoning about buffer fill rates on local channels is unnecessary since buffering is an optimisation technique for remote channels with limited effect on local channels.</p>
+  </li>
+  <li>
+    <p>You may also separate local from remote traffic using <code>numBytesInLocal</code> and <code>numBytesInRemote</code> but in most cases this is unnecessary.</p>
+  </li>
+</ul>
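<p>The derived quantities above are plain ratios of the reported counters. A sketch of the arithmetic (the 32 KiB default network buffer size is an assumption here; adjust it if you changed <code>taskmanager.memory.segment-size</code>):</p>

```python
def avg_record_size(num_bytes_out, num_records_out):
    """Average serialised record size in bytes (0.0 if nothing was emitted)."""
    return num_bytes_out / num_records_out if num_records_out else 0.0

def avg_buffer_fill_rate(num_bytes_in_remote, num_buffers_in_remote,
                         buffer_size=32 * 1024):
    """Average fill rate of remote input buffers, relative to the network
    buffer size (32 KiB by default)."""
    if not num_buffers_in_remote:
        return 0.0
    return (num_bytes_in_remote / num_buffers_in_remote) / buffer_size
```

<p>Low fill rates from <code>avg_buffer_fill_rate</code> are the signal mentioned above: when tuning for throughput, they suggest buffers are flushed by the output flusher before filling up, so increasing the buffer timeout may improve network efficiency.</p>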
+
+<div class="alert alert-info">
+  <h3 class="no_toc" id="span-classglyphicon-glyphicon-info-sign-aria-hiddentruespan-what-to-do-with-backpressure"><span class="glyphicon glyphicon-info-sign" aria-hidden="true"></span> What to do with Backpressure?</h3>
+
+  <p>Assuming that you identified where the source of backpressure — a bottleneck — is located, the next step is to analyse why this is happening. Below, we list some potential causes of backpressure from the more basic to the more complex ones. We recommend checking the basic causes first, before diving deeper into the more complex ones and potentially drawing false conclusions.</p>
+
+  <p>Please also recall that backpressure might be temporary and the result of a load spike, checkpointing, or a job restart with a data backlog waiting to be processed. If backpressure is temporary, you should simply ignore it. If it is not, keep in mind that the process of analysing and solving the issue can be affected by the intermittent nature of your bottleneck. Having said that, here are a couple of things to check.</p>
+
+  <h4 id="system-resources">System Resources</h4>
+
+  <p>Firstly, you should check the affected machines’ basic resource usage like CPU, network, or disk I/O. If some resource is fully or heavily utilised, you can do one of the following:</p>
+
+  <ol>
+    <li>Try to optimise your code. Code profilers are helpful in this case.</li>
+    <li>Tune Flink for that specific resource.</li>
+    <li>Scale out by increasing the parallelism and/or increasing the number of machines in the cluster.</li>
+  </ol>
+
+  <h4 id="garbage-collection">Garbage Collection</h4>
+
+  <p>Oftentimes, performance issues arise from long GC pauses. You can verify whether you are in such a situation by either printing debug GC logs (via <code>-XX:+PrintGCDetails</code>) or by using some memory/GC profilers. Since dealing with GC issues is highly application-dependent and independent of Flink, we will not go into details here (<a href="https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/index.html">Oracle’s Garbage Collection Tuning Guide</a> or <a href="ht [...]
+
+  <h4 id="cputhread-bottleneck">CPU/Thread Bottleneck</h4>
+
+  <p>Sometimes a CPU bottleneck is not visible at first glance: if only one or a couple of threads are CPU-bound, overall machine CPU usage may remain relatively low. For instance, a single CPU-bottlenecked thread on a 48-core machine results in only about 2% CPU use. Code profilers help here since they can identify hot threads by showing each thread’s CPU usage, for example.</p>
+
+  <h4 id="thread-contention">Thread Contention</h4>
+
+  <p>Similarly to the CPU/thread bottleneck issue above, a subtask may be bottlenecked due to high thread contention on shared resources. Again, CPU profilers are your best friend here! Consider looking for synchronisation overhead / lock contention in user code — although adding synchronisation in user code should be avoided and may even be dangerous! Also consider investigating shared system resources. The default JVM’s SSL implementation, for example, can become contended around the s [...]
+
+  <h4 id="load-imbalance">Load Imbalance</h4>
+
+  <p>If your bottleneck is caused by data skew, you can try to remove it or mitigate its impact by changing the data partitioning to separate heavy keys or by implementing local/pre-aggregation.</p>
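<p>As a language-agnostic illustration (deliberately not Flink API code), local pre-aggregation collapses many records of a hot key into one partial aggregate per sender before the shuffle:</p>

```python
from collections import Counter

def pre_aggregate(partition):
    """Combine (key, count) records locally before shuffling.

    With a skewed key distribution, this reduces a hot key's traffic
    from one record per event to one partial aggregate per sender, so
    the downstream subtask owning that key no longer becomes the single
    bottleneck.
    """
    partial = Counter()
    for key, count in partition:
        partial[key] += count
    return list(partial.items())
```

<p>In a Flink job, this corresponds to techniques like aggregating within each upstream subtask before the <code>keyBy()</code>, so the subtask owning a hot key receives far fewer records.</p>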
+
+  <p><br />
+This list is far from exhaustive. Generally, in order to reduce a bottleneck and thus backpressure, first analyse where it is happening and then find out why. The best place to start reasoning about the “why” is by checking what resources are fully utilised.</p>
+</div>
+
+<h3 id="latency-tracking">Latency Tracking</h3>
+
+<p>Tracking latencies at the various locations they may occur is a topic of its own. In this section, we will focus on the time records wait inside Flink’s network stack — including the system’s network connections. In low throughput scenarios, these latencies are influenced directly by the output flusher via the buffer timeout parameter or indirectly by any application code latencies. When processing a record takes longer than expected or when (multiple) timers fire at the same time — a [...]
+
+<p>Flink offers some support for <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring/metrics.html#latency-tracking">tracking the latency</a> of records passing through the system (outside of user code). However, this is disabled by default (see below why!) and must be enabled by setting a latency tracking interval either in Flink’s <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#metrics-latency-interval">configuration via < [...]
+
+<ul>
+  <li><code>single</code>: one histogram for each operator subtask</li>
+  <li><code>operator</code> (default): one histogram for each combination of source task and operator subtask</li>
+  <li><code>subtask</code>: one histogram for each combination of source subtask and operator subtask (quadratic in the parallelism!)</li>
+</ul>
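<p>For example, latency tracking could be enabled via Flink's configuration as follows (the keys match the configuration options linked above; the interval value is only an illustration):</p>

```yaml
# flink-conf.yaml (illustrative values)
metrics.latency.interval: 30000        # emit latency markers every 30 s
metrics.latency.granularity: operator  # single | operator | subtask
```

<p>Alternatively, the interval can be set per job via <code>ExecutionConfig#setLatencyTrackingInterval()</code>.</p>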
+
+<p>These metrics are collected through special “latency markers”: each source subtask will periodically emit a special record containing the timestamp of its creation. The latency markers then flow alongside normal records while not overtaking them on the wire or inside a buffer queue. However, <em>a latency marker does not enter application logic</em> and therefore overtakes records there. Latency markers thus only measure the waiting time between the user code and not a full “end-to-end [...]
+
+<p>Since <code>LatencyMarkers</code> sit in network buffers just like normal records, they will also wait until the buffer is full or flushed due to a buffer timeout. When a channel is under high load, the network’s buffering of data adds no latency. However, as soon as a channel is under low load, records and latency markers will experience an expected average delay of at most <code>buffer_timeout / 2</code>. This delay will add to each network connection towards a subtask and sh [...]
+
+<p>By looking at the exposed latency tracking metrics for each subtask, for example at the 95th percentile, you should nevertheless be able to identify subtasks which are adding substantially to the overall source-to-sink latency and continue with optimising there.</p>
+
+<div class="alert alert-info">
+  <p><span class="label label-info" style="display: inline-block"><span class="glyphicon glyphicon-info-sign" aria-hidden="true"></span> Note</span>
+Flink’s latency markers assume that the clocks on all machines in the cluster are in sync. We recommend setting up an automated clock synchronisation service (like NTP) to avoid false latency results.</p>
+</div>
+
+<div class="alert alert-warning">
+  <p><span class="label label-warning" style="display: inline-block"><span class="glyphicon glyphicon-warning-sign" aria-hidden="true"></span> Warning</span>
+Enabling latency metrics can significantly impact the performance of the cluster (in particular for <code>subtask</code> granularity) due to the sheer amount of metrics being added as well as the use of histograms which are quite expensive to maintain. It is highly recommended to only use them for debugging purposes.</p>
+</div>
+
+<h2 id="conclusion">Conclusion</h2>
+
+<p>In the previous sections we discussed how to monitor Flink’s network stack, which primarily involves identifying backpressure: where it occurs, where it originates from, and (potentially) why it occurs. This can be done in two ways: for simple cases and debugging sessions by using the backpressure monitor; for continuous monitoring, more in-depth analysis, and less runtime overhead by using Flink’s task and network stack metrics. Backpressure can be caused by the network layer itse [...]
+
+<p>Stay tuned for the third blog post in the series of network stack posts that will focus on tuning techniques and anti-patterns to avoid.</p>
+
+
+      </article>
+    </div>
+
+    <div class="row">
+      <div id="disqus_thread"></div>
+      <script type="text/javascript">
+        /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */
+        var disqus_shortname = 'stratosphere-eu'; // required: replace example with your forum shortname
+
+        /* * * DON'T EDIT BELOW THIS LINE * * */
+        (function() {
+            var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
+            dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js';
+             (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
+        })();
+      </script>
+    </div>
+  </div>
+</div>
+      </div>
+    </div>
+
+    <hr />
+
+    <div class="row">
+      <div class="footer text-center col-sm-12">
+        <p>Copyright © 2014-2019 <a href="http://apache.org">The Apache Software Foundation</a>. All Rights Reserved.</p>
+        <p>Apache Flink, Flink®, Apache®, the squirrel logo, and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation.</p>
+        <p><a href="/privacy-policy.html">Privacy Policy</a> &middot; <a href="/blog/feed.xml">RSS feed</a></p>
+      </div>
+    </div>
+    </div><!-- /.container -->
+
+    <!-- Include all compiled plugins (below), or include individual files as needed -->
+    <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.4/js/bootstrap.min.js"></script>
+    <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery.matchHeight/0.7.0/jquery.matchHeight-min.js"></script>
+    <script src="/js/codetabs.js"></script>
+    <script src="/js/stickysidebar.js"></script>
+
+    <!-- Google Analytics -->
+    <script>
+      (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+      (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+      m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+      })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+      ga('create', 'UA-52545728-1', 'auto');
+      ga('send', 'pageview');
+    </script>
+  </body>
+</html>
diff --git a/content/blog/feed.xml b/content/blog/feed.xml
index ebfe803..cc8dfe3 100644
--- a/content/blog/feed.xml
+++ b/content/blog/feed.xml
@@ -7,6 +7,366 @@
 <atom:link href="https://flink.apache.org/blog/feed.xml" rel="self" type="application/rss+xml" />
 
 <item>
+<title>Flink Network Stack Vol. 2: Monitoring, Metrics, and that Backpressure Thing</title>
+<description>&lt;style type=&quot;text/css&quot;&gt;
+.tg  {border-collapse:collapse;border-spacing:0;}
+.tg td{padding:10px 10px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
+.tg th{padding:10px 10px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;background-color:#eff0f1;}
+.tg .tg-wide{padding:10px 30px;}
+.tg .tg-top{vertical-align:top}
+.tg .tg-topcenter{text-align:center;vertical-align:top}
+.tg .tg-center{text-align:center;vertical-align:center}
+&lt;/style&gt;
+
+&lt;p&gt;In a &lt;a href=&quot;/2019/06/05/flink-network-stack.html&quot;&gt;previous blog post&lt;/a&gt;, we presented how Flink’s network stack works from the high-level abstractions to the low-level details. This second blog post in the series of network stack posts extends on this knowledge and discusses monitoring network-related metrics to identify effects such as backpressure or bottlenecks in throughput and latency. Although this post briefly covers what to do with backpressure,  [...]
+
+&lt;div class=&quot;page-toc&quot;&gt;
+&lt;ul id=&quot;markdown-toc&quot;&gt;
+  &lt;li&gt;&lt;a href=&quot;#monitoring&quot; id=&quot;markdown-toc-monitoring&quot;&gt;Monitoring&lt;/a&gt;    &lt;ul&gt;
+      &lt;li&gt;&lt;a href=&quot;#backpressure-monitor&quot; id=&quot;markdown-toc-backpressure-monitor&quot;&gt;Backpressure Monitor&lt;/a&gt;&lt;/li&gt;
+    &lt;/ul&gt;
+  &lt;/li&gt;
+  &lt;li&gt;&lt;a href=&quot;#network-metrics&quot; id=&quot;markdown-toc-network-metrics&quot;&gt;Network Metrics&lt;/a&gt;    &lt;ul&gt;
+      &lt;li&gt;&lt;a href=&quot;#backpressure&quot; id=&quot;markdown-toc-backpressure&quot;&gt;Backpressure&lt;/a&gt;&lt;/li&gt;
+      &lt;li&gt;&lt;a href=&quot;#resource-usage--throughput&quot; id=&quot;markdown-toc-resource-usage--throughput&quot;&gt;Resource Usage / Throughput&lt;/a&gt;&lt;/li&gt;
+      &lt;li&gt;&lt;a href=&quot;#latency-tracking&quot; id=&quot;markdown-toc-latency-tracking&quot;&gt;Latency Tracking&lt;/a&gt;&lt;/li&gt;
+    &lt;/ul&gt;
+  &lt;/li&gt;
+  &lt;li&gt;&lt;a href=&quot;#conclusion&quot; id=&quot;markdown-toc-conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;/div&gt;
+
+&lt;h2 id=&quot;monitoring&quot;&gt;Monitoring&lt;/h2&gt;
+
+&lt;p&gt;Probably the most important part of network monitoring is &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring/back_pressure.html&quot;&gt;monitoring backpressure&lt;/a&gt;, a situation where a system is receiving data at a higher rate than it can process¹. Such behaviour will result in the sender being backpressured and may be caused by two things:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;
+    &lt;p&gt;The receiver is slow.&lt;br /&gt;
+This can happen because the receiver is backpressured itself, is unable to keep processing at the same rate as the sender, or is temporarily blocked by garbage collection, lack of system resources, or I/O.&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;The network channel is slow.&lt;br /&gt;
+  Even though in such case the receiver is not (directly) involved, we call the sender backpressured due to a potential oversubscription on network bandwidth shared by all subtasks running on the same machine. Beware that, in addition to Flink’s network stack, there may be more network users, such as sources and sinks, distributed file systems (checkpointing, network-attached storage), logging, and metrics. A previous &lt;a href=&quot;https://www.ververica.com/blog/how-to-size-your-apach [...]
+  &lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;&lt;sup&gt;1&lt;/sup&gt; In case you are unfamiliar with backpressure and how it interacts with Flink, we recommend reading through &lt;a href=&quot;https://www.ververica.com/blog/how-flink-handles-backpressure&quot;&gt;this blog post on backpressure&lt;/a&gt; from 2015.&lt;/p&gt;
+
+&lt;p&gt;&lt;br /&gt;
+If backpressure occurs, it will bubble upstream and eventually reach your sources and slow them down. This is not a bad thing per-se and merely states that you lack resources for the current load. However, you may want to improve your job so that it can cope with higher loads without using more resources. In order to do so, you need to find (1) where (at which task/operator) the bottleneck is and (2) what is causing it. Flink offers two mechanisms for identifying where the bottleneck is: [...]
+
+&lt;ul&gt;
+  &lt;li&gt;directly via Flink’s web UI and its backpressure monitor, or&lt;/li&gt;
+  &lt;li&gt;indirectly through some of the network metrics.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;Flink’s web UI is likely the first entry point for a quick troubleshooting but has some disadvantages that we will explain below. On the other hand, Flink’s network metrics are better suited for continuous monitoring and reasoning about the exact nature of the bottleneck causing backpressure. We will cover both in the sections below. In both cases, you need to identify the origin of backpressure from the sources to the sinks. Your starting point for the current and future invest [...]
+
+&lt;h3 id=&quot;backpressure-monitor&quot;&gt;Backpressure Monitor&lt;/h3&gt;
+
+&lt;p&gt;The &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring/back_pressure.html&quot;&gt;backpressure monitor&lt;/a&gt; is only exposed via Flink’s web UI². Since it’s an active component that is only triggered on request, it is currently not available via metrics. The backpressure monitor samples the running tasks’ threads on all TaskManagers via &lt;code&gt;Thread.getStackTrace()&lt;/code&gt; and computes the number of samples where tasks were bl [...]
+
+&lt;ul&gt;
+  &lt;li&gt;&lt;span style=&quot;color:green&quot;&gt;OK&lt;/span&gt; for &lt;code&gt;ratio ≤ 0.10&lt;/code&gt;,&lt;/li&gt;
+  &lt;li&gt;&lt;span style=&quot;color:orange&quot;&gt;LOW&lt;/span&gt; for &lt;code&gt;0.10 &amp;lt; Ratio ≤ 0.5&lt;/code&gt;, and&lt;/li&gt;
+  &lt;li&gt;&lt;span style=&quot;color:red&quot;&gt;HIGH&lt;/span&gt; for &lt;code&gt;0.5 &amp;lt; Ratio ≤ 1&lt;/code&gt;.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;Although you can tune things like the refresh-interval, the number of samples, or the delay between samples, normally, you would not need to touch these since the defaults already give good-enough results.&lt;/p&gt;
+
+&lt;center&gt;
+&lt;img src=&quot;/img/blog/2019-07-23-network-stack-2/back_pressure_sampling_high.png&quot; width=&quot;600px&quot; alt=&quot;Backpressure sampling:high&quot; /&gt;
+&lt;/center&gt;
+
+&lt;p&gt;&lt;sup&gt;2&lt;/sup&gt; You may also access the backpressure monitor via the REST API: &lt;code&gt;/jobs/:jobid/vertices/:vertexid/backpressure&lt;/code&gt;&lt;/p&gt;
+
+&lt;p&gt;&lt;br /&gt;
+The backpressure monitor can help you find where (at which task/operator) backpressure originates from. However, it does not support you in further reasoning about the causes of it. Additionally, for larger jobs or higher parallelism, the backpressure monitor becomes too crowded to use and may also take some time to gather all information from all TaskManagers. Please also note that sampling may affect your running job’s performance.&lt;/p&gt;
+
+&lt;h2 id=&quot;network-metrics&quot;&gt;Network Metrics&lt;/h2&gt;
+
+&lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring/metrics.html#network&quot;&gt;Network&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring/metrics.html#io&quot;&gt;task I/O&lt;/a&gt; metrics are more lightweight than the backpressure monitor and are continuously published for each running job. We can leverage those and get even more insights, not only for backpressure monitoring. The most re [...]
+
+&lt;ul&gt;
+  &lt;li&gt;
+    &lt;p&gt;&lt;strong&gt;&lt;span style=&quot;color:orange&quot;&gt;up to Flink 1.8:&lt;/span&gt;&lt;/strong&gt; &lt;code&gt;outPoolUsage&lt;/code&gt;, &lt;code&gt;inPoolUsage&lt;/code&gt;&lt;br /&gt;
+An estimate on the ratio of buffers used vs. buffers available in the respective local buffer pools.
+While interpreting &lt;code&gt;inPoolUsage&lt;/code&gt; in Flink 1.5 - 1.8 with credit-based flow control, please note that this only relates to floating buffers (exclusive buffers are not part of the pool).&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;&lt;strong&gt;&lt;span style=&quot;color:green&quot;&gt;Flink 1.9 and above:&lt;/span&gt;&lt;/strong&gt; &lt;code&gt;outPoolUsage&lt;/code&gt;, &lt;code&gt;inPoolUsage&lt;/code&gt;, &lt;code&gt;floatingBuffersUsage&lt;/code&gt;, &lt;code&gt;exclusiveBuffersUsage&lt;/code&gt;&lt;br /&gt;
+An estimate on the ratio of buffers used vs. buffers available in the respective local buffer pools.
+Starting with Flink 1.9, &lt;code&gt;inPoolUsage&lt;/code&gt; is the sum of &lt;code&gt;floatingBuffersUsage&lt;/code&gt; and &lt;code&gt;exclusiveBuffersUsage&lt;/code&gt;.&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;&lt;code&gt;numRecordsOut&lt;/code&gt;, &lt;code&gt;numRecordsIn&lt;/code&gt;&lt;br /&gt;
+Each metric comes with two scopes: one scoped to the operator and one scoped to the subtask. For network monitoring, the subtask-scoped metric is relevant and shows the total number of records it has sent/received. You may need to further look into these figures to extract the number of records within a certain time span or use the equivalent &lt;code&gt;…PerSecond&lt;/code&gt; metrics.&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;&lt;code&gt;numBytesOut&lt;/code&gt;, &lt;code&gt;numBytesInLocal&lt;/code&gt;, &lt;code&gt;numBytesInRemote&lt;/code&gt;&lt;br /&gt;
+The total number of bytes this subtask has emitted or read from a local/remote source. These are also available as meters via &lt;code&gt;…PerSecond&lt;/code&gt; metrics.&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;&lt;code&gt;numBuffersOut&lt;/code&gt;, &lt;code&gt;numBuffersInLocal&lt;/code&gt;, &lt;code&gt;numBuffersInRemote&lt;/code&gt;&lt;br /&gt;
+Similar to &lt;code&gt;numBytes…&lt;/code&gt; but counting the number of network buffers.&lt;/p&gt;
+  &lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;div class=&quot;alert alert-warning&quot;&gt;
+  &lt;p&gt;&lt;span class=&quot;label label-warning&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-warning-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Warning&lt;/span&gt;
+For the sake of completeness and since they have been used in the past, we will briefly look at the &lt;code&gt;outputQueueLength&lt;/code&gt; and &lt;code&gt;inputQueueLength&lt;/code&gt; metrics. These are somewhat similar to the &lt;code&gt;[out,in]PoolUsage&lt;/code&gt; metrics but show the number of buffers sitting in a sender subtask’s output queues and in a receiver subtask’s input queues, respectively. Reasoning about absolute numbers of buffers, however, is difficult and there i [...]
+
+  &lt;p&gt;Overall, &lt;strong&gt;we discourage the use of&lt;/strong&gt; &lt;code&gt;outputQueueLength&lt;/code&gt; &lt;strong&gt;and&lt;/strong&gt; &lt;code&gt;inputQueueLength&lt;/code&gt; because their interpretation highly depends on the current parallelism of the operator and the configured numbers of exclusive and floating buffers. Instead, we recommend using the various &lt;code&gt;*PoolUsage&lt;/code&gt; metrics which even reveal more detailed insight.&lt;/p&gt;
+&lt;/div&gt;
+
+&lt;div class=&quot;alert alert-info&quot;&gt;
+  &lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt;
+ If you reason about buffer usage, please keep the following in mind:&lt;/p&gt;
+
+  &lt;ul&gt;
+    &lt;li&gt;Any outgoing channel which has been used at least once will always occupy one buffer (since Flink 1.5).
+      &lt;ul&gt;
+        &lt;li&gt;&lt;strong&gt;&lt;span style=&quot;color:orange&quot;&gt;up to Flink 1.8:&lt;/span&gt;&lt;/strong&gt; This buffer (even if empty!) was always counted as a backlog of 1 and thus receivers tried to reserve a floating buffer for it.&lt;/li&gt;
+        &lt;li&gt;&lt;strong&gt;&lt;span style=&quot;color:green&quot;&gt;Flink 1.9 and above:&lt;/span&gt;&lt;/strong&gt; A buffer is only counted in the backlog if it is ready for consumption, i.e. it is full or was flushed (see FLINK-11082)&lt;/li&gt;
+      &lt;/ul&gt;
+    &lt;/li&gt;
+    &lt;li&gt;The receiver will only release a received buffer after deserialising the last record in it.&lt;/li&gt;
+  &lt;/ul&gt;
+&lt;/div&gt;
+
+&lt;p&gt;The following sections make use of and combine these metrics to reason about backpressure and resource usage / efficiency with respect to throughput. A separate section will detail latency related metrics.&lt;/p&gt;
+
+&lt;h3 id=&quot;backpressure&quot;&gt;Backpressure&lt;/h3&gt;
+
+&lt;p&gt;Backpressure may be indicated by two different sets of metrics: (local) buffer pool usages as well as input/output queue lengths. They provide a different level of granularity but, unfortunately, none of these are exhaustive and there is room for interpretation. Because of the inherent problems with interpreting these queue lengths we will focus on the usage of input and output pools below which also provides more detail.&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;
+    &lt;p&gt;&lt;strong&gt;If a subtask’s&lt;/strong&gt; &lt;code&gt;outPoolUsage&lt;/code&gt; &lt;strong&gt;is 100%&lt;/strong&gt;, it is backpressured. Whether the subtask is already blocking or still writing records into network buffers depends on how full the buffers are, that the &lt;code&gt;RecordWriters&lt;/code&gt; are currently writing into.&lt;br /&gt;
+&lt;span class=&quot;glyphicon glyphicon-warning-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:orange;&quot;&gt;&lt;/span&gt; This is different to what the backpressure monitor is showing!&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;An &lt;code&gt;inPoolUsage&lt;/code&gt; of 100% means that all floating buffers are assigned to channels and eventually backpressure will be exercised upstream. These floating buffers are in either of the following conditions: they are reserved for future use on a channel due to an exclusive buffer being utilised (remote input channels always try to maintain &lt;code&gt;#exclusive buffers&lt;/code&gt; credits), they are reserved for a sender’s backlog and wait for data, they [...]
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;&lt;strong&gt;&lt;span style=&quot;color:orange&quot;&gt;up to Flink 1.8:&lt;/span&gt;&lt;/strong&gt; Due to &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11082&quot;&gt;FLINK-11082&lt;/a&gt;, an &lt;code&gt;inPoolUsage&lt;/code&gt; of 100% is quite common even in normal situations.&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;&lt;strong&gt;&lt;span style=&quot;color:green&quot;&gt;Flink 1.9 and above:&lt;/span&gt;&lt;/strong&gt; If &lt;code&gt;inPoolUsage&lt;/code&gt; is constantly around 100%, this is a strong indicator for exercising backpressure upstream.&lt;/p&gt;
+  &lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;The following table summarises all combinations and their interpretation. Bear in mind, though, that backpressure may be minor or temporary (no need to look into it), on particular channels only, or caused by other JVM processes on a particular TaskManager, such as GC, synchronisation, I/O, or resource shortage, rather than by a specific subtask.&lt;/p&gt;
+
+&lt;center&gt;
+&lt;table class=&quot;tg&quot;&gt;
+  &lt;tr&gt;
+    &lt;th&gt;&lt;/th&gt;
+    &lt;th class=&quot;tg-center&quot;&gt;&lt;code&gt;outPoolUsage&lt;/code&gt; low&lt;/th&gt;
+    &lt;th class=&quot;tg-center&quot;&gt;&lt;code&gt;outPoolUsage&lt;/code&gt; high&lt;/th&gt;
+  &lt;/tr&gt;
+  &lt;tr&gt;
+    &lt;th class=&quot;tg-top&quot;&gt;&lt;code&gt;inPoolUsage&lt;/code&gt; low&lt;/th&gt;
+    &lt;td class=&quot;tg-topcenter&quot;&gt;
+      &lt;span class=&quot;glyphicon glyphicon-ok-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:green;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;/td&gt;
+    &lt;td class=&quot;tg-topcenter&quot;&gt;
+      &lt;span class=&quot;glyphicon glyphicon-warning-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:orange;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;br /&gt;
+      (backpressured, temporary situation: upstream is not backpressured yet or not anymore)&lt;/td&gt;
+  &lt;/tr&gt;
+  &lt;tr&gt;
+    &lt;th class=&quot;tg-top&quot; rowspan=&quot;2&quot;&gt;
+      &lt;code&gt;inPoolUsage&lt;/code&gt; high&lt;br /&gt;
+      (&lt;strong&gt;&lt;span style=&quot;color:green&quot;&gt;Flink 1.9+&lt;/span&gt;&lt;/strong&gt;)&lt;/th&gt;
+    &lt;td class=&quot;tg-topcenter&quot;&gt;
+      if all upstream tasks’ &lt;code&gt;outPoolUsage&lt;/code&gt; are low: &lt;span class=&quot;glyphicon glyphicon-warning-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:orange;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;br /&gt;
+      (may eventually cause backpressure)&lt;/td&gt;
+    &lt;td class=&quot;tg-topcenter&quot; rowspan=&quot;2&quot;&gt;
+      &lt;span class=&quot;glyphicon glyphicon-remove-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:red;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;br /&gt;
+      (backpressured by downstream task(s) or network, probably forwarding backpressure upstream)&lt;/td&gt;
+  &lt;/tr&gt;
+  &lt;tr&gt;
+    &lt;td class=&quot;tg-topcenter&quot;&gt;if any upstream task’s &lt;code&gt;outPoolUsage&lt;/code&gt; is high: &lt;span class=&quot;glyphicon glyphicon-remove-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:red;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;br /&gt;
+      (may exercise backpressure upstream and may be the source of backpressure)&lt;/td&gt;
+  &lt;/tr&gt;
+&lt;/table&gt;
+&lt;/center&gt;
+
+&lt;p&gt;&lt;br /&gt;
+We can reason further about the cause of backpressure by looking at the network metrics of the subtasks of two consecutive tasks:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;If all subtasks of the receiver task have low &lt;code&gt;inPoolUsage&lt;/code&gt; values and any upstream subtask’s &lt;code&gt;outPoolUsage&lt;/code&gt; is high, then there may be a network bottleneck causing backpressure.
+Since the network is a shared resource among all subtasks of a TaskManager, the bottleneck may not directly originate from this subtask but rather from various concurrent operations, e.g. checkpoints, other streams, external connections, or other TaskManagers/processes on the same machine.&lt;/li&gt;
+&lt;/ul&gt;
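The interpretation rules above can be sketched as a small helper. This is only an illustration, not part of Flink: the function name, the choice of ~100% as "high", and the way you obtain the pool usage ratios (e.g. from a metrics reporter or the REST API) are all assumptions.

```python
def classify_backpressure(in_pool_usage, out_pool_usage, upstream_out_pool_usages):
    """Rough interpretation of the inPoolUsage/outPoolUsage table above.

    All arguments are usage ratios in [0.0, 1.0]; how you obtain them
    (metrics reporter, REST API, ...) depends on your setup.
    """
    def high(usage):
        return usage >= 0.95  # treat ~100% as "high"; the threshold is a guess

    if high(out_pool_usage) and high(in_pool_usage):
        return "backpressured by downstream task(s) or network"
    if high(out_pool_usage):
        return "backpressured (temporary: upstream not backpressured yet/anymore)"
    if high(in_pool_usage):
        if any(high(u) for u in upstream_out_pool_usages):
            return "may exercise backpressure upstream; possibly its source"
        return "ok, but may eventually cause backpressure"
    return "ok"
```

Remember that such a classification only flags where to look; it cannot distinguish minor or temporary backpressure from a real bottleneck.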
+
+&lt;p&gt;Backpressure can also be caused by all parallel instances of a task or by a single task instance. The former usually happens because the task is performing some time-consuming operation that applies to all input partitions. The latter is usually the result of some kind of skew, either data skew or resource availability/allocation skew. In either case, you can find some hints on how to handle such situations in the &lt;a href=&quot;#span-classlabel-label-info-styledisplay-inline-b [...]
+
+&lt;div class=&quot;alert alert-info&quot;&gt;
+  &lt;h3 class=&quot;no_toc&quot; id=&quot;span-classglyphicon-glyphicon-info-sign-aria-hiddentruespan-flink-19-and-above&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Flink 1.9 and above&lt;/h3&gt;
+
+  &lt;ul&gt;
+    &lt;li&gt;If &lt;code&gt;floatingBuffersUsage&lt;/code&gt; is not 100%, it is unlikely that there is backpressure. If it is 100% and any upstream task is backpressured, it suggests that this input is exercising backpressure on a single, some, or all input channels. To differentiate between these three situations, you can use &lt;code&gt;exclusiveBuffersUsage&lt;/code&gt;:
+      &lt;ul&gt;
+        &lt;li&gt;Assuming that &lt;code&gt;floatingBuffersUsage&lt;/code&gt; is around 100%, the higher the &lt;code&gt;exclusiveBuffersUsage&lt;/code&gt;, the more input channels are backpressured. In the extreme case of &lt;code&gt;exclusiveBuffersUsage&lt;/code&gt; close to 100%, all channels are backpressured.&lt;/li&gt;
+      &lt;/ul&gt;
+    &lt;/li&gt;
+  &lt;/ul&gt;
+
+  &lt;p&gt;&lt;br /&gt;
+The relation between &lt;code&gt;exclusiveBuffersUsage&lt;/code&gt;, &lt;code&gt;floatingBuffersUsage&lt;/code&gt;, and the upstream tasks’ &lt;code&gt;outPoolUsage&lt;/code&gt; is summarised in the following table and extends on the table above with &lt;code&gt;inPoolUsage = floatingBuffersUsage + exclusiveBuffersUsage&lt;/code&gt;:&lt;/p&gt;
+
+  &lt;center&gt;
+&lt;table class=&quot;tg&quot;&gt;
+  &lt;tr&gt;
+    &lt;th&gt;&lt;/th&gt;
+    &lt;th&gt;&lt;code&gt;exclusiveBuffersUsage&lt;/code&gt; low&lt;/th&gt;
+    &lt;th&gt;&lt;code&gt;exclusiveBuffersUsage&lt;/code&gt; high&lt;/th&gt;
+  &lt;/tr&gt;
+  &lt;tr&gt;
+    &lt;th class=&quot;tg-top&quot; style=&quot;min-width:33%;&quot;&gt;
+      &lt;code&gt;floatingBuffersUsage&lt;/code&gt; low +&lt;br /&gt;
+      &lt;em&gt;all&lt;/em&gt; upstream &lt;code&gt;outPoolUsage&lt;/code&gt; low&lt;/th&gt;
+    &lt;td class=&quot;tg-center&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-ok-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:green;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;/td&gt;
+    &lt;td class=&quot;tg-center&quot;&gt;-&lt;sup&gt;3&lt;/sup&gt;&lt;/td&gt;
+  &lt;/tr&gt;
+  &lt;tr&gt;
+    &lt;th class=&quot;tg-top&quot; style=&quot;min-width:33%;&quot;&gt;
+      &lt;code&gt;floatingBuffersUsage&lt;/code&gt; low +&lt;br /&gt;
+      &lt;em&gt;any&lt;/em&gt; upstream &lt;code&gt;outPoolUsage&lt;/code&gt; high&lt;/th&gt;
+    &lt;td class=&quot;tg-center&quot;&gt;
+      &lt;span class=&quot;glyphicon glyphicon-remove-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:red;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;br /&gt;
+      (potential network bottleneck)&lt;/td&gt;
+    &lt;td class=&quot;tg-center&quot;&gt;-&lt;sup&gt;3&lt;/sup&gt;&lt;/td&gt;
+  &lt;/tr&gt;
+  &lt;tr&gt;
+    &lt;th class=&quot;tg-top&quot; style=&quot;min-width:33%;&quot;&gt;
+      &lt;code&gt;floatingBuffersUsage&lt;/code&gt; high +&lt;br /&gt;
+      &lt;em&gt;all&lt;/em&gt; upstream &lt;code&gt;outPoolUsage&lt;/code&gt; low&lt;/th&gt;
+    &lt;td class=&quot;tg-center&quot;&gt;
+      &lt;span class=&quot;glyphicon glyphicon-warning-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:orange;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;br /&gt;
+      (backpressure eventually appears on only some of the input channels)&lt;/td&gt;
+    &lt;td class=&quot;tg-center&quot;&gt;
+      &lt;span class=&quot;glyphicon glyphicon-warning-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:orange;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;br /&gt;
+      (backpressure eventually appears on most or all of the input channels)&lt;/td&gt;
+  &lt;/tr&gt;
+  &lt;tr&gt;
+    &lt;th class=&quot;tg-top&quot; style=&quot;min-width:33%;&quot;&gt;
+      &lt;code&gt;floatingBuffersUsage&lt;/code&gt; high +&lt;br /&gt;
+      &lt;em&gt;any&lt;/em&gt; upstream &lt;code&gt;outPoolUsage&lt;/code&gt; high&lt;/th&gt;
+    &lt;td class=&quot;tg-center&quot;&gt;
+      &lt;span class=&quot;glyphicon glyphicon-remove-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:red;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;br /&gt;
+      (backpressure on only some of the input channels)&lt;/td&gt;
+    &lt;td class=&quot;tg-center&quot;&gt;
+      &lt;span class=&quot;glyphicon glyphicon-remove-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:red;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;br /&gt;
+      (backpressure on most or all of the input channels)&lt;/td&gt;
+  &lt;/tr&gt;
+&lt;/table&gt;
+&lt;/center&gt;
+
+  &lt;p&gt;&lt;sup&gt;3&lt;/sup&gt; this should not happen&lt;/p&gt;
+
+&lt;/div&gt;
+
+&lt;h3 id=&quot;resource-usage--throughput&quot;&gt;Resource Usage / Throughput&lt;/h3&gt;
+
+&lt;p&gt;Besides the obvious use of each individual metric mentioned above, there are also a few combinations providing useful insight into what is happening in the network stack:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;
+    &lt;p&gt;Low throughput with frequent &lt;code&gt;outPoolUsage&lt;/code&gt; values around 100% but low &lt;code&gt;inPoolUsage&lt;/code&gt; on all receivers is an indicator that the round-trip time of our credit notification (depends on your network’s latency) is too high for the default number of exclusive buffers to make use of your bandwidth. Consider increasing the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#taskmanager-network-mem [...]
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;Combining &lt;code&gt;numRecordsOut&lt;/code&gt; and &lt;code&gt;numBytesOut&lt;/code&gt; helps to identify the average serialised record size, which supports capacity planning for peak scenarios.&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;If you want to reason about buffer fill rates and the influence of the output flusher, you may combine &lt;code&gt;numBytesInRemote&lt;/code&gt; with &lt;code&gt;numBuffersInRemote&lt;/code&gt;. When tuning for throughput (and not latency!), low buffer fill rates may indicate reduced network efficiency. In such cases, consider increasing the buffer timeout.
+Please note that, as of Flink 1.8 and 1.9, &lt;code&gt;numBuffersOut&lt;/code&gt; only increases for buffers getting full or for an event cutting off a buffer (e.g. a checkpoint barrier) and may lag behind. Please also note that reasoning about buffer fill rates on local channels is unnecessary since buffering is an optimisation technique for remote channels with limited effect on local channels.&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;You may also separate local from remote traffic using &lt;code&gt;numBytesInLocal&lt;/code&gt; and &lt;code&gt;numBytesInRemote&lt;/code&gt;, but in most cases this is unnecessary.&lt;/p&gt;
+  &lt;/li&gt;
+&lt;/ul&gt;
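As a back-of-the-envelope illustration of the metric combinations above, a hypothetical helper could compute the average record size and the remote buffer fill rate from the raw counters. The 32 KiB default network buffer size is an assumption here; replace it with your configured segment size if you changed it.

```python
def average_record_size(num_bytes_out, num_records_out):
    """Average serialised record size in bytes, from the two task counters."""
    return num_bytes_out / num_records_out if num_records_out else 0.0

def remote_buffer_fill_rate(num_bytes_in_remote, num_buffers_in_remote,
                            buffer_size=32 * 1024):
    """Fraction of each remote buffer that is filled on average.

    buffer_size defaults to Flink's default network buffer size of 32 KiB
    (taskmanager.memory.segment-size); adjust if you changed it.
    """
    if num_buffers_in_remote == 0:
        return 0.0
    return num_bytes_in_remote / (num_buffers_in_remote * buffer_size)
```

A persistently low fill rate when tuning for throughput would then point at the output flusher cutting buffers off early, i.e. a candidate for increasing the buffer timeout.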
+
+&lt;div class=&quot;alert alert-info&quot;&gt;
+  &lt;h3 class=&quot;no_toc&quot; id=&quot;span-classglyphicon-glyphicon-info-sign-aria-hiddentruespan-what-to-do-with-backpressure&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; What to do with Backpressure?&lt;/h3&gt;
+
+  &lt;p&gt;Assuming that you identified where the source of backpressure — a bottleneck — is located, the next step is to analyse why this is happening. Below, we list some potential causes of backpressure from the more basic to the more complex ones. We recommend checking the basic causes first before diving into the more complex ones and potentially drawing false conclusions.&lt;/p&gt;
+
+  &lt;p&gt;Please also recall that backpressure might be temporary and the result of a load spike, checkpointing, or a job restart with a data backlog waiting to be processed. If backpressure is temporary, you should simply ignore it. Alternatively, keep in mind that the process of analysing and solving the issue can be affected by the intermittent nature of your bottleneck. Having said that, here are a couple of things to check.&lt;/p&gt;
+
+  &lt;h4 id=&quot;system-resources&quot;&gt;System Resources&lt;/h4&gt;
+
+  &lt;p&gt;Firstly, you should check the incriminated machines’ basic resource usage like CPU, network, or disk I/O. If some resource is fully or heavily utilised you can do one of the following:&lt;/p&gt;
+
+  &lt;ol&gt;
+    &lt;li&gt;Try to optimise your code. Code profilers are helpful in this case.&lt;/li&gt;
+    &lt;li&gt;Tune Flink for that specific resource.&lt;/li&gt;
+    &lt;li&gt;Scale out by increasing the parallelism and/or increasing the number of machines in the cluster.&lt;/li&gt;
+  &lt;/ol&gt;
+
+  &lt;h4 id=&quot;garbage-collection&quot;&gt;Garbage Collection&lt;/h4&gt;
+
+  &lt;p&gt;Oftentimes, performance issues arise from long GC pauses. You can verify whether you are in such a situation by either printing debug GC logs (via &lt;code&gt;-XX:+PrintGCDetails&lt;/code&gt;) or by using some memory/GC profilers. Since dealing with GC issues is highly application-dependent and independent of Flink, we will not go into details here (&lt;a href=&quot;https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/index.html&quot;&gt;Oracle’s Garbage Collecti [...]
+
+  &lt;h4 id=&quot;cputhread-bottleneck&quot;&gt;CPU/Thread Bottleneck&lt;/h4&gt;
+
+  &lt;p&gt;Sometimes a CPU bottleneck might not be visible at first glance if only one or a couple of threads cause it while the overall machine’s CPU usage remains relatively low. For instance, a single CPU-bottlenecked thread on a 48-core machine would result in only 2% CPU use. Consider using code profilers for this as they can identify hot threads by showing each thread’s CPU usage, for example.&lt;/p&gt;
+
+  &lt;h4 id=&quot;thread-contention&quot;&gt;Thread Contention&lt;/h4&gt;
+
+  &lt;p&gt;Similarly to the CPU/thread bottleneck issue above, a subtask may be bottlenecked due to high thread contention on shared resources. Again, CPU profilers are your best friend here! Consider looking for synchronisation overhead / lock contention in user code — although adding synchronisation in user code should be avoided and may even be dangerous! Also consider investigating shared system resources. The default JVM’s SSL implementation, for example, can become contended around [...]
+
+  &lt;h4 id=&quot;load-imbalance&quot;&gt;Load Imbalance&lt;/h4&gt;
+
+  &lt;p&gt;If your bottleneck is caused by data skew, you can try to remove it or mitigate its impact by changing the data partitioning to separate heavy keys or by implementing local/pre-aggregation.&lt;/p&gt;
+
+  &lt;p&gt;&lt;br /&gt;
+This list is far from exhaustive. Generally, in order to reduce a bottleneck and thus backpressure, first analyse where it is happening and then find out why. The best place to start reasoning about the “why” is by checking what resources are fully utilised.&lt;/p&gt;
+&lt;/div&gt;
+
+&lt;h3 id=&quot;latency-tracking&quot;&gt;Latency Tracking&lt;/h3&gt;
+
+&lt;p&gt;Tracking latencies at the various locations they may occur is a topic of its own. In this section, we will focus on the time records wait inside Flink’s network stack — including the system’s network connections. In low throughput scenarios, these latencies are influenced directly by the output flusher via the buffer timeout parameter or indirectly by any application code latencies. When processing a record takes longer than expected or when (multiple) timers fire at the same ti [...]
+
+&lt;p&gt;Flink offers some support for &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring/metrics.html#latency-tracking&quot;&gt;tracking the latency&lt;/a&gt; of records passing through the system (outside of user code). However, this is disabled by default (see below why!) and must be enabled by setting a latency tracking interval either in Flink’s &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#metrics-l [...]
+
+&lt;ul&gt;
+  &lt;li&gt;&lt;code&gt;single&lt;/code&gt;: one histogram for each operator subtask&lt;/li&gt;
+  &lt;li&gt;&lt;code&gt;operator&lt;/code&gt; (default): one histogram for each combination of source task and operator subtask&lt;/li&gt;
+  &lt;li&gt;&lt;code&gt;subtask&lt;/code&gt;: one histogram for each combination of source subtask and operator subtask (quadratic in the parallelism!)&lt;/li&gt;
+&lt;/ul&gt;
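For illustration, enabling latency tracking at `operator` granularity could look as follows in `flink-conf.yaml` (keys as per the linked Flink 1.8 configuration documentation; the interval value is just an example):

```yaml
# flink-conf.yaml — latency tracking is disabled by default.
metrics.latency.interval: 30000        # emit latency markers every 30 s
metrics.latency.granularity: operator  # single | operator | subtask
```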
+
+&lt;p&gt;These metrics are collected through special “latency markers”: each source subtask will periodically emit a special record containing the timestamp of its creation. The latency markers then flow alongside normal records while not overtaking them on the wire or inside a buffer queue. However, &lt;em&gt;a latency marker does not enter application logic&lt;/em&gt; and overtakes records there. Latency markers therefore only measure the waiting time between the user code and not  [...]
+
+&lt;p&gt;Since &lt;code&gt;LatencyMarkers&lt;/code&gt; sit in network buffers just like normal records, they will also wait for the buffer to be full or flushed due to buffer timeouts. When a channel is under high load, there is no latency added by buffering data in the network. However, as soon as one channel is under low load, records and latency markers will experience an expected average delay of at most &lt;code&gt;buffer_timeout / 2&lt;/code&gt;. This delay will add to each network conne [...]
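A quick sanity check of that bound, with hypothetical numbers: the buffering latency accumulated along a chain of low-load network connections is simply the per-connection delay summed up.

```python
def expected_buffer_delay_ms(buffer_timeout_ms, num_network_connections):
    """Expected average buffering delay (ms) added along a chain of low-load
    network connections: at most buffer_timeout / 2 per connection."""
    return (buffer_timeout_ms / 2) * num_network_connections
```

For example, with the default buffer timeout of 100 ms and three network connections between source and sink, up to 150 ms of average delay can stem from buffering alone.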
+
+&lt;p&gt;By looking at the exposed latency tracking metrics for each subtask, for example at the 95th percentile, you should nevertheless be able to identify subtasks which add substantially to the overall source-to-sink latency and continue optimising there.&lt;/p&gt;
+
+&lt;div class=&quot;alert alert-info&quot;&gt;
+  &lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt;
+Flink’s latency markers assume that the clocks on all machines in the cluster are in sync. We recommend setting up an automated clock synchronisation service (like NTP) to avoid false latency results.&lt;/p&gt;
+&lt;/div&gt;
+
+&lt;div class=&quot;alert alert-warning&quot;&gt;
+  &lt;p&gt;&lt;span class=&quot;label label-warning&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-warning-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Warning&lt;/span&gt;
+Enabling latency metrics can significantly impact the performance of the cluster (in particular for &lt;code&gt;subtask&lt;/code&gt; granularity) due to the sheer amount of metrics being added as well as the use of histograms which are quite expensive to maintain. It is highly recommended to only use them for debugging purposes.&lt;/p&gt;
+&lt;/div&gt;
+
+&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
+
+&lt;p&gt;In the previous sections we discussed how to monitor Flink’s network stack, which primarily involves identifying backpressure: where it occurs, where it originates from, and (potentially) why it occurs. This can be done in two ways: for simple cases and debugging sessions by using the backpressure monitor; for continuous monitoring, more in-depth analysis, and less runtime overhead by using Flink’s task and network stack metrics. Backpressure can be caused by the network laye [...]
+
+&lt;p&gt;Stay tuned for the third blog post in the series of network stack posts that will focus on tuning techniques and anti-patterns to avoid.&lt;/p&gt;
+
+</description>
+<pubDate>Tue, 23 Jul 2019 17:30:00 +0200</pubDate>
+<link>https://flink.apache.org/2019/07/23/flink-network-stack-2.html</link>
+<guid isPermaLink="true">/2019/07/23/flink-network-stack-2.html</guid>
+</item>
+
+<item>
 <title>Apache Flink 1.8.1 Released</title>
 <description>&lt;p&gt;The Apache Flink community released the first bugfix version of the Apache Flink 1.8 series.&lt;/p&gt;
 
diff --git a/content/blog/index.html b/content/blog/index.html
index 71c9647..1a5e52b 100644
--- a/content/blog/index.html
+++ b/content/blog/index.html
@@ -162,6 +162,19 @@
     <!-- Blog posts -->
     
     <article>
+      <h2 class="blog-title"><a href="/2019/07/23/flink-network-stack-2.html">Flink Network Stack Vol. 2: Monitoring, Metrics, and that Backpressure Thing</a></h2>
+
+      <p>23 Jul 2019
+       Nico Kruber  &amp; Piotr Nowojski </p>
+
+      <p>In a previous blog post, we presented how Flink’s network stack works from the high-level abstractions to the low-level details. This second post discusses monitoring network-related metrics to identify backpressure or bottlenecks in throughput and latency.</p>
+
+      <p><a href="/2019/07/23/flink-network-stack-2.html">Continue reading &raquo;</a></p>
+    </article>
+
+    <hr>
+    
+    <article>
       <h2 class="blog-title"><a href="/news/2019/07/02/release-1.8.1.html">Apache Flink 1.8.1 Released</a></h2>
 
       <p>02 Jul 2019
@@ -288,19 +301,6 @@ for more details.</p>
 
     <hr>
     
-    <article>
-      <h2 class="blog-title"><a href="/news/2019/03/06/ffsf-preview.html">What to expect from Flink Forward San Francisco 2019</a></h2>
-
-      <p>06 Mar 2019
-       Fabian Hueske (<a href="https://twitter.com/fhueske">@fhueske</a>)</p>
-
-      <p>The third annual Flink Forward conference in San Francisco is just a few weeks away. Let's see what Flink Forward SF 2019 has in store for the Apache Flink and stream processing communities. This post covers some of its highlights!</p>
-
-      <p><a href="/news/2019/03/06/ffsf-preview.html">Continue reading &raquo;</a></p>
-    </article>
-
-    <hr>
-    
 
     <!-- Pagination links -->
     
@@ -333,6 +333,16 @@ for more details.</p>
 
     <ul id="markdown-toc">
       
+      <li><a href="/2019/07/23/flink-network-stack-2.html">Flink Network Stack Vol. 2: Monitoring, Metrics, and that Backpressure Thing</a></li>
+
+      
+        
+      
+    
+      
+      
+
+      
       <li><a href="/news/2019/07/02/release-1.8.1.html">Apache Flink 1.8.1 Released</a></li>
 
       
diff --git a/content/blog/page2/index.html b/content/blog/page2/index.html
index e639939..084a7f7 100644
--- a/content/blog/page2/index.html
+++ b/content/blog/page2/index.html
@@ -162,6 +162,19 @@
     <!-- Blog posts -->
     
     <article>
+      <h2 class="blog-title"><a href="/news/2019/03/06/ffsf-preview.html">What to expect from Flink Forward San Francisco 2019</a></h2>
+
+      <p>06 Mar 2019
+       Fabian Hueske (<a href="https://twitter.com/fhueske">@fhueske</a>)</p>
+
+      <p>The third annual Flink Forward conference in San Francisco is just a few weeks away. Let's see what Flink Forward SF 2019 has in store for the Apache Flink and stream processing communities. This post covers some of its highlights!</p>
+
+      <p><a href="/news/2019/03/06/ffsf-preview.html">Continue reading &raquo;</a></p>
+    </article>
+
+    <hr>
+    
+    <article>
       <h2 class="blog-title"><a href="/news/2019/02/25/monitoring-best-practices.html">Monitoring Apache Flink Applications 101</a></h2>
 
       <p>25 Feb 2019
@@ -294,21 +307,6 @@ Please check the <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa
 
     <hr>
     
-    <article>
-      <h2 class="blog-title"><a href="/news/2018/10/29/release-1.5.5.html">Apache Flink 1.5.5 Released</a></h2>
-
-      <p>29 Oct 2018
-      </p>
-
-      <p><p>The Apache Flink community released the fifth bugfix version of the Apache Flink 1.5 series.</p>
-
-</p>
-
-      <p><a href="/news/2018/10/29/release-1.5.5.html">Continue reading &raquo;</a></p>
-    </article>
-
-    <hr>
-    
 
     <!-- Pagination links -->
     
@@ -341,6 +339,16 @@ Please check the <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa
 
     <ul id="markdown-toc">
       
+      <li><a href="/2019/07/23/flink-network-stack-2.html">Flink Network Stack Vol. 2: Monitoring, Metrics, and that Backpressure Thing</a></li>
+
+      
+        
+      
+    
+      
+      
+
+      
       <li><a href="/news/2019/07/02/release-1.8.1.html">Apache Flink 1.8.1 Released</a></li>
 
       
diff --git a/content/blog/page3/index.html b/content/blog/page3/index.html
index a81abee..dab64bf 100644
--- a/content/blog/page3/index.html
+++ b/content/blog/page3/index.html
@@ -162,6 +162,21 @@
     <!-- Blog posts -->
     
     <article>
+      <h2 class="blog-title"><a href="/news/2018/10/29/release-1.5.5.html">Apache Flink 1.5.5 Released</a></h2>
+
+      <p>29 Oct 2018
+      </p>
+
+      <p><p>The Apache Flink community released the fifth bugfix version of the Apache Flink 1.5 series.</p>
+
+</p>
+
+      <p><a href="/news/2018/10/29/release-1.5.5.html">Continue reading &raquo;</a></p>
+    </article>
+
+    <hr>
+    
+    <article>
       <h2 class="blog-title"><a href="/news/2018/09/20/release-1.6.1.html">Apache Flink 1.6.1 Released</a></h2>
 
       <p>20 Sep 2018
@@ -296,19 +311,6 @@
 
     <hr>
     
-    <article>
-      <h2 class="blog-title"><a href="/features/2018/03/01/end-to-end-exactly-once-apache-flink.html">An Overview of End-to-End Exactly-Once Processing in Apache Flink (with Apache Kafka, too!)</a></h2>
-
-      <p>01 Mar 2018
-       Piotr Nowojski (<a href="https://twitter.com/PiotrNowojski">@PiotrNowojski</a>) &amp; Mike Winters (<a href="https://twitter.com/wints">@wints</a>)</p>
-
-      <p>Flink 1.4.0 introduced a new feature that makes it possible to build end-to-end exactly-once applications with Flink and data sources and sinks that support transactions.</p>
-
-      <p><a href="/features/2018/03/01/end-to-end-exactly-once-apache-flink.html">Continue reading &raquo;</a></p>
-    </article>
-
-    <hr>
-    
 
     <!-- Pagination links -->
     
@@ -341,6 +343,16 @@
 
     <ul id="markdown-toc">
       
+      <li><a href="/2019/07/23/flink-network-stack-2.html">Flink Network Stack Vol. 2: Monitoring, Metrics, and that Backpressure Thing</a></li>
+
+      
+        
+      
+    
+      
+      
+
+      
       <li><a href="/news/2019/07/02/release-1.8.1.html">Apache Flink 1.8.1 Released</a></li>
 
       
diff --git a/content/blog/page4/index.html b/content/blog/page4/index.html
index bd6aa7b..e67e9f0 100644
--- a/content/blog/page4/index.html
+++ b/content/blog/page4/index.html
@@ -162,6 +162,19 @@
     <!-- Blog posts -->
     
     <article>
+      <h2 class="blog-title"><a href="/features/2018/03/01/end-to-end-exactly-once-apache-flink.html">An Overview of End-to-End Exactly-Once Processing in Apache Flink (with Apache Kafka, too!)</a></h2>
+
+      <p>01 Mar 2018
+       Piotr Nowojski (<a href="https://twitter.com/PiotrNowojski">@PiotrNowojski</a>) &amp; Mike Winters (<a href="https://twitter.com/wints">@wints</a>)</p>
+
+      <p>Flink 1.4.0 introduced a new feature that makes it possible to build end-to-end exactly-once applications with Flink and data sources and sinks that support transactions.</p>
+
+      <p><a href="/features/2018/03/01/end-to-end-exactly-once-apache-flink.html">Continue reading &raquo;</a></p>
+    </article>
+
+    <hr>
+    
+    <article>
       <h2 class="blog-title"><a href="/news/2018/02/15/release-1.4.1.html">Apache Flink 1.4.1 Released</a></h2>
 
       <p>15 Feb 2018
@@ -295,21 +308,6 @@ what’s coming in Flink 1.4.0 as well as a preview of what the Flink community
 
     <hr>
     
-    <article>
-      <h2 class="blog-title"><a href="/news/2017/05/16/official-docker-image.html">Introducing Docker Images for Apache Flink</a></h2>
-
-      <p>16 May 2017 by Patrick Lucas (Data Artisans) and Ismaël Mejía (Talend) (<a href="https://twitter.com/">@iemejia</a>)
-      </p>
-
-      <p><p>For some time, the Apache Flink community has provided scripts to build a Docker image to run Flink. Now, starting with version 1.2.1, Flink will have a <a href="https://hub.docker.com/r/_/flink/">Docker image</a> on the Docker Hub. This image is maintained by the Flink community and curated by the <a href="https://github.com/docker-library/official-images">Docker</a> team to ensure it meets the quality standards for container images of the Docker community.</p>
-
-</p>
-
-      <p><a href="/news/2017/05/16/official-docker-image.html">Continue reading &raquo;</a></p>
-    </article>
-
-    <hr>
-    
 
     <!-- Pagination links -->
     
@@ -342,6 +340,16 @@ what’s coming in Flink 1.4.0 as well as a preview of what the Flink community
 
     <ul id="markdown-toc">
       
+      <li><a href="/2019/07/23/flink-network-stack-2.html">Flink Network Stack Vol. 2: Monitoring, Metrics, and that Backpressure Thing</a></li>
+
+      
+        
+      
+    
+      
+      
+
+      
       <li><a href="/news/2019/07/02/release-1.8.1.html">Apache Flink 1.8.1 Released</a></li>
 
       
diff --git a/content/blog/page5/index.html b/content/blog/page5/index.html
index 1243990..8463dfb 100644
--- a/content/blog/page5/index.html
+++ b/content/blog/page5/index.html
@@ -162,6 +162,21 @@
     <!-- Blog posts -->
     
     <article>
+      <h2 class="blog-title"><a href="/news/2017/05/16/official-docker-image.html">Introducing Docker Images for Apache Flink</a></h2>
+
+      <p>16 May 2017 by Patrick Lucas (Data Artisans) and Ismaël Mejía (Talend) (<a href="https://twitter.com/">@iemejia</a>)
+      </p>
+
+      <p><p>For some time, the Apache Flink community has provided scripts to build a Docker image to run Flink. Now, starting with version 1.2.1, Flink will have a <a href="https://hub.docker.com/r/_/flink/">Docker image</a> on the Docker Hub. This image is maintained by the Flink community and curated by the <a href="https://github.com/docker-library/official-images">Docker</a> team to ensure it meets the quality standards for container images of the Docker community.</p>
+
+</p>
+
+      <p><a href="/news/2017/05/16/official-docker-image.html">Continue reading &raquo;</a></p>
+    </article>
+
+    <hr>
+    
+    <article>
       <h2 class="blog-title"><a href="/news/2017/04/26/release-1.2.1.html">Apache Flink 1.2.1 Released</a></h2>
 
       <p>26 Apr 2017
@@ -289,21 +304,6 @@
 
     <hr>
     
-    <article>
-      <h2 class="blog-title"><a href="/news/2016/08/24/ff16-keynotes-panels.html">Flink Forward 2016: Announcing Schedule, Keynotes, and Panel Discussion</a></h2>
-
-      <p>24 Aug 2016
-      </p>
-
-      <p><p>An update for the Flink community: the <a href="http://flink-forward.org/kb_day/day-1/">Flink Forward 2016 schedule</a> is now available online. This year's event will include 2 days of talks from stream processing experts at Google, MapR, Alibaba, Netflix, Cloudera, and more. Following the talks is a full day of hands-on Flink training.</p>
-
-</p>
-
-      <p><a href="/news/2016/08/24/ff16-keynotes-panels.html">Continue reading &raquo;</a></p>
-    </article>
-
-    <hr>
-    
 
     <!-- Pagination links -->
     
@@ -336,6 +336,16 @@
 
     <ul id="markdown-toc">
       
+      <li><a href="/2019/07/23/flink-network-stack-2.html">Flink Network Stack Vol. 2: Monitoring, Metrics, and that Backpressure Thing</a></li>
+
+      
+        
+      
+    
+      
+      
+
+      
       <li><a href="/news/2019/07/02/release-1.8.1.html">Apache Flink 1.8.1 Released</a></li>
 
       
diff --git a/content/blog/page6/index.html b/content/blog/page6/index.html
index 2a78432..7d4d19a 100644
--- a/content/blog/page6/index.html
+++ b/content/blog/page6/index.html
@@ -162,6 +162,21 @@
     <!-- Blog posts -->
     
     <article>
+      <h2 class="blog-title"><a href="/news/2016/08/24/ff16-keynotes-panels.html">Flink Forward 2016: Announcing Schedule, Keynotes, and Panel Discussion</a></h2>
+
+      <p>24 Aug 2016
+      </p>
+
+      <p><p>An update for the Flink community: the <a href="http://flink-forward.org/kb_day/day-1/">Flink Forward 2016 schedule</a> is now available online. This year's event will include 2 days of talks from stream processing experts at Google, MapR, Alibaba, Netflix, Cloudera, and more. Following the talks is a full day of hands-on Flink training.</p>
+
+</p>
+
+      <p><a href="/news/2016/08/24/ff16-keynotes-panels.html">Continue reading &raquo;</a></p>
+    </article>
+
+    <hr>
+    
+    <article>
       <h2 class="blog-title"><a href="/news/2016/08/11/release-1.1.1.html">Flink 1.1.1 Released</a></h2>
 
       <p>11 Aug 2016
@@ -293,21 +308,6 @@
 
     <hr>
     
-    <article>
-      <h2 class="blog-title"><a href="/news/2016/02/11/release-0.10.2.html">Flink 0.10.2 Released</a></h2>
-
-      <p>11 Feb 2016
-      </p>
-
-      <p><p>Today, the Flink community released Flink version <strong>0.10.2</strong>, the second bugfix release of the 0.10 series.</p>
-
-</p>
-
-      <p><a href="/news/2016/02/11/release-0.10.2.html">Continue reading &raquo;</a></p>
-    </article>
-
-    <hr>
-    
 
     <!-- Pagination links -->
     
@@ -340,6 +340,16 @@
 
     <ul id="markdown-toc">
       
+      <li><a href="/2019/07/23/flink-network-stack-2.html">Flink Network Stack Vol. 2: Monitoring, Metrics, and that Backpressure Thing</a></li>
+
+      
+        
+      
+    
+      
+      
+
+      
       <li><a href="/news/2019/07/02/release-1.8.1.html">Apache Flink 1.8.1 Released</a></li>
 
       
diff --git a/content/blog/page7/index.html b/content/blog/page7/index.html
index cff4bad..bc908cb 100644
--- a/content/blog/page7/index.html
+++ b/content/blog/page7/index.html
@@ -162,6 +162,21 @@
     <!-- Blog posts -->
     
     <article>
+      <h2 class="blog-title"><a href="/news/2016/02/11/release-0.10.2.html">Flink 0.10.2 Released</a></h2>
+
+      <p>11 Feb 2016
+      </p>
+
+      <p><p>Today, the Flink community released Flink version <strong>0.10.2</strong>, the second bugfix release of the 0.10 series.</p>
+
+</p>
+
+      <p><a href="/news/2016/02/11/release-0.10.2.html">Continue reading &raquo;</a></p>
+    </article>
+
+    <hr>
+    
+    <article>
       <h2 class="blog-title"><a href="/news/2015/12/18/a-year-in-review.html">Flink 2015: A year in review, and a lookout to 2016</a></h2>
 
       <p>18 Dec 2015 by Robert Metzger (<a href="https://twitter.com/">@rmetzger_</a>)
@@ -297,21 +312,6 @@ vertex-centric or gather-sum-apply to Flink dataflows.</p>
 
     <hr>
     
-    <article>
-      <h2 class="blog-title"><a href="/news/2015/06/24/announcing-apache-flink-0.9.0-release.html">Announcing Apache Flink 0.9.0</a></h2>
-
-      <p>24 Jun 2015
-      </p>
-
-      <p><p>The Apache Flink community is pleased to announce the availability of the 0.9.0 release. The release is the result of many months of hard work within the Flink community. It contains many new features and improvements which were previewed in the 0.9.0-milestone1 release and have been polished since then. This is the largest Flink release so far.</p>
-
-</p>
-
-      <p><a href="/news/2015/06/24/announcing-apache-flink-0.9.0-release.html">Continue reading &raquo;</a></p>
-    </article>
-
-    <hr>
-    
 
     <!-- Pagination links -->
     
@@ -344,6 +344,16 @@ vertex-centric or gather-sum-apply to Flink dataflows.</p>
 
     <ul id="markdown-toc">
       
+      <li><a href="/2019/07/23/flink-network-stack-2.html">Flink Network Stack Vol. 2: Monitoring, Metrics, and that Backpressure Thing</a></li>
+
+      
+        
+      
+    
+      
+      
+
+      
       <li><a href="/news/2019/07/02/release-1.8.1.html">Apache Flink 1.8.1 Released</a></li>
 
       
diff --git a/content/blog/page8/index.html b/content/blog/page8/index.html
index 4956e66..88f15bc 100644
--- a/content/blog/page8/index.html
+++ b/content/blog/page8/index.html
@@ -162,6 +162,21 @@
     <!-- Blog posts -->
     
     <article>
+      <h2 class="blog-title"><a href="/news/2015/06/24/announcing-apache-flink-0.9.0-release.html">Announcing Apache Flink 0.9.0</a></h2>
+
+      <p>24 Jun 2015
+      </p>
+
+      <p><p>The Apache Flink community is pleased to announce the availability of the 0.9.0 release. The release is the result of many months of hard work within the Flink community. It contains many new features and improvements which were previewed in the 0.9.0-milestone1 release and have been polished since then. This is the largest Flink release so far.</p>
+
+</p>
+
+      <p><a href="/news/2015/06/24/announcing-apache-flink-0.9.0-release.html">Continue reading &raquo;</a></p>
+    </article>
+
+    <hr>
+    
+    <article>
       <h2 class="blog-title"><a href="/news/2015/05/14/Community-update-April.html">April 2015 in the Flink community</a></h2>
 
       <p>14 May 2015 by Kostas Tzoumas (<a href="https://twitter.com/">@kostas_tzoumas</a>)
@@ -303,21 +318,6 @@ and offers a new API including definition of flexible windows.</p>
 
     <hr>
     
-    <article>
-      <h2 class="blog-title"><a href="/news/2015/01/06/december-in-flink.html">December 2014 in the Flink community</a></h2>
-
-      <p>06 Jan 2015
-      </p>
-
-      <p><p>This is the first blog post of a “newsletter” like series where we give a summary of the monthly activity in the Flink community. As the Flink project grows, this can serve as a “tl;dr” for people that are not following the Flink dev and user mailing lists, or those that are simply overwhelmed by the traffic.</p>
-
-</p>
-
-      <p><a href="/news/2015/01/06/december-in-flink.html">Continue reading &raquo;</a></p>
-    </article>
-
-    <hr>
-    
 
     <!-- Pagination links -->
     
@@ -350,6 +350,16 @@ and offers a new API including definition of flexible windows.</p>
 
     <ul id="markdown-toc">
       
+      <li><a href="/2019/07/23/flink-network-stack-2.html">Flink Network Stack Vol. 2: Monitoring, Metrics, and that Backpressure Thing</a></li>
+
+      
+        
+      
+    
+      
+      
+
+      
       <li><a href="/news/2019/07/02/release-1.8.1.html">Apache Flink 1.8.1 Released</a></li>
 
       
diff --git a/content/blog/page9/index.html b/content/blog/page9/index.html
index a3851f8..7772c90 100644
--- a/content/blog/page9/index.html
+++ b/content/blog/page9/index.html
@@ -162,6 +162,21 @@
     <!-- Blog posts -->
     
     <article>
+      <h2 class="blog-title"><a href="/news/2015/01/06/december-in-flink.html">December 2014 in the Flink community</a></h2>
+
+      <p>06 Jan 2015
+      </p>
+
+      <p><p>This is the first blog post of a “newsletter” like series where we give a summary of the monthly activity in the Flink community. As the Flink project grows, this can serve as a “tl;dr” for people that are not following the Flink dev and user mailing lists, or those that are simply overwhelmed by the traffic.</p>
+
+</p>
+
+      <p><a href="/news/2015/01/06/december-in-flink.html">Continue reading &raquo;</a></p>
+    </article>
+
+    <hr>
+    
+    <article>
       <h2 class="blog-title"><a href="/news/2014/11/18/hadoop-compatibility.html">Hadoop Compatibility in Flink</a></h2>
 
       <p>18 Nov 2014 by Fabian Hüske (<a href="https://twitter.com/">@fhueske</a>)
@@ -271,6 +286,16 @@ academic and open source project that Flink originates from.</p>
 
     <ul id="markdown-toc">
       
+      <li><a href="/2019/07/23/flink-network-stack-2.html">Flink Network Stack Vol. 2: Monitoring, Metrics, and that Backpressure Thing</a></li>
+
+      
+        
+      
+    
+      
+      
+
+      
       <li><a href="/news/2019/07/02/release-1.8.1.html">Apache Flink 1.8.1 Released</a></li>
 
       
diff --git a/content/css/flink.css b/content/css/flink.css
index 9f13341..5c0f621 100755
--- a/content/css/flink.css
+++ b/content/css/flink.css
@@ -87,6 +87,11 @@ h1, h2, h3, h4, h5, h6 {
     margin-top: -60px;
 }
 
+/* fix conflict with bootstrap's alert */
+.alert h4 {
+	margin-top: -60px;
+}
+
 h1 {
 	font-size: 160%;
 }
diff --git a/content/img/blog/2019-07-23-network-stack-2/back_pressure_sampling_high.png b/content/img/blog/2019-07-23-network-stack-2/back_pressure_sampling_high.png
new file mode 100644
index 0000000..15372fd
Binary files /dev/null and b/content/img/blog/2019-07-23-network-stack-2/back_pressure_sampling_high.png differ
diff --git a/content/index.html b/content/index.html
index d8e8537..48f71fc 100644
--- a/content/index.html
+++ b/content/index.html
@@ -462,6 +462,9 @@
 
   <dl>
       
+        <dt> <a href="/2019/07/23/flink-network-stack-2.html">Flink Network Stack Vol. 2: Monitoring, Metrics, and that Backpressure Thing</a></dt>
+        <dd>In a previous blog post, we presented how Flink’s network stack works from the high-level abstractions to the low-level details. This second post discusses monitoring network-related metrics to identify backpressure or bottlenecks in throughput and latency.</dd>
+      
         <dt> <a href="/news/2019/07/02/release-1.8.1.html">Apache Flink 1.8.1 Released</a></dt>
         <dd><p>The Apache Flink community released the first bugfix version of the Apache Flink 1.8 series.</p>
 
@@ -475,9 +478,6 @@
       
         <dt> <a href="/2019/05/19/state-ttl.html">State TTL in Flink 1.8.0: How to Automatically Cleanup Application State in Apache Flink</a></dt>
         <dd>A common requirement for many stateful streaming applications is to automatically cleanup application state for effective management of your state size, or to control how long the application state can be accessed. State TTL enables application state cleanup and efficient state size management in Apache Flink</dd>
-      
-        <dt> <a href="/2019/05/14/temporal-tables.html">Flux capacitor, huh? Temporal Tables and Joins in Streaming SQL</a></dt>
-        <dd>Apache Flink natively supports temporal table joins since the 1.7 release for straightforward temporal data handling. In this blog post, we provide an overview of how this new concept can be leveraged for effective point-in-time analysis in streaming scenarios.</dd>
     
   </dl>
 
diff --git a/content/roadmap.html b/content/roadmap.html
index aed804b..780a53b 100644
--- a/content/roadmap.html
+++ b/content/roadmap.html
@@ -180,7 +180,7 @@ under the License.
 
 <div class="page-toc">
 <ul id="markdown-toc">
-  <li><a href="#analytics-applications-an-the-roles-of-datastream-dataset-and-table-api" id="markdown-toc-analytics-applications-an-the-roles-of-datastream-dataset-and-table-api">Analytics, Applications, an the roles of DataStream, DataSet, and Table API</a></li>
+  <li><a href="#analytics-applications-and-the-roles-of-datastream-dataset-and-table-api" id="markdown-toc-analytics-applications-and-the-roles-of-datastream-dataset-and-table-api">Analytics, Applications, and the roles of DataStream, DataSet, and Table API</a></li>
   <li><a href="#batch-and-streaming-unification" id="markdown-toc-batch-and-streaming-unification">Batch and Streaming Unification</a></li>
   <li><a href="#fast-batch-bounded-streams" id="markdown-toc-fast-batch-bounded-streams">Fast Batch (Bounded Streams)</a></li>
   <li><a href="#stream-processing-use-cases" id="markdown-toc-stream-processing-use-cases">Stream Processing Use Cases</a></li>
@@ -202,7 +202,7 @@ there is consensus that they will happen and what they will roughly look like fo
 
 <p><strong>Last Update:</strong> 2019-05-08</p>
 
-<h1 id="analytics-applications-an-the-roles-of-datastream-dataset-and-table-api">Analytics, Applications, an the roles of DataStream, DataSet, and Table API</h1>
+<h1 id="analytics-applications-and-the-roles-of-datastream-dataset-and-table-api">Analytics, Applications, and the roles of DataStream, DataSet, and Table API</h1>
 
 <p>Flink views stream processing as a <a href="/flink-architecture.html">unifying paradigm for data processing</a>
 (batch and real-time) and event-driven applications. The APIs are evolving to reflect that view:</p>
diff --git a/content/zh/community.html b/content/zh/community.html
index 19133f5..e6136e1 100644
--- a/content/zh/community.html
+++ b/content/zh/community.html
@@ -580,6 +580,12 @@
     <td class="text-center">shaoxuan</td>
   </tr>
   <tr>
+    <td class="text-center"><img src="https://avatars3.githubusercontent.com/u/12387855?s=50" class="committer-avatar" /></td>
+    <td class="text-center">Zhijiang Wang</td>
+    <td class="text-center">Committer</td>
+    <td class="text-center">zhijiang</td>
+  </tr>
+  <tr>
     <td class="text-center"><img src="https://avatars1.githubusercontent.com/u/1826769?s=50" class="committer-avatar" /></td>
     <td class="text-center">Daniel Warneke</td>
     <td class="text-center">PMC, Committer</td>
diff --git a/content/zh/index.html b/content/zh/index.html
index b0be24d..05ee8d0 100644
--- a/content/zh/index.html
+++ b/content/zh/index.html
@@ -460,6 +460,9 @@
 
   <dl>
       
+        <dt> <a href="/2019/07/23/flink-network-stack-2.html">Flink Network Stack Vol. 2: Monitoring, Metrics, and that Backpressure Thing</a></dt>
+        <dd>In a previous blog post, we presented how Flink’s network stack works from the high-level abstractions to the low-level details. This second post discusses monitoring network-related metrics to identify backpressure or bottlenecks in throughput and latency.</dd>
+      
         <dt> <a href="/news/2019/07/02/release-1.8.1.html">Apache Flink 1.8.1 Released</a></dt>
         <dd><p>The Apache Flink community released the first bugfix version of the Apache Flink 1.8 series.</p>
 
@@ -473,9 +476,6 @@
       
         <dt> <a href="/2019/05/19/state-ttl.html">State TTL in Flink 1.8.0: How to Automatically Cleanup Application State in Apache Flink</a></dt>
         <dd>A common requirement for many stateful streaming applications is to automatically cleanup application state for effective management of your state size, or to control how long the application state can be accessed. State TTL enables application state cleanup and efficient state size management in Apache Flink</dd>
-      
-        <dt> <a href="/2019/05/14/temporal-tables.html">Flux capacitor, huh? Temporal Tables and Joins in Streaming SQL</a></dt>
-        <dd>Apache Flink natively supports temporal table joins since the 1.7 release for straightforward temporal data handling. In this blog post, we provide an overview of how this new concept can be leveraged for effective point-in-time analysis in streaming scenarios.</dd>
     
   </dl>
 


[flink-web] 04/05: [Blog] add Flink Network Stack Vol. 2: Monitoring, Metrics, and that Backpressure Thing

Posted by nk...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

nkruber pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/flink-web.git

commit fb57c4a48a8700a107a178fdac2dbdfaec0f3500
Author: Nico Kruber <ni...@ververica.com>
AuthorDate: Wed Jul 17 11:23:18 2019 +0200

    [Blog] add Flink Network Stack Vol. 2: Monitoring, Metrics, and that Backpressure Thing
---
 _posts/2019-07-23-flink-network-stack-2.md         | 315 +++++++++++++++++++++
 css/flink.css                                      |   5 +
 .../back_pressure_sampling_high.png                | Bin 0 -> 77546 bytes
 3 files changed, 320 insertions(+)

diff --git a/_posts/2019-07-23-flink-network-stack-2.md b/_posts/2019-07-23-flink-network-stack-2.md
new file mode 100644
index 0000000..abf97a9
--- /dev/null
+++ b/_posts/2019-07-23-flink-network-stack-2.md
@@ -0,0 +1,315 @@
+---
+layout: post
+title: "Flink Network Stack Vol. 2: Monitoring, Metrics, and that Backpressure Thing"
+date: 2019-07-23T15:30:00.000Z
+authors:
+- Nico:
+  name: "Nico Kruber"
+- Piotr:
+  name: "Piotr Nowojski"
+
+
+excerpt: In a previous blog post, we presented how Flink’s network stack works from the high-level abstractions to the low-level details. This second post discusses monitoring network-related metrics to identify backpressure or bottlenecks in throughput and latency.
+---
+
+<style type="text/css">
+.tg  {border-collapse:collapse;border-spacing:0;}
+.tg td{padding:10px 10px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
+.tg th{padding:10px 10px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;background-color:#eff0f1;}
+.tg .tg-wide{padding:10px 30px;}
+.tg .tg-top{vertical-align:top}
+.tg .tg-topcenter{text-align:center;vertical-align:top}
+.tg .tg-center{text-align:center;vertical-align:center}
+</style>
+
+In a [previous blog post]({{ site.baseurl }}/2019/06/05/flink-network-stack.html), we presented how Flink’s network stack works from the high-level abstractions to the low-level details. This second blog post in the series builds on that knowledge and discusses monitoring network-related metrics to identify effects such as backpressure or bottlenecks in throughput and latency. Although this post briefly covers what to do with backpressure, the topic of tuning the  [...]
+
+{% toc %}
+
+## Monitoring
+
+Probably the most important part of network monitoring is [monitoring backpressure]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/monitoring/back_pressure.html), a situation where a system is receiving data at a higher rate than it can process¹. Such behaviour will result in the sender being backpressured and may be caused by two things:
+
+* The receiver is slow.<br>
+  This can happen because the receiver is backpressured itself, is unable to keep processing at the same rate as the sender, or is temporarily blocked by garbage collection, lack of system resources, or I/O.
+
+* The network channel is slow.<br>
+  Even though in such a case the receiver is not (directly) involved, we call the sender backpressured due to a potential oversubscription on network bandwidth shared by all subtasks running on the same machine. Beware that, in addition to Flink’s network stack, there may be more network users, such as sources and sinks, distributed file systems (checkpointing, network-attached storage), logging, and metrics. A previous [capacity planning blog post](https://www.ververica.com/blog/how-to-si [...]
+
+<sup>1</sup> In case you are unfamiliar with backpressure and how it interacts with Flink, we recommend reading through [this blog post on backpressure](https://www.ververica.com/blog/how-flink-handles-backpressure) from 2015.
+
+
+<br>
+If backpressure occurs, it will bubble upstream and eventually reach your sources and slow them down. This is not a bad thing per se and merely states that you lack resources for the current load. However, you may want to improve your job so that it can cope with higher loads without using more resources. In order to do so, you need to find (1) where (at which task/operator) the bottleneck is and (2) what is causing it. Flink offers two mechanisms for identifying where the bottleneck is:
+
+ * directly via Flink’s web UI and its backpressure monitor, or
+ * indirectly through some of the network metrics.
+
+Flink’s web UI is likely the first entry point for quick troubleshooting but has some disadvantages that we will explain below. On the other hand, Flink’s network metrics are better suited for continuous monitoring and reasoning about the exact nature of the bottleneck causing backpressure. We will cover both in the sections below. In both cases, you need to identify the origin of backpressure from the sources to the sinks. Your starting point for the current and future investigations  [...]
+
+
+### Backpressure Monitor
+
+The [backpressure monitor]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/monitoring/back_pressure.html) is only exposed via Flink’s web UI². Since it's an active component that is only triggered on request, it is currently not available via metrics. The backpressure monitor samples the running tasks' threads on all TaskManagers via `Thread.getStackTrace()` and computes the number of samples where tasks were blocked on a buffer request. These tasks were either unable to send network buff [...]
+
+* <span style="color:green">OK</span> for `ratio ≤ 0.10`,
+* <span style="color:orange">LOW</span> for `0.10 < ratio ≤ 0.5`, and
+* <span style="color:red">HIGH</span> for `0.5 < ratio ≤ 1`.
+
+Although you can tune things like the refresh interval, the number of samples, or the delay between samples, you normally do not need to touch these since the defaults already give good enough results.
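If it helps to make the thresholds concrete, here is a minimal sketch (in Python, not part of Flink) that maps a sampled blocked-request ratio to the status labels listed above; the web UI applies these cut-offs internally:

```python
def backpressure_status(ratio):
    """Map the backpressure monitor's blocked-sample ratio to its label.

    Thresholds mirror the list above: OK <= 0.10 < LOW <= 0.5 < HIGH <= 1.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    if ratio <= 0.10:
        return "OK"
    if ratio <= 0.5:
        return "LOW"
    return "HIGH"
```

For example, a subtask where 3 out of 100 samples were blocked on a buffer request (`ratio = 0.03`) is reported as OK.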
+
+<center>
+<img src="{{ site.baseurl }}/img/blog/2019-07-23-network-stack-2/back_pressure_sampling_high.png" width="600px" alt="Backpressure sampling:high"/>
+</center>
+
+<sup>2</sup> You may also access the backpressure monitor via the REST API: `/jobs/:jobid/vertices/:vertexid/backpressure`
+
+
+<br>
+The backpressure monitor can help you find where (at which task/operator) backpressure originates. However, it does not help you reason further about its causes. Additionally, for larger jobs or higher parallelism, the backpressure monitor becomes too crowded to use and may also take some time to gather all information from all TaskManagers. Please also note that sampling may affect your running job’s performance.
+
+## Network Metrics
+
+[Network]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/monitoring/metrics.html#network) and [task I/O]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/monitoring/metrics.html#io) metrics are more lightweight than the backpressure monitor and are continuously published for each running job. We can leverage those and get even more insights, not only for backpressure monitoring. The most relevant metrics for users are:
+
+
+* **<span style="color:orange">up to Flink 1.8:</span>** `outPoolUsage`, `inPoolUsage`<br>
+  An estimate of the ratio of buffers used vs. buffers available in the respective local buffer pools.
+  While interpreting `inPoolUsage` in Flink 1.5 - 1.8 with credit-based flow control, please note that this only relates to floating buffers (exclusive buffers are not part of the pool).
+
+* **<span style="color:green">Flink 1.9 and above:</span>** `outPoolUsage`, `inPoolUsage`, `floatingBuffersUsage`, `exclusiveBuffersUsage`<br>
+  An estimate of the ratio of buffers used vs. buffers available in the respective local buffer pools.
+  Starting with Flink 1.9, `inPoolUsage` is the sum of `floatingBuffersUsage` and `exclusiveBuffersUsage`.
+
+* `numRecordsOut`, `numRecordsIn`<br>
+  Each metric comes with two scopes: one scoped to the operator and one scoped to the subtask. For network monitoring, the subtask-scoped metric is relevant and shows the total number of records it has sent/received. You may need to further look into these figures to extract the number of records within a certain time span or use the equivalent `…PerSecond` metrics.
+
+* `numBytesOut`, `numBytesInLocal`, `numBytesInRemote`<br>
+  The total number of bytes this subtask has emitted or read from a local/remote source. These are also available as meters via `…PerSecond` metrics.
+
+* `numBuffersOut`, `numBuffersInLocal`, `numBuffersInRemote`<br>
+  Similar to `numBytes…` but counting the number of network buffers.
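For continuous monitoring, such subtask metrics can also be scraped from Flink's monitoring REST API. The sketch below is a hedged illustration: the `/jobs/:jobid/vertices/:vertexid/metrics?get=…` endpoint path follows the REST API documentation, while `base_url`, `job_id`, and `vertex_id` are placeholders you would look up yourself (e.g. via `/jobs`):

```python
import json
from urllib.request import urlopen


def parse_metrics(payload):
    """Turn a REST metrics response, a JSON list of {"id": ..., "value": ...}
    entries, into a dict mapping metric name to a float value."""
    return {m["id"]: float(m["value"]) for m in json.loads(payload)}


def fetch_subtask_metrics(base_url, job_id, vertex_id, names):
    """Query selected task metrics from Flink's monitoring REST API."""
    url = (f"{base_url}/jobs/{job_id}/vertices/{vertex_id}/metrics"
           f"?get={','.join(names)}")
    with urlopen(url) as resp:
        return parse_metrics(resp.read())
```

A monitoring script could poll this periodically, e.g. for `0.numRecordsOutPerSecond` and the pool usage gauges, and feed the values into the interpretation rules discussed below.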
+
+<div class="alert alert-warning" markdown="1">
+<span class="label label-warning" style="display: inline-block"><span class="glyphicon glyphicon-warning-sign" aria-hidden="true"></span> Warning</span>
+For the sake of completeness and since they have been used in the past, we will briefly look at the `outputQueueLength` and `inputQueueLength` metrics. These are somewhat similar to the `[out,in]PoolUsage` metrics but show the number of buffers sitting in a sender subtask’s output queues and in a receiver subtask’s input queues, respectively. Reasoning about absolute numbers of buffers, however, is difficult and there is also a special subtlety with local channels: since a local input ch [...]
+
+Overall, **we discourage the use of** `outputQueueLength` **and** `inputQueueLength` because their interpretation highly depends on the current parallelism of the operator and the configured numbers of exclusive and floating buffers. Instead, we recommend using the various `*PoolUsage` metrics which even reveal more detailed insight.
+</div>
+
+
+<div class="alert alert-info" markdown="1">
+<span class="label label-info" style="display: inline-block"><span class="glyphicon glyphicon-info-sign" aria-hidden="true"></span> Note</span>
+ If you reason about buffer usage, please keep the following in mind:
+
+* Any outgoing channel which has been used at least once will always occupy one buffer (since Flink 1.5).
+  * **<span style="color:orange">up to Flink 1.8:</span>** This buffer (even if empty!) was always counted as a backlog of 1 and thus receivers tried to reserve a floating buffer for it.
+  * **<span style="color:green">Flink 1.9 and above:</span>** A buffer is only counted in the backlog if it is ready for consumption, i.e. it is full or was flushed (see [FLINK-11082](https://issues.apache.org/jira/browse/FLINK-11082)).
+* The receiver will only release a received buffer after deserialising the last record in it.
+</div>
+
+The following sections make use of and combine these metrics to reason about backpressure and resource usage / efficiency with respect to throughput. A separate section will detail latency-related metrics.
+
+
+### Backpressure
+
+Backpressure may be indicated by two different sets of metrics: (local) buffer pool usages as well as input/output queue lengths. They provide different levels of granularity but, unfortunately, none of them is exhaustive and there is room for interpretation. Because of the inherent problems with interpreting these queue lengths, we will focus on the usage of input and output pools below, which also provides more detail.
+
+* **If a subtask’s** `outPoolUsage` **is 100%**, it is backpressured. Whether the subtask is already blocking or still writing records into network buffers depends on how full the buffers are that the `RecordWriters` are currently writing into.<br>
+<span class="glyphicon glyphicon-warning-sign" aria-hidden="true" style="color:orange;"></span> This is different from what the backpressure monitor is showing!
+
+* An `inPoolUsage` of 100% means that all floating buffers are assigned to channels and eventually backpressure will be exercised upstream. These floating buffers are in one of the following states: they are reserved for future use on a channel due to an exclusive buffer being utilised (remote input channels always try to maintain `#exclusive buffers` credits), they are reserved for a sender’s backlog and wait for data, they may contain data and are enqueued in an input channel, o [...]
+
+* **<span style="color:orange">up to Flink 1.8:</span>** Due to [FLINK-11082](https://issues.apache.org/jira/browse/FLINK-11082), an `inPoolUsage` of 100% is quite common even in normal situations.
+
+* **<span style="color:green">Flink 1.9 and above:</span>** If `inPoolUsage` is constantly around 100%, this is a strong indicator for exercising backpressure upstream.
+
+The following table summarises all combinations and their interpretation. Bear in mind, though, that backpressure may be minor or temporary (no need to look into it), may occur on particular channels only, or may be caused by other JVM processes on a particular TaskManager, such as GC, synchronisation, I/O, or resource shortage, rather than by a specific subtask.
+
+<center>
+<table class="tg">
+  <tr>
+    <th></th>
+    <th class="tg-center"><code>outPoolUsage</code> low</th>
+    <th class="tg-center"><code>outPoolUsage</code> high</th>
+  </tr>
+  <tr>
+    <th class="tg-top"><code>inPoolUsage</code> low</th>
+    <td class="tg-topcenter">
+      <span class="glyphicon glyphicon-ok-sign" aria-hidden="true" style="color:green;font-size:1.5em;"></span></td>
+    <td class="tg-topcenter">
+      <span class="glyphicon glyphicon-warning-sign" aria-hidden="true" style="color:orange;font-size:1.5em;"></span><br>
+      (backpressured, temporary situation: upstream is not backpressured yet or not anymore)</td>
+  </tr>
+  <tr>
+    <th class="tg-top" rowspan="2">
+      <code>inPoolUsage</code> high<br>
+      (<strong><span style="color:green">Flink 1.9+</span></strong>)</th>
+    <td class="tg-topcenter">
+      if all upstream tasks’ <code>outPoolUsage</code> are low: <span class="glyphicon glyphicon-warning-sign" aria-hidden="true" style="color:orange;font-size:1.5em;"></span><br>
+      (may eventually cause backpressure)</td>
+    <td class="tg-topcenter" rowspan="2">
+      <span class="glyphicon glyphicon-remove-sign" aria-hidden="true" style="color:red;font-size:1.5em;"></span><br>
+      (backpressured by downstream task(s) or network, probably forwarding backpressure upstream)</td>
+  </tr>
+  <tr>
+    <td class="tg-topcenter">if any upstream task’s <code>outPoolUsage</code> is high: <span class="glyphicon glyphicon-remove-sign" aria-hidden="true" style="color:red;font-size:1.5em;"></span><br>
+      (may exercise backpressure upstream and may be the source of backpressure)</td>
+  </tr>
+</table>
+</center>
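The decision table above can be expressed as a small helper function. This is a hedged illustration in Python, not a Flink API; it assumes the Flink 1.9+ semantics of `inPoolUsage` and boolean "high"/"low" judgements you derive from the metrics yourself:

```python
def diagnose_pool_usage(in_pool_high, out_pool_high, upstream_out_pool_high=()):
    """Classify a subtask per the pool-usage decision table.

    `upstream_out_pool_high` lists, for each upstream subtask, whether its
    outPoolUsage is high; it only matters when inPoolUsage is high while
    outPoolUsage is low.
    """
    if out_pool_high:
        if in_pool_high:
            return "backpressured by downstream task(s) or network"
        return "backpressured; temporary: upstream not (yet) affected"
    if not in_pool_high:
        return "ok"
    if any(upstream_out_pool_high):
        return "may be the source of backpressure"
    return "may eventually cause backpressure"
```

For instance, a subtask with high `inPoolUsage`, low `outPoolUsage`, and at least one upstream subtask showing high `outPoolUsage` is classified as a likely source of backpressure.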
+
+<br>
+We can reason further about the cause of backpressure by looking at the network metrics of the subtasks of two consecutive tasks:
+
+* If all subtasks of the receiver task have low `inPoolUsage` values and any upstream subtask’s `outPoolUsage` is high, then there may be a network bottleneck causing backpressure.
+Since the network is a shared resource among all subtasks of a TaskManager, this may not directly originate from this subtask, but rather from various concurrent operations, e.g. checkpoints, other streams, external connections, or other TaskManagers/processes on the same machine.
+
+Backpressure can also be caused by all parallel instances of a task or by a single task instance. The former usually happens because the task is performing some time-consuming operation that applies to all input partitions. The latter is usually the result of some kind of skew, either data skew or resource availability/allocation skew. In either case, you can find some hints on how to handle such situations in the [What to do with backpressure?](#span-classlabel-label-info-styledisplay-in [...]
+
+<div class="alert alert-info" markdown="1">
+### <span class="glyphicon glyphicon-info-sign" aria-hidden="true"></span> Flink 1.9 and above
+{:.no_toc}
+
+* If `floatingBuffersUsage` is not 100%, it is unlikely that there is backpressure. If it is 100% and any upstream task is backpressured, it suggests that this input is exercising backpressure on a single, some, or all input channels. To differentiate between those three situations you can use `exclusiveBuffersUsage`:
+  * Assuming that `floatingBuffersUsage` is around 100%, the higher the `exclusiveBuffersUsage`, the more input channels are backpressured. In the extreme case of `exclusiveBuffersUsage` being close to 100%, all channels are backpressured.
+
+<br>
+The relation between `exclusiveBuffersUsage`, `floatingBuffersUsage`, and the upstream tasks' `outPoolUsage` is summarised in the following table, which extends the table above with `inPoolUsage = floatingBuffersUsage + exclusiveBuffersUsage`:
+
+<center>
+<table class="tg">
+  <tr>
+    <th></th>
+    <th><code>exclusiveBuffersUsage</code> low</th>
+    <th><code>exclusiveBuffersUsage</code> high</th>
+  </tr>
+  <tr>
+    <th class="tg-top" style="min-width:33%;">
+      <code>floatingBuffersUsage</code> low +<br>
+      <em>all</em> upstream <code>outPoolUsage</code> low</th>
+    <td class="tg-center"><span class="glyphicon glyphicon-ok-sign" aria-hidden="true" style="color:green;font-size:1.5em;"></span></td>
+    <td class="tg-center">-<sup>3</sup></td>
+  </tr>
+  <tr>
+    <th class="tg-top" style="min-width:33%;">
+      <code>floatingBuffersUsage</code> low +<br>
+      <em>any</em> upstream <code>outPoolUsage</code> high</th>
+    <td class="tg-center">
+      <span class="glyphicon glyphicon-remove-sign" aria-hidden="true" style="color:red;font-size:1.5em;"></span><br>
+      (potential network bottleneck)</td>
+    <td class="tg-center">-<sup>3</sup></td>
+  </tr>
+  <tr>
+    <th class="tg-top" style="min-width:33%;">
+      <code>floatingBuffersUsage</code> high +<br>
+      <em>all</em> upstream <code>outPoolUsage</code> low</th>
+    <td class="tg-center">
+      <span class="glyphicon glyphicon-warning-sign" aria-hidden="true" style="color:orange;font-size:1.5em;"></span><br>
+      (backpressure eventually appears on only some of the input channels)</td>
+    <td class="tg-center">
+      <span class="glyphicon glyphicon-warning-sign" aria-hidden="true" style="color:orange;font-size:1.5em;"></span><br>
+      (backpressure eventually appears on most or all of the input channels)</td>
+  </tr>
+  <tr>
+    <th class="tg-top" style="min-width:33%;">
+      <code>floatingBuffersUsage</code> high +<br>
+      <em>any</em> upstream <code>outPoolUsage</code> high</th>
+    <td class="tg-center">
+      <span class="glyphicon glyphicon-remove-sign" aria-hidden="true" style="color:red;font-size:1.5em;"></span><br>
+      (backpressure on only some of the input channels)</td>
+    <td class="tg-center">
+      <span class="glyphicon glyphicon-remove-sign" aria-hidden="true" style="color:red;font-size:1.5em;"></span><br>
+      (backpressure on most or all of the input channels)</td>
+  </tr>
+</table>
+</center>
+
+<sup>3</sup> this should not happen
+
+</div>
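The decision table above can be sketched as a small classification helper. This is an illustrative Python snippet, not part of Flink: the metric values are assumed to have been fetched already (e.g. via Flink's REST API), and the 95% threshold is an arbitrary stand-in for "around 100%":

```python
HIGH = 0.95  # illustrative threshold for "around 100%"

def diagnose_input(exclusive_usage, floating_usage, upstream_out_pool_usages):
    """Classify a subtask's input side following the table above.

    All arguments are usage fractions in [0, 1]; upstream_out_pool_usages
    contains the outPoolUsage value of every upstream subtask.
    """
    floating_high = floating_usage >= HIGH
    exclusive_high = exclusive_usage >= HIGH
    any_upstream_high = any(u >= HIGH for u in upstream_out_pool_usages)

    if not floating_high:
        if exclusive_high:
            return "inconsistent metrics (should not happen)"
        if any_upstream_high:
            return "potential network bottleneck"
        return "no backpressure"
    # floatingBuffersUsage is high:
    suffix = ("most or all input channels" if exclusive_high
              else "only some input channels")
    if any_upstream_high:
        return "backpressure on " + suffix
    return "backpressure eventually appears on " + suffix
```

For example, `diagnose_input(0.1, 0.99, [0.2, 0.3])` maps to the third row of the table: backpressure eventually appears on only some input channels.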
+
+
+### Resource Usage / Throughput
+
+Besides the obvious use of each individual metric mentioned above, there are also a few combinations providing useful insight into what is happening in the network stack:
+
+* Low throughput with frequent `outPoolUsage` values around 100% but low `inPoolUsage` on all receivers is an indicator that the round-trip time of our credit notification (which depends on your network’s latency) is too high for the default number of exclusive buffers to make use of your bandwidth. Consider increasing the [buffers-per-channel]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/ops/config.html#taskmanager-network-memory-buffers-per-channel) parameter or try disabling credit-based  [...]
+
+* Combining `numRecordsOut` and `numBytesOut` helps identify the average serialised record size, which supports you in capacity planning for peak scenarios.
+
+* If you want to reason about buffer fill rates and the influence of the output flusher, you may combine `numBytesInRemote` with `numBuffersInRemote`. When tuning for throughput (and not latency!), low buffer fill rates may indicate reduced network efficiency. In such cases, consider increasing the buffer timeout.
+Please note that, as of Flink 1.8 and 1.9, `numBuffersOut` only increases for buffers getting full or for an event cutting off a buffer (e.g. a checkpoint barrier) and may lag behind. Please also note that reasoning about buffer fill rates on local channels is unnecessary since buffering is an optimisation technique for remote channels with limited effect on local channels.
+
+* You may also separate local from remote traffic using `numBytesInLocal` and `numBytesInRemote`, but in most cases this is unnecessary.
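To make these combinations concrete, here is an illustrative Python sketch of the derived quantities mentioned above. The 32 KiB buffer size matches Flink's default network buffer size; the example numbers and the bandwidth-delay-product formula for the buffer count are assumptions, not values Flink reports directly:

```python
import math

BUFFER_SIZE = 32 * 1024  # Flink's default network buffer size (32 KiB)

def avg_record_size(num_bytes_out, num_records_out):
    """Average serialised record size, useful for capacity planning."""
    return num_bytes_out / num_records_out

def buffer_fill_rate(num_bytes_in_remote, num_buffers_in_remote):
    """Average fraction of each remote buffer that was actually used."""
    return num_bytes_in_remote / (num_buffers_in_remote * BUFFER_SIZE)

def min_buffers_per_channel(bandwidth_bytes_per_s, round_trip_s):
    """Buffers needed in flight to keep the link busy during one
    credit-notification round trip (bandwidth-delay product)."""
    return math.ceil(bandwidth_bytes_per_s * round_trip_s / BUFFER_SIZE)

# A 1 Gbit/s link (125 MB/s) with a 1 ms round trip needs
# ceil(125_000_000 * 0.001 / 32768) = 4 buffers in flight.
```

A fill rate well below 1.0 on remote channels, combined with a throughput goal, is the signal discussed above for increasing the buffer timeout.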
+
+<div class="alert alert-info" markdown="1">
+### <span class="glyphicon glyphicon-info-sign" aria-hidden="true"></span> What to do with Backpressure?
+{:.no_toc}
+
+Assuming that you have identified where the source of backpressure (a bottleneck) is located, the next step is to analyse why it is happening. Below, we list some potential causes of backpressure, from the more basic to the more complex ones. We recommend checking the basic causes first, before diving deeper into the more complex ones and potentially drawing false conclusions.
+
+Please also recall that backpressure might be temporary and the result of a load spike, checkpointing, or a job restart with a data backlog waiting to be processed. If backpressure is temporary, you should simply ignore it. Alternatively, keep in mind that the process of analysing and solving the issue can be affected by the intermittent nature of your bottleneck. Having said that, here are a couple of things to check.
+
+#### System Resources
+
+Firstly, you should check the basic resource usage (CPU, network, or disk I/O) of the affected machines. If some resource is fully or heavily utilised, you can do one of the following:
+
+1. Try to optimise your code. Code profilers are helpful in this case.
+2. Tune Flink for that specific resource.
+3. Scale out by increasing the parallelism and/or increasing the number of machines in the cluster.
+
+#### Garbage Collection
+
+Oftentimes, performance issues arise from long GC pauses. You can verify whether you are in such a situation by either printing debug GC logs (via `-XX:+PrintGCDetails`) or by using some memory/GC profilers. Since dealing with GC issues is highly application-dependent and independent of Flink, we will not go into details here ([Oracle's Garbage Collection Tuning Guide](https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/index.html) or [Plumbr’s Java Garbage Collection hand [...]
+
+#### CPU/Thread Bottleneck
+
+Sometimes a CPU bottleneck is not visible at first glance because the overall machine's CPU usage remains relatively low while one or a few threads are causing the actual bottleneck. For instance, a single CPU-bottlenecked thread on a 48-core machine results in only about 2% overall CPU use. Consider using code profilers here, as they can identify hot threads by showing each thread's CPU usage, for example.
+
+#### Thread Contention
+
+Similarly to the CPU/thread bottleneck issue above, a subtask may be bottlenecked due to high thread contention on shared resources. Again, CPU profilers are your best friend here! Consider looking for synchronisation overhead / lock contention in user code (although adding synchronisation in user code should be avoided and may even be dangerous!). Also consider investigating shared system resources. The default JVM’s SSL implementation, for example, can become contended around the shared [...]
+
+#### Load Imbalance
+
+If your bottleneck is caused by data skew, you can try to remove it or mitigate its impact by changing the data partitioning to separate heavy keys or by implementing local/pre-aggregation.
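As an illustration of the local/pre-aggregation idea, here is a plain Python sketch (not Flink API; in an actual Flink job this would typically be implemented in user code before the `keyBy()`): partial per-key aggregates are combined locally before being shipped downstream, so a hot key no longer floods a single downstream channel with individual records.

```python
from collections import Counter

def pre_aggregate(keys, flush_every=1000):
    """Combine per-key counts locally before the shuffle.

    Instead of sending one record per key occurrence downstream, partial
    (key, count) aggregates are emitted every `flush_every` records.
    """
    buffer = Counter()
    seen = 0
    for key in keys:
        buffer[key] += 1
        seen += 1
        if seen >= flush_every:
            yield from buffer.items()  # ship partial aggregates downstream
            buffer.clear()
            seen = 0
    yield from buffer.items()  # flush the remainder

# 10,000 occurrences of one hot key collapse into 10 partial counts
# instead of 10,000 individual records crossing the network.
```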
+
+<br>
+This list is far from exhaustive. Generally, in order to reduce a bottleneck and thus backpressure, first analyse where it is happening and then find out why. The best place to start reasoning about the “why” is by checking what resources are fully utilised.
+</div>
+
+### Latency Tracking
+
+Tracking latencies at the various locations they may occur is a topic of its own. In this section, we will focus on the time records wait inside Flink’s network stack — including the system’s network connections. In low throughput scenarios, these latencies are influenced directly by the output flusher via the buffer timeout parameter or indirectly by any application code latencies. When processing a record takes longer than expected or when (multiple) timers fire at the same time — and  [...]
+
+Flink offers some support for [tracking the latency]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/monitoring/metrics.html#latency-tracking) of records passing through the system (outside of user code). However, this is disabled by default (see below why!) and must be enabled by setting a latency tracking interval either in Flink’s [configuration via `metrics.latency.interval`]({{ site.DOCS_BASE_URL }}flink-docs-release-1.8/ops/config.html#metrics-latency-interval) or via [ExecutionConf [...]
+
+* `single`: one histogram for each operator subtask
+* `operator` (default): one histogram for each combination of source task and operator subtask
+* `subtask`: one histogram for each combination of source subtask and operator subtask (quadratic in the parallelism!)
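As a rough sketch of how the number of histograms grows with each granularity (plain Python for illustration; the parallelism values below are hypothetical):

```python
def num_latency_histograms(granularity, num_sources, source_parallelism,
                           operator_subtasks):
    """Approximate histogram count for one operator at each granularity."""
    if granularity == "single":
        return operator_subtasks
    if granularity == "operator":
        return num_sources * operator_subtasks
    if granularity == "subtask":
        return num_sources * source_parallelism * operator_subtasks
    raise ValueError(granularity)

# With 2 sources at parallelism 100 and an operator with 100 subtasks:
# single   ->    100 histograms
# operator ->    200 histograms
# subtask  -> 20,000 histograms (quadratic in the parallelism!)
```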
+
+These metrics are collected through special “latency markers”: each source subtask periodically emits a special record containing the timestamp of its creation. The latency markers then flow alongside normal records while not overtaking them on the wire or inside a buffer queue. However, _a latency marker does not enter application logic_ and thus overtakes records there. Latency markers therefore only measure the waiting time between the user code and not a full “end-to-end” latency. [...]
+
+Since `LatencyMarkers` sit in network buffers just like normal records, they will also wait for the buffer to become full or to be flushed due to a buffer timeout. When a channel is under high load, the network's buffering of data adds no latency. However, as soon as a channel is under low load, records and latency markers will experience an expected average delay of at most `buffer_timeout / 2`. This delay adds up for each network connection towards a subtask and should be taken into accoun [...]
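As a back-of-the-envelope sketch of that accumulated delay (illustrative Python; it assumes lightly loaded, independent connections and uses Flink's default 100 ms buffer timeout in the example):

```python
def expected_buffering_delay_ms(buffer_timeout_ms, network_hops):
    """Expected extra latency from the output flusher under low load:
    each lightly loaded connection adds an average of up to
    buffer_timeout / 2, accumulating over all network hops on the path."""
    return network_hops * buffer_timeout_ms / 2

# With the default 100 ms buffer timeout and a path crossing 3 network
# connections, expect up to ~150 ms of added latency on average.
```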
+
+By looking at the exposed latency tracking metrics for each subtask, for example at the 95th percentile, you should nevertheless be able to identify subtasks which are adding substantially to the overall source-to-sink latency and continue with optimising there.
+
+<div class="alert alert-info" markdown="1">
+<span class="label label-info" style="display: inline-block"><span class="glyphicon glyphicon-info-sign" aria-hidden="true"></span> Note</span>
+Flink's latency markers assume that the clocks on all machines in the cluster are in sync. We recommend setting up an automated clock synchronisation service (like NTP) to avoid false latency results.
+</div>
+
+<div class="alert alert-warning" markdown="1">
+<span class="label label-warning" style="display: inline-block"><span class="glyphicon glyphicon-warning-sign" aria-hidden="true"></span> Warning</span>
+Enabling latency metrics can significantly impact the performance of the cluster (in particular for `subtask` granularity) due to the sheer amount of metrics being added as well as the use of histograms which are quite expensive to maintain. It is highly recommended to only use them for debugging purposes.
+</div>
+
+
+## Conclusion
+
+In the previous sections we discussed how to monitor Flink's network stack, which primarily involves identifying backpressure: where it occurs, where it originates from, and (potentially) why it occurs. This can be done in two ways: for simple cases and debugging sessions, by using the backpressure monitor; for continuous monitoring, more in-depth analysis, and less runtime overhead, by using Flink’s task and network stack metrics. Backpressure can be caused by the network layer itself  [...]
+
+Stay tuned for the third blog post in the series of network stack posts that will focus on tuning techniques and anti-patterns to avoid.
+
+
diff --git a/css/flink.css b/css/flink.css
index 9f13341..5c0f621 100755
--- a/css/flink.css
+++ b/css/flink.css
@@ -87,6 +87,11 @@ h1, h2, h3, h4, h5, h6 {
     margin-top: -60px;
 }
 
+/* fix conflict with bootstrap's alert */
+.alert h4 {
+	margin-top: -60px;
+}
+
 h1 {
 	font-size: 160%;
 }
diff --git a/img/blog/2019-07-23-network-stack-2/back_pressure_sampling_high.png b/img/blog/2019-07-23-network-stack-2/back_pressure_sampling_high.png
new file mode 100644
index 0000000..15372fd
Binary files /dev/null and b/img/blog/2019-07-23-network-stack-2/back_pressure_sampling_high.png differ