Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/03/28 15:03:24 UTC
[GitHub] [flink] pnowojski commented on a change in pull request #18754: [FLINK-26130][docs] Expanded the reason to increase network buffer si…

pnowojski commented on a change in pull request #18754:
URL: https://github.com/apache/flink/pull/18754#discussion_r836480725



##########
File path: docs/content/docs/deployment/memory/network_mem_tuning.md
##########
@@ -128,7 +128,13 @@ The default settings for exclusive buffers and floating buffers should be suffic
 
 The buffer collects records in order to optimize network overhead when sending the data portion to the next subtask. The next subtask should receive all parts of the record before consuming it. 
 
-If the buffer size is too small (i.e. less than one record), this can lead to low throughput since the overhead is still pretty large.  
+If the buffer size is too small (i.e. less than one record), this can lead to low throughput 
+since the per-buffer overhead can be pretty large especially if the data flushing frequency is high.
+(If the data isn't flushed too frequently, the per-buffer overheads shouldn't affect the throughput).

Review comment:
       ```suggestion
   If the buffer size is too small, or the buffers are flushed too frequently (`execution.buffer-timeout` configuration parameter), this can lead to decreased throughput
   since the per-buffer overheads are significantly higher than the per-record overheads in Flink's runtime.
   ```
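
To illustrate the reasoning behind this suggestion, here is a rough toy cost model, not a measurement of Flink's runtime. The function names and the cost constants (`per_record_cost`, `per_buffer_cost`) are made up for illustration, and the model assumes that records never straddle buffers unless a single record is at least as large as a buffer. It only shows how the number of buffers sent, and with it the total overhead, grows when buffers are too small or are flushed after very few records.

```python
import math


def buffers_sent(num_records, record_bytes, buffer_bytes, records_per_flush):
    """Count buffers sent under a simplified model (hypothetical, not Flink code)."""
    if record_bytes >= buffer_bytes:
        # Buffer is not larger than one record: every record spans
        # one or more buffers on its own.
        return num_records * math.ceil(record_bytes / buffer_bytes)
    # Records that fit into one buffer if it is allowed to fill up.
    capacity = buffer_bytes // record_bytes
    # A flush sends the buffer early, so it carries at most
    # records_per_flush records.
    records_per_buffer = min(capacity, records_per_flush)
    return math.ceil(num_records / records_per_buffer)


def estimated_cost(num_records, record_bytes, buffer_bytes, records_per_flush,
                   per_record_cost=1.0, per_buffer_cost=50.0):
    """Total cost = per-record work + per-buffer work (both constants are assumed)."""
    n_buffers = buffers_sent(num_records, record_bytes, buffer_bytes, records_per_flush)
    return num_records * per_record_cost + n_buffers * per_buffer_cost


if __name__ == "__main__":
    # 1000 records of 100 bytes each, 32 KiB buffers.
    print(estimated_cost(1000, 100, 32 * 1024, records_per_flush=1000))  # rarely flushed
    print(estimated_cost(1000, 100, 32 * 1024, records_per_flush=1))     # flushed per record
    print(estimated_cost(1000, 100, 64, records_per_flush=1000))         # buffer < record
```

With these assumed constants, flushing after every record or using a buffer smaller than a record makes the per-buffer term dominate, which is the effect the suggested wording describes.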

##########
File path: docs/content/docs/deployment/memory/network_mem_tuning.md
##########
@@ -128,7 +128,13 @@ The default settings for exclusive buffers and floating buffers should be suffic
 
 The buffer collects records in order to optimize network overhead when sending the data portion to the next subtask. The next subtask should receive all parts of the record before consuming it. 
 
-If the buffer size is too small (i.e. less than one record), this can lead to low throughput since the overhead is still pretty large.  
+If the buffer size is too small (i.e. less than one record), this can lead to low throughput 
+since the per-buffer overhead can be pretty large especially if the data flushing frequency is high.
+(If the data isn't flushed too frequently, the per-buffer overheads shouldn't affect the throughput).
+
+If the network isn't stable and the network bottleneck is observed
+(downstream operator idling, upstream backpressured, output buffer queue is full, downstream input queue is empty),
+only then it makes sense to start looking into increasing the buffer size.

Review comment:
       ```suggestion
   As a rule of thumb, we don't recommend increasing the buffer size or the buffer timeout unless you can observe a network bottleneck in your real-life workload
   (downstream operator idling, upstream backpressured, output buffer queue is full, downstream input queue is empty).
   ```
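
The rule of thumb above could be read as a simple check over the four listed symptoms. The sketch below is only one possible reading: the class and field names are hypothetical, and requiring all four symptoms at once is an assumption rather than something the suggestion states. How each flag is derived from real metrics is left open.

```python
from dataclasses import dataclass


@dataclass
class ExchangeObservation:
    """Symptoms observed on one upstream/downstream pair (hypothetical structure)."""
    downstream_idle: bool
    upstream_backpressured: bool
    output_queue_full: bool
    input_queue_empty: bool


def network_bottleneck_suspected(obs: ExchangeObservation) -> bool:
    """Return True only if every listed symptom is present (assumed policy)."""
    return all((
        obs.downstream_idle,
        obs.upstream_backpressured,
        obs.output_queue_full,
        obs.input_queue_empty,
    ))


if __name__ == "__main__":
    healthy = ExchangeObservation(False, True, True, False)
    starved = ExchangeObservation(True, True, True, True)
    for name, obs in (("healthy", healthy), ("starved", starved)):
        if network_bottleneck_suspected(obs):
            print(name, "-> consider tuning buffer size or buffer timeout")
        else:
            print(name, "-> keep the defaults")
```

In the first observation the downstream subtask is busy and its input queue is not empty, so backpressure there points at a slow operator rather than at the network, and the defaults stay in place.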



