Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/01/23 07:51:20 UTC

[GitHub] [flink] Myasuka opened a new pull request #10930: [FLINK-15368][e2e] Add end-to-end test for controlling RocksDB memory usage

URL: https://github.com/apache/flink/pull/10930
 
 
   
   ## What is the purpose of the change
   
   Add an end-to-end test for controlling RocksDB memory usage. The test job has 4 states in 4 different operators, and all operators share one slot.
   
   **NOTE:** This end-to-end test could be unstable when there are too many unflushed immutable mem-tables. I wrote [a doc explaining how the write buffer manager works in RocksDB.](https://docs.google.com/document/d/1_4Brwy2Axzzqu7SJ4hLLl92hVCpeRlVEG-fj8KsTMUo/edit#heading=h.f5wfmsmpemd0) In it, I explain that in the **worst** case the total memory usage could be much higher than expected.
   
   Below are the general test results:
   1 GB TM with 2 slots, without memory control.
   When memory usage is not controlled across RocksDB instances, the total memory is the sum of `block-cache-usage` + `total-mem-table` over all 4 states. As the chart shows, the total memory usage in one slot can exceed 400 MB.
   <img width="1319" alt="111" src="https://user-images.githubusercontent.com/1709104/72965411-31cdaa80-3df7-11ea-843d-1565d7b7b89d.png">
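   To make the uncontrolled case concrete, here is a minimal bash sketch of that sum. The function name `sum_uncontrolled_memory` is hypothetical and not part of the test script; it assumes the per-state `block-cache-usage` and `total-mem-table` values (in bytes) have already been collected, e.g. from Flink's RocksDB native metrics, and are passed in as pairs.

   ```bash
   # Hypothetical helper (not part of the test script): sums per-state memory
   # when RocksDB memory is NOT controlled. Arguments are pairs of
   # (block-cache-usage, total-mem-table) in bytes, one pair per state.
   sum_uncontrolled_memory() {
     # Expect an even number of arguments: one (cache, mem-tables) pair per state.
     if (( $# % 2 != 0 )); then
       echo "expected (block-cache-usage, total-mem-table) pairs" >&2
       return 1
     fi
     local total=0
     while (( $# > 0 )); do
       # Without memory control each state is accounted separately,
       # so its block cache and mem-table usage are simply added.
       total=$(( total + $1 + $2 ))
       shift 2
     done
     echo "$total"
   }
   ```

   For example, `sum_uncontrolled_memory 10 5 20 5` prints `40`.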
   
   1 GB TM with 2 slots, each slot having 161061276 bytes of managed off-heap memory.
   Since all RocksDB instances share the same cache, the total memory usage is the block cache usage. As the chart shows, the memory usage stays close to 161061276 bytes.
   <img width="1266" alt="image" src="https://user-images.githubusercontent.com/1709104/72965622-ce904800-3df7-11ea-8a04-b818f67929c4.png">
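   In the controlled case, following the description above, the shared cache's usage is compared directly against the slot's managed memory rather than summed per state. A minimal sketch, assuming a hypothetical helper `usage_percent_of_budget` (not part of the test script) that takes the observed usage and the budget in bytes and prints the integer percentage:

   ```bash
   # Hypothetical helper (not part of the test script): expresses the shared
   # cache usage of one slot as a percentage of that slot's managed memory.
   usage_percent_of_budget() {
     local usage=$1 budget=$2
     if (( budget <= 0 )); then
       echo "budget must be a positive number of bytes" >&2
       return 1
     fi
     # Integer percentage of the budget, rounded down.
     echo $(( usage * 100 / budget ))
   }
   ```

   For example, `usage_percent_of_budget 80 160` prints `50`. What percentage still counts as "close to" the budget is a tolerance this sketch deliberately leaves to the caller.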
   
   
   
   ## Brief change log
   Add an end-to-end test for controlling RocksDB memory usage.
   
   
   ## Verifying this change
   This change added tests and can be verified as follows:
   
     - Added `RocksDBStateMemoryControlTestProgram` to verify end-to-end.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): no
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? no
     - If yes, how is the feature documented? not applicable
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] flinkbot edited a comment on issue #10930: [FLINK-15368][e2e] Add end-to-end test for controlling RocksDB memory usage

URL: https://github.com/apache/flink/pull/10930#issuecomment-577573912
 
 
   ## CI report:
   
   * abfc351bc241d9e654157cad0f3c0a13823789c7 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/145716812) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4568) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


[GitHub] [flink] flinkbot commented on issue #10930: [FLINK-15368][e2e] Add end-to-end test for controlling RocksDB memory usage

URL: https://github.com/apache/flink/pull/10930#issuecomment-577573912
 
 
   ## CI report:
   
   * abfc351bc241d9e654157cad0f3c0a13823789c7 UNKNOWN
   


[GitHub] [flink] asfgit closed pull request #10930: [FLINK-15368][e2e] Add end-to-end test for controlling RocksDB memory usage

URL: https://github.com/apache/flink/pull/10930
 
 
   


[GitHub] [flink] flinkbot edited a comment on issue #10930: [FLINK-15368][e2e] Add end-to-end test for controlling RocksDB memory usage

URL: https://github.com/apache/flink/pull/10930#issuecomment-577573912
 
 
   ## CI report:
   
   * abfc351bc241d9e654157cad0f3c0a13823789c7 Travis: [FAILURE](https://travis-ci.com/flink-ci/flink/builds/145716812) Azure: [SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4568) 
   


[GitHub] [flink] carp84 commented on a change in pull request #10930: [FLINK-15368][e2e] Add end-to-end test for controlling RocksDB memory usage

URL: https://github.com/apache/flink/pull/10930#discussion_r369974476
 
 

 ##########
 File path: flink-end-to-end-tests/test-scripts/test_rocksdb_state_memory_control.sh
 ##########
 @@ -0,0 +1,105 @@
+#!/usr/bin/env bash
+################################################################################
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+#if [ -z $1 ] || [ -z $2 ]; then
+# echo "Usage: ./test_rocksdb_state_memory_control.sh "
+# exit 1
+#fi
+
+source "$(dirname "$0")"/common.sh
+
+PARALLELISM=2
+CHECKPOINT_DIR="$TEST_DATA_DIR/test_rocksdb_state_memory_control-dir"
+mkdir -p $CHECKPOINT_DIR
+CHECKPOINT_DIR_URI="file://$CHECKPOINT_DIR"
+# 161061276 + 4 * 8 * 1024 * 1024 = 194615708 bytes
+EXPECTED_MAX_MEMORY_USAGE=194615708
+
+set_config_key "taskmanager.memory.process.size" "1024m"
+set_config_key "state.backend.rocksdb.memory.managed" "true"
+set_config_key "state.backend.rocksdb.metrics.size-all-mem-tables" "true"
+set_config_key "state.backend.rocksdb.metrics.cur-size-active-mem-table" "true"
+set_config_key "state.backend.rocksdb.metrics.num-immutable-mem-table" "true"
+set_config_key "state.backend.rocksdb.memory.`write`-buffer-ratio" "0.8"
 
 Review comment:
   `write`-buffer-ratio -> write-buffer-ratio, I guess this is a typo?
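   As a side note on the script above, the bound `EXPECTED_MAX_MEMORY_USAGE` can be recomputed in plain bash to confirm the arithmetic in its comment (161061276 + 4 * 8 MiB). The `check_memory_bound` function below is a hypothetical illustration of how an observed value might be checked against that bound; the part of the script that performs the real check is not shown in this excerpt.

   ```bash
   # Recompute the bound from the script's comment:
   # per-slot managed memory + 4 states * 8 MiB.
   MANAGED_MEMORY_PER_SLOT=161061276
   NUM_STATES=4
   PER_STATE_EXTRA=$(( 8 * 1024 * 1024 ))
   EXPECTED_MAX_MEMORY_USAGE=$(( MANAGED_MEMORY_PER_SLOT + NUM_STATES * PER_STATE_EXTRA ))
   echo "$EXPECTED_MAX_MEMORY_USAGE"   # prints 194615708

   # Hypothetical check (not from the test script): fail if an observed
   # memory usage value in bytes exceeds the expected maximum.
   check_memory_bound() {
     local observed=$1
     if (( observed > EXPECTED_MAX_MEMORY_USAGE )); then
       echo "memory usage $observed exceeds $EXPECTED_MAX_MEMORY_USAGE" >&2
       return 1
     fi
   }
   ```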


[GitHub] [flink] flinkbot edited a comment on issue #10930: [FLINK-15368][e2e] Add end-to-end test for controlling RocksDB memory usage

URL: https://github.com/apache/flink/pull/10930#issuecomment-577573912
 
 
   ## CI report:
   
   * abfc351bc241d9e654157cad0f3c0a13823789c7 Travis: [PENDING](https://travis-ci.com/flink-ci/flink/builds/145716812) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4568) 
   


[GitHub] [flink] carp84 commented on a change in pull request #10930: [FLINK-15368][e2e] Add end-to-end test for controlling RocksDB memory usage

URL: https://github.com/apache/flink/pull/10930#discussion_r369976311
 
 

 ##########
 File path: flink-end-to-end-tests/run-nightly-tests.sh
 ##########
 @@ -171,6 +171,8 @@ run_test "ConnectedComponents iterations with high parallelism end-to-end test"
 
 run_test "Dependency shading of table modules test" "$END_TO_END_DIR/test-scripts/test_table_shaded_dependencies.sh"
 
+run_test "RocksDB memory control end-to-end test" "$END_TO_END_DIR/test-scripts/test_rocksdb_state_memory_control.sh" "skip_check_exceptions"
 
 Review comment:
   Two comments here:
   1. I guess we are taking an optimistic approach here: although in theory this test could be unstable, we enable it by default?
   2. I believe we also need to add this test into `split_misc.sh` and `split_misc_hadoopfree.sh` under `tools/travis/splits` directory.


[GitHub] [flink] StephanEwen commented on issue #10930: [FLINK-15368][e2e] Add end-to-end test for controlling RocksDB memory usage

URL: https://github.com/apache/flink/pull/10930#issuecomment-577628302
 
 
   Nice work, @Myasuka and @carp84.
   
   With Chinese New Year happening now, I can take this over from here and address the remaining comments.


[GitHub] [flink] flinkbot commented on issue #10930: [FLINK-15368][e2e] Add end-to-end test for controlling RocksDB memory usage

URL: https://github.com/apache/flink/pull/10930#issuecomment-577556834
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit abfc351bc241d9e654157cad0f3c0a13823789c7 (Thu Jan 23 07:53:33 UTC 2020)
   
   **Warnings:**
    * **2 pom.xml files were touched**: Check for build and licensing issues.
    * No documentation files were touched! Remember to keep the Flink docs up to date!
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.<details>
    The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required. <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>


[GitHub] [flink] StephanEwen commented on a change in pull request #10930: [FLINK-15368][e2e] Add end-to-end test for controlling RocksDB memory usage

URL: https://github.com/apache/flink/pull/10930#discussion_r370137458
 
 

 ##########
 File path: flink-end-to-end-tests/run-nightly-tests.sh
 ##########
 @@ -171,6 +171,8 @@ run_test "ConnectedComponents iterations with high parallelism end-to-end test"
 
 run_test "Dependency shading of table modules test" "$END_TO_END_DIR/test-scripts/test_table_shaded_dependencies.sh"
 
+run_test "RocksDB memory control end-to-end test" "$END_TO_END_DIR/test-scripts/test_rocksdb_state_memory_control.sh" "skip_check_exceptions"
 
 Review comment:
   Will add it to `split_checkpoints.sh` for better balancing across the profiles.
