Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/11 03:01:35 UTC

[GitHub] [spark] Myasuka commented on a change in pull request #35480: [SPARK-38178][SS] Correct the logic to measure the memory usage of RocksDB

Myasuka commented on a change in pull request #35480:
URL: https://github.com/apache/spark/pull/35480#discussion_r804329368



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala
##########
@@ -403,7 +404,7 @@ class RocksDB(
     RocksDBMetrics(
       numKeysOnLoadedVersion,
       numKeysOnWritingVersion,
-      readerMemUsage + memTableMemUsage,
+      readerMemUsage + memTableMemUsage + blockCacheUsage,

Review comment:
       > based on the doc, do we also need rocksdb.block-cache-pinned-usage?
   
   No, I don't think so. I already wrote my conclusion in the original ticket, SPARK-38178:
   `BTW, as the "block-cache-pinned-usage" is included in "block-cache-usage", we don't need to include the pinned usage.`
   
   Let's take a look at the RocksDB implementation:
   
   - [How the property queries map to handler methods:](https://github.com/facebook/rocksdb/blob/073ac547391870f464fae324a19a6bc6a70188dc/db/internal_stats.cc#L545-L550)
   ~~~ c++
     {DB::Properties::kBlockCacheUsage,
      {false, nullptr, &InternalStats::HandleBlockCacheUsage, nullptr,
       nullptr}},
     {DB::Properties::kBlockCachePinnedUsage,
      {false, nullptr, &InternalStats::HandleBlockCachePinnedUsage, nullptr,
       nullptr}},
   ~~~
   - And what these two [handler methods](https://github.com/facebook/rocksdb/blob/073ac547391870f464fae324a19a6bc6a70188dc/db/internal_stats.cc#L1304-L1324) call:
   ~~~ c++
   bool InternalStats::HandleBlockCacheUsage(uint64_t* value, DBImpl* /*db*/,
                                             Version* /*version*/) {
     Cache* block_cache;
     bool ok = GetBlockCacheForStats(&block_cache);
     if (!ok) {
       return false;
     }
     *value = static_cast<uint64_t>(block_cache->GetUsage());
     return true;
   }
   
   bool InternalStats::HandleBlockCachePinnedUsage(uint64_t* value, DBImpl* /*db*/,
                                                   Version* /*version*/) {
     Cache* block_cache;
     bool ok = GetBlockCacheForStats(&block_cache);
     if (!ok) {
       return false;
     }
     *value = static_cast<uint64_t>(block_cache->GetPinnedUsage());
     return true;
   }
   ~~~
   
   - And then [the implementation](https://github.com/facebook/rocksdb/blob/073ac547391870f464fae324a19a6bc6a70188dc/cache/lru_cache.cc#L625-L634) of these two getters:
   ~~~ c++
   size_t LRUCacheShard::GetUsage() const {
     MutexLock l(&mutex_);
     return usage_;
   }
   
   size_t LRUCacheShard::GetPinnedUsage() const {
     MutexLock l(&mutex_);
     assert(usage_ >= lru_usage_);
     return usage_ - lru_usage_;
   }
   ~~~
   
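   As you can see, the pinned usage is computed as `usage_ - lru_usage_`, so it can never exceed `usage_`: it is already included in the block cache usage, and adding it again would double count.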
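   
   To make the double counting concrete, here is a minimal sketch of the same accounting. `ToyShard`, `InsertPinned` and `Release` are hypothetical names for illustration only, not RocksDB's API, and I assume (as the getters above suggest) that `lru_usage_` tracks the charge of entries that are cached but not currently referenced:
   
   ~~~ c++
   #include <cassert>
   #include <cstddef>
   
   // Hypothetical toy model of one cache shard; NOT RocksDB's real class.
   struct ToyShard {
     std::size_t usage = 0;      // charge of every entry held by the shard
     std::size_t lru_usage = 0;  // charge of unpinned (evictable) entries
   
     // Insert an entry the caller still references, i.e. a pinned entry.
     void InsertPinned(std::size_t charge) { usage += charge; }
   
     // Drop the caller's reference: the entry stays cached but is no longer pinned.
     void Release(std::size_t charge) { lru_usage += charge; }
   
     std::size_t GetUsage() const { return usage; }
   
     // Same arithmetic as LRUCacheShard::GetPinnedUsage() quoted above.
     std::size_t GetPinnedUsage() const {
       assert(usage >= lru_usage);
       return usage - lru_usage;
     }
   };
   ~~~
   
   With a 100-byte and a 50-byte entry inserted and only the 50-byte one released, `GetUsage()` is 150 and `GetPinnedUsage()` is 100. Summing the two would report 250 bytes for a cache that holds 150.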




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org