You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@spark.apache.org by ka...@apache.org on 2023/02/24 22:54:46 UTC

[spark] branch master updated: [SPARK-42565][SS] Error log improvement for the lock acquisition of RocksDB state store instance

This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new ac30e9312a8 [SPARK-42565][SS] Error log improvement for the lock acquisition of RocksDB state store instance
ac30e9312a8 is described below

commit ac30e9312a89ca16c16ef27c9276b2bb21cd7431
Author: Huanli Wang <hu...@databricks.com>
AuthorDate: Sat Feb 25 07:54:18 2023 +0900

    [SPARK-42565][SS] Error log improvement for the lock acquisition of RocksDB state store instance
    
    ```
    "23/02/23 23:57:44 INFO Executor: Running task 2.0 in stage 57.1 (TID 363)
    "23/02/23 23:58:44 ERROR RocksDB StateStoreId(opId=0,partId=3,name=default): RocksDB instance
    could not be acquired by [ThreadId: Some(49), task: 3.0 in stage 57, TID 363] as it was not released by
    [ThreadId: Some(51), task: 3.1 in stage 57, TID 342] after 60002 ms.
    ```
    
    We are seeing those error messages for a testing query. The `taskId != partitionId` but we fail to be clear on this in the error log.
    
    It's confusing when we see those logs: the second log entry seems to talk about `task 3.0` (it's actually partition 3 and retry attempt 0), but the `TID 363` is already occupied by `task 2.0 in stage 57.1`.
    
    Also, it's unclear at which stage retry attempt, the lock is acquired (or fails to be acquired)
    
    ### What changes were proposed in this pull request?
    * add `partition ` after `task: ` in the log message for clarification
    * add stage attempt to distinguish different stage retries.
    
    ### Why are the changes needed?
    
    improve the log message for a better debuggability
    
    ### Does this PR introduce _any_ user-facing change?
    
    No
    
    ### How was this patch tested?
    
    only log message change
    
    Closes #40161 from huanliwang-db/rocksdb.
    
    Authored-by: Huanli Wang <hu...@databricks.com>
    Signed-off-by: Jungtaek Lim <ka...@gmail.com>
---
 .../scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala
index cab2fe9b90d..32caf8a1bc8 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala
@@ -710,7 +710,8 @@ case class AcquiredThreadInfo() {
   override def toString(): String = {
     val taskStr = if (tc != null) {
       val taskDetails =
-        s"${tc.partitionId}.${tc.attemptNumber} in stage ${tc.stageId}, TID ${tc.taskAttemptId}"
+        s"partition ${tc.partitionId}.${tc.attemptNumber} in stage " +
+          s"${tc.stageId}.${tc.stageAttemptNumber()}, TID ${tc.taskAttemptId}"
       s", task: $taskDetails"
     } else ""
 


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org