Posted to issues@spark.apache.org by "Huanli Wang (Jira)" <ji...@apache.org> on 2023/02/24 18:52:00 UTC

[jira] [Updated] (SPARK-42565) Error log improvement for RocksDB state store instance lock acquisition

     [ https://issues.apache.org/jira/browse/SPARK-42565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Huanli Wang updated SPARK-42565:
--------------------------------
    Description: 
 
{code:java}
23/02/23 23:57:44 INFO Executor: Running task 2.0 in stage 57.1 (TID 363)
23/02/23 23:58:44 ERROR RocksDB StateStoreId(opId=0,partId=3,name=default): RocksDB instance could not be acquired by [ThreadId: Some(49), task: 3.0 in stage 57, TID 363] as it was not released by [ThreadId: Some(51), task: 3.1 in stage 57, TID 342] after 60002 ms.
{code}
 


We are seeing these error messages for a test query. The *taskId != partitionId*, but the error log does not make this clear.

The logs are confusing: the second entry appears to refer to *task 3.0* (it is actually partition 3 and retry attempt 0), but *TID 363* already belongs to *task 2.0 in stage 57.1*.

Also, it is unclear in which stage retry attempt the lock is acquired (or fails to be acquired).
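
One possible shape for a clearer message is sketched below. This is only an illustration, not Spark's actual code: the class LockOwnerInfo and its methods are hypothetical names, and the stage attempt of the lock holder (TID 342) is not visible in the log above, so the value 0 used for it is an assumption. The idea is to label the partition ID, task attempt, stage attempt and TID separately so that partition 3 cannot be read as task 3.

```java
// Hypothetical sketch: none of these names come from Spark's source.
final class LockOwnerInfo {
    private final long threadId;
    private final int partitionId;
    private final int taskAttemptNumber;
    private final int stageId;
    private final int stageAttemptNumber;
    private final long taskAttemptId; // the "TID" printed by the executor

    LockOwnerInfo(long threadId, int partitionId, int taskAttemptNumber,
                  int stageId, int stageAttemptNumber, long taskAttemptId) {
        this.threadId = threadId;
        this.partitionId = partitionId;
        this.taskAttemptNumber = taskAttemptNumber;
        this.stageId = stageId;
        this.stageAttemptNumber = stageAttemptNumber;
        this.taskAttemptId = taskAttemptId;
    }

    // Labels every field explicitly instead of the ambiguous "task: 3.0 in stage 57".
    String describe() {
        return "[ThreadId: " + threadId
            + ", partition: " + partitionId
            + ", task attempt: " + taskAttemptNumber
            + ", stage: " + stageId
            + ", stage attempt: " + stageAttemptNumber
            + ", TID: " + taskAttemptId + "]";
    }

    // Builds the failure message for a waiter that timed out on the instance lock.
    static String acquireFailure(LockOwnerInfo waiter, LockOwnerInfo holder, long waitedMs) {
        return "RocksDB instance could not be acquired by " + waiter.describe()
            + " as it was not released by " + holder.describe()
            + " after " + waitedMs + " ms.";
    }

    public static void main(String[] args) {
        // Waiter values come from the two log lines above (TID 363 runs in stage 57.1).
        LockOwnerInfo waiter = new LockOwnerInfo(49L, 3, 0, 57, 1, 363L);
        // The holder's stage attempt is not shown in the log; 0 is assumed here.
        LockOwnerInfo holder = new LockOwnerInfo(51L, 3, 1, 57, 0, 342L);
        System.out.println(acquireFailure(waiter, holder, 60002L));
    }
}
```

With this layout, the waiter from the log above would be reported as [ThreadId: 49, partition: 3, task attempt: 0, stage: 57, stage attempt: 1, TID: 363], which lines up with the executor entry for TID 363.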


> Error log improvement for RocksDB state store instance lock acquisition
> -----------------------------------------------------------------------
>
>                 Key: SPARK-42565
>                 URL: https://issues.apache.org/jira/browse/SPARK-42565
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.5.0
>            Reporter: Huanli Wang
>            Priority: Minor


