Posted to dev@helix.apache.org by "Hunter L (JIRA)" <ji...@apache.org> on 2018/11/01 00:21:00 UTC
[jira] [Created] (HELIX-778) TASK: Fix a race condition in updatePreviousAssignedTasksStatus
Hunter L created HELIX-778:
------------------------------
Summary: TASK: Fix a race condition in updatePreviousAssignedTasksStatus
Key: HELIX-778
URL: https://issues.apache.org/jira/browse/HELIX-778
Project: Apache Helix
Issue Type: Improvement
Reporter: Hunter L
Assignee: Hunter L
It was observed that TestUnregisteredCommand is very unstable. The cause was identified as a race condition: when a task fails, a pending message for that task (from INIT to RUNNING) is sometimes not cleaned up in time, so AbstractTaskDispatcher's updatePreviousAssignedTasksStatus processes that message and skips the task's status update (such as updating its state and the NUM_ATTEMPTS field in JobContext).
A short-term fix is to call markPartitionError() before checking for the pending message, but in the long run we need to revisit the design of the task status update to avoid this type of race condition.
Changelist:
1. Move markPartitionError() up before checking for a pending message on the task
2. Fix TestUnregisteredCommand's instability
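
To illustrate why the ordering in item 1 matters, here is a minimal, self-contained sketch. It is not Helix's code: TaskStatusOrderingSketch, JobContextSketch, the State enum, and both update methods are hypothetical stand-ins, and the sketch assumes that marking a partition as errored both records the state and increments the attempt counter.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these types are Helix's real classes.
class TaskStatusOrderingSketch {

    enum State { INIT, RUNNING, TASK_ERROR }

    // Stand-in for the per-job bookkeeping that the ticket calls JobContext.
    static class JobContextSketch {
        final Map<Integer, State> partitionState = new HashMap<>();
        final Map<Integer, Integer> numAttempts = new HashMap<>();

        // Assumption: marking an error records the state and bumps the attempt count.
        void markPartitionError(int partition) {
            partitionState.put(partition, State.TASK_ERROR);
            numAttempts.merge(partition, 1, Integer::sum);
        }
    }

    // Ordering before the fix: a stale pending message causes an early return,
    // so the failure is never recorded.
    static void updateCheckingPendingFirst(JobContextSketch ctx, int partition,
                                           State reported, boolean hasPendingMessage) {
        if (hasPendingMessage) {
            return;
        }
        if (reported == State.TASK_ERROR) {
            ctx.markPartitionError(partition);
        }
    }

    // Ordering after the fix: the failure is recorded first, and only then
    // does the pending message short-circuit any remaining handling.
    static void updateMarkingErrorFirst(JobContextSketch ctx, int partition,
                                        State reported, boolean hasPendingMessage) {
        if (reported == State.TASK_ERROR) {
            ctx.markPartitionError(partition);
        }
        if (hasPendingMessage) {
            return;
        }
        // Further status handling would go here in a fuller model.
    }

    public static void main(String[] args) {
        // A failed task whose INIT -> RUNNING message has not been cleaned up yet.
        JobContextSketch before = new JobContextSketch();
        updateCheckingPendingFirst(before, 0, State.TASK_ERROR, true);
        System.out.println("pending check first: state=" + before.partitionState.get(0)
                + ", attempts=" + before.numAttempts.getOrDefault(0, 0));

        JobContextSketch after = new JobContextSketch();
        updateMarkingErrorFirst(after, 0, State.TASK_ERROR, true);
        System.out.println("error marked first: state=" + after.partitionState.get(0)
                + ", attempts=" + after.numAttempts.getOrDefault(0, 0));
    }
}
```

With the pending check first, the failed partition's state and attempt count are never written while the stale message lingers; with the error marked first, both are recorded whether or not the message has been cleaned up. This only reorders the two steps and does not remove the underlying race, which matches the ticket's note that the design needs revisiting.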
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)