
Posted to dev@helix.apache.org by DImuthu Upeksha <di...@gmail.com> on 2018/04/06 19:12:57 UTC

Sporadic issue when restarting a Participant

Hi Folks,

I used the Helix task framework to run several workflows and restarted the
participant that hosted the task implementations. After most restarts,
Helix catches up and continues with the last task, but in some cases it prints
the following error in the Controller log and the workflow stops making
progress from that point on. What could be the reason for this? I'm using
Helix 0.6.7.

2018-04-06 15:10:57,766 [Thread-3] ERROR o.a.h.c.s.BestPossibleStateCalcStage  - Error computing assignment for resource Workflow_of_process_PROCESS_7f6c8a54-b50f-4bdb-aafd-59ce87276527-POST-b5e39e07-2d8e-4309-be5a-f5b6067f9a24_TASK_cc8039e5-f054-4dea-8c7f-07c98077b117. Skipping.
java.lang.NullPointerException: Name is null
        at java.lang.Enum.valueOf(Enum.java:236)
        at org.apache.helix.task.TaskPartitionState.valueOf(TaskPartitionState.java:25)
        at org.apache.helix.task.JobRebalancer.computeResourceMapping(JobRebalancer.java:272)
        at org.apache.helix.task.JobRebalancer.computeBestPossiblePartitionState(JobRebalancer.java:140)
        at org.apache.helix.controller.stages.BestPossibleStateCalcStage.compute(BestPossibleStateCalcStage.java:171)
        at org.apache.helix.controller.stages.BestPossibleStateCalcStage.process(BestPossibleStateCalcStage.java:66)
        at org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:48)
        at org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:295)
        at org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:595)
2018-04-06 15:11:00,385 [Thread-3] ERROR o.a.h.c.s.BestPossibleStateCalcStage  - Error computing assignment for resource Workflow_of_process_PROCESS_2b69b499-c527-4c9d-8b2b-db17366f5f81-POST-c67607ae-9177-4a02-af8a-8b3751eea4ff_TASK_1ea6876d-f2ec-4139-a15d-0e64a80a3025. Skipping.

Thanks
Dimuthu

Re: Sporadic issue when restarting a Participant

Posted by Xue Junkai <ju...@gmail.com>.

Hi DImuthu,

This could be caused by a race condition when restarting participants. When a
task is assigned to a participant, the participant initializes the task and
updates its TaskContext in the JobContext. Suppose the participant is
restarted at that moment. When it comes back, the controller reassigns the
task to it. During this process, all the TaskContexts of the job are read
back and parsed, so that may be where the NPE comes from.

I will create a ticket for this NULL check. Thanks for reporting this.
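
For illustration, here is a minimal sketch of the kind of null check meant
above. It does not use the real Helix classes: the State enum is a
hypothetical stand-in for org.apache.helix.task.TaskPartitionState, the map
is a hypothetical stand-in for the per-partition state read back from the
JobContext, and parseState is a made-up helper name. The assumption is that a
partition whose state was never written yields a null string, which
Enum.valueOf rejects with the "Name is null" message seen in the trace.

```java
import java.util.HashMap;
import java.util.Map;

public class NullSafeStateParse {
    // Hypothetical stand-in for org.apache.helix.task.TaskPartitionState.
    enum State { INIT, RUNNING, COMPLETED }

    // Hypothetical helper: returns null instead of throwing when no state
    // has been recorded for the partition yet.
    static State parseState(String name) {
        return name == null ? null : State.valueOf(name);
    }

    public static void main(String[] args) {
        // Assumed shape of the recorded data: partition 1 has no state
        // written yet, e.g. because the participant restarted mid-update.
        Map<Integer, String> recorded = new HashMap<>();
        recorded.put(0, "RUNNING");
        recorded.put(1, null);

        try {
            // Unguarded parse: Enum.valueOf throws on a null name.
            State.valueOf(recorded.get(1));
        } catch (NullPointerException e) {
            System.out.println("unguarded: " + e.getMessage());
        }

        // Guarded parse: the caller can treat null as "not started yet".
        System.out.println("guarded: " + parseState(recorded.get(1)));
        System.out.println("guarded: " + parseState(recorded.get(0)));
    }
}
```

How the rebalancer should treat a missing state (skip the partition, or treat
it as not yet started) is a design decision for the actual fix and is not
settled by this sketch.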

Best,

Junkai






-- 
Junkai Xue