Posted to issues@hive.apache.org by "Siddharth Seth (JIRA)" <ji...@apache.org> on 2017/02/03 04:35:51 UTC

[jira] [Updated] (HIVE-11687) TaskExecutorService can reject work even if capacity is available

     [ https://issues.apache.org/jira/browse/HIVE-11687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated HIVE-11687:
----------------------------------
    Attachment: HIVE-11687.WIP.txt

[~rajesh.balamohan] reported a large number of KILLED fragments for a non-concurrent run (at times double the number of fragments).

This patch is for the reported case, as well as another race: a fragment completion reported to the AM can cause the AM to schedule another fragment on the same node before the thread running the previous fragment falls off the executor.

WIP patch. Will add a few tests and try getting some numbers on the delays in reporting to the AM and the executor actually becoming available.

Tested for non-concurrent jobs.

[~prasanth_j], [~rajesh.balamohan] - could you please take a look?

> TaskExecutorService can reject work even if capacity is available
> -----------------------------------------------------------------
>
>                 Key: HIVE-11687
>                 URL: https://issues.apache.org/jira/browse/HIVE-11687
>             Project: Hive
>          Issue Type: Sub-task
>          Components: llap
>    Affects Versions: llap
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>             Fix For: llap
>
>         Attachments: HIVE-11687.WIP.txt
>
>
> The waitQueue has a fixed capacity, which is the wait queue size. Adding new work does not factor in the capacity available to execute work. Whether new work is accepted is therefore left to the race between work being scheduled for execution and work being added to the waitQueue.
> cc [~prasanth_j]
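
To illustrate the description above, here is a minimal sketch of the two admission checks. It is not the Hive TaskExecutorService code and not the attached patch. The class name AdmissionSketch and all of its methods and fields are hypothetical, and the capacity-aware rule shown (admit while queued work is below wait queue size plus free executors) is only one plausible way to factor in execution capacity.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical sketch only; names and logic are assumptions, not Hive code. */
class AdmissionSketch {
    private final int numExecutors;   // assumed number of executor slots
    private final int waitQueueSize;  // fixed wait queue capacity
    private int running = 0;          // tasks currently occupying executors
    private final Queue<String> waitQueue = new ArrayDeque<>();

    AdmissionSketch(int numExecutors, int waitQueueSize) {
        this.numExecutors = numExecutors;
        this.waitQueueSize = waitQueueSize;
    }

    /** Queue-only check: rejects once the queue is full, even if executors are idle. */
    synchronized boolean offerQueueOnly(String task) {
        if (waitQueue.size() >= waitQueueSize) {
            return false;
        }
        waitQueue.add(task);
        return true;
    }

    /** Capacity-aware check: also counts executor slots that are currently free. */
    synchronized boolean offerCapacityAware(String task) {
        int freeExecutors = numExecutors - running;
        if (waitQueue.size() >= waitQueueSize + freeExecutors) {
            return false;
        }
        waitQueue.add(task);
        return true;
    }

    /** Stands in for the scheduler moving one queued task onto a free executor. */
    synchronized boolean scheduleOne() {
        if (running >= numExecutors || waitQueue.isEmpty()) {
            return false;
        }
        waitQueue.remove();
        running++;
        return true;
    }

    /** Marks one running task as finished, freeing its executor slot. */
    synchronized void complete() {
        if (running > 0) {
            running--;
        }
    }
}
```

With the queue-only check, whether a second submission is accepted depends on whether scheduleOne() happened to run first, which is the race the description refers to. With the capacity-aware check, the result does not depend on that ordering.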



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)