Posted to dev@myriad.apache.org by "Sarjeet Singh (JIRA)" <ji...@apache.org> on 2015/09/15 03:23:45 UTC

[jira] [Created] (MYRIAD-137) Resources offered by mesos are blocked with Myriad FWK on NullPointerException and FlexDown FGS NM.

Sarjeet Singh created MYRIAD-137:
------------------------------------

             Summary: Resources offered by mesos are blocked with Myriad FWK on NullPointerException and FlexDown FGS NM.
                 Key: MYRIAD-137
                 URL: https://issues.apache.org/jira/browse/MYRIAD-137
             Project: Myriad
          Issue Type: Bug
          Components: Scheduler
    Affects Versions: Myriad 0.1.0
            Reporter: Sarjeet Singh


I observed this issue in two instances after flexing down an FGS NM. In another instance, it happened after a NullPointerException occurred (JIRA MYRIAD-135).

From the Mesos UI, I observed that no resources were left to offer, even though nothing was running in the cluster except 3 NMs (2 MP, 1 ZP).

While debugging the RM logs, I found the NullPointerException, which caused the OfferEventHandler thread to exit. After that, Myriad received no more offers from Mesos.
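
To illustrate the failure mode described above, here is a minimal Python model of an offer-handling loop. It is not Myriad code: `handle_offer`, `run_offer_loop` and the sample offers are hypothetical names made up for this sketch, and a `None` host stands in for the Java null that triggered the exception. It only shows the general point that one unhandled exception ends the loop and strands every later offer, while a guarded loop skips the bad offer and keeps going.

```python
def handle_offer(offer):
    # Dereferences a field that may be missing; None plays the role of a Java null.
    return offer["host"].lower()


def run_offer_loop(offers, guarded):
    """Process offers in order and report what happened to each one."""
    handled, failed = [], []
    for index, offer in enumerate(offers):
        try:
            handle_offer(offer)
            handled.append(offer["id"])
        except AttributeError:
            if not guarded:
                # Unguarded: the loop (the "thread") ends here, so the
                # failing offer and everything after it is never processed.
                stranded = [o["id"] for o in offers[index:]]
                return {"handled": handled, "failed": [], "stranded": stranded}
            # Guarded: record the bad offer and keep consuming.
            failed.append(offer["id"])
    return {"handled": handled, "failed": failed, "stranded": []}


# Hypothetical sample data: the second offer has no host set.
OFFERS = [
    {"id": "O1", "host": "node-a"},
    {"id": "O2", "host": None},
    {"id": "O3", "host": "node-b"},
]

unguarded_result = run_offer_loop(OFFERS, guarded=False)
guarded_result = run_offer_loop(OFFERS, guarded=True)
```

In the unguarded run, the offers after the failure are left untouched, which is the analogue of offers staying blocked until the RM is restarted.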

I then restarted the RM, and the resources were returned to Mesos :)

Next, I ran a few MapReduce jobs and hit the issue when flexing down an FGS NM: all the resources offered to Myriad became blocked, and Myriad did not release any of them afterwards.

So it seems that the NM flex-down procedure only cleans up the active containers and the NM itself, but does not clean up outstanding offers in case offers were saved to OfferLifeCycle for future tasks by FGS NMs.
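
A rough sketch of the cleanup that appears to be missing, written as a small Python model rather than as Myriad code. `HeldOffers`, `flex_down` and the `decline` callback are hypothetical names invented here; `HeldOffers` is only a stand-in for a per-host offer cache and does not reflect the real OfferLifeCycle API. The assumption is that a flex-down should hand back every offer still held for the host being removed.

```python
from collections import defaultdict


class HeldOffers:
    """Hypothetical stand-in for a cache of saved offers, keyed by host."""

    def __init__(self):
        self._by_host = defaultdict(list)

    def hold(self, host, offer_id):
        # Save an offer for possible future use on this host.
        self._by_host[host].append(offer_id)

    def release_host(self, host):
        # Remove and return every offer still held for this host.
        return self._by_host.pop(host, [])

    def outstanding(self):
        # All offer ids still held, across every host.
        return sorted(o for offers in self._by_host.values() for o in offers)


def flex_down(host, held, decline):
    """Model of a flex-down that also gives held offers back.

    Stopping containers and the NM itself is out of scope for this sketch.
    `decline` is a placeholder for whatever call returns an offer to the master.
    """
    released = held.release_host(host)
    for offer_id in released:
        decline(offer_id)
    return released


# Usage with made-up hosts and offer ids.
held = HeldOffers()
held.hold("host-1", "offer-a")
held.hold("host-1", "offer-b")
held.hold("host-2", "offer-c")

declined = []
released = flex_down("host-1", held, declined.append)
```

After the call, only the offer held for the other host remains outstanding, so nothing saved for the removed NM keeps resources tied up.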

Resources (from mesos-master UI)
=========

           CPUs                       Mem
Total      84                         253.9 GB
Used       3.300                      6.1 GB
Offered    80.700                     247.8 GB
Idle       -1.4210854715202004e-14    0 B         <------- No resources available.
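
The odd idle CPU value is consistent with floating-point rounding of Total - Used - Offered rather than with a real negative amount. A small check, assuming the UI derives idle by that subtraction using doubles (an assumption, not confirmed from the Mesos source); the variable names are made up for this sketch:

```python
# Figures copied from the Resources table.
TOTAL_CPUS = 84.0
USED_CPUS = 3.3
OFFERED_CPUS = 80.7
REPORTED_IDLE_CPUS = -1.4210854715202004e-14

# Gap between adjacent doubles in the range [64, 128).
DOUBLE_SPACING_NEAR_80 = 2.0 ** -46

# Idle as it would be derived from the other three rows.
idle_cpus = TOTAL_CPUS - USED_CPUS - OFFERED_CPUS

# Share of the cluster's CPUs sitting in outstanding offers.
offered_fraction = OFFERED_CPUS / TOTAL_CPUS
```

The reported idle value has the same magnitude as one rounding step for numbers around 80, so idle is effectively zero, and roughly 96% of the cluster's CPUs are tied up in outstanding offers.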

Here are the active (*blocked*) offers shown on the Mesos UI:

Offers
=====

ID    Framework    Host    CPUs    Mem
…5050-3270-O4151    MyriadAlpha    node101-116    0.5    64 MB
…5050-3270-O4149    MyriadAlpha    node101-116    0.200    282 MB
…5050-3270-O4147    MyriadAlpha    node101-116    1    1.0 GB
…5050-3270-O4145    MyriadAlpha    node101-116    1    1.0 GB
…5050-3270-O4143    MyriadAlpha    node101-116    1    1.0 GB
…5050-3270-O4141    MyriadAlpha    node101-116    1    1.0 GB
…5050-3270-O4139    MyriadAlpha    node101-117    24.5    87.8 GB
…5050-3270-O4137    MyriadAlpha    node101-116    22.9    87.4 GB
…5050-3270-O4135    MyriadAlpha    node101-117    3    3.0 GB
…5050-3270-O4134    MyriadAlpha    node101-137    25.6    65.2 GB



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)