Posted to dev@reef.apache.org by Rogan Carr <ro...@gmail.com> on 2017/04/15 01:07:24 UTC

[REEF-1778] The IIMRUResultHandler Dispose() method isn't allowed to complete before the job finishes.

Hi All,

I have run into an interesting issue in the IMRU API where there seems to
be a race condition at the end of the job between the Dispose() method in
the IIMRUResultHandler (and therefore in the UpdateTaskHost) and whatever
signals that the job should end. I have created [REEF-1778] [1], "The
IIMRUResultHandler Dispose() method isn't allowed to complete before the
job finishes", to document this issue, including logs that clearly show
that Dispose() has not finished when the job ends.

My hunch is that the driver calls Dispose() on the UpdateTaskHost, then
tells YARN that it is finished, and YARN then kills the container before
Dispose() has had a chance to complete.
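
To make the suspected ordering concrete, here is a minimal sketch in
Python. It is not REEF code: ResultHandler, dispose, and run_job are
hypothetical names, and the slow cleanup is simulated with a sleep. It only
illustrates the general pattern of reporting completion without first
waiting for cleanup to finish.

```python
import threading
import time


class ResultHandler:
    """Hypothetical stand-in for a result handler with slow cleanup."""

    def __init__(self, cleanup_seconds):
        self.cleanup_seconds = cleanup_seconds
        self.disposed = False

    def dispose(self):
        # Simulate slow cleanup, e.g. flushing results to storage.
        time.sleep(self.cleanup_seconds)
        self.disposed = True


def run_job(wait_for_dispose):
    """Return whether dispose() had completed when the job was reported done."""
    handler = ResultHandler(cleanup_seconds=0.2)
    worker = threading.Thread(target=handler.dispose, daemon=True)
    worker.start()
    if wait_for_dispose:
        # Block until cleanup has finished before reporting completion.
        worker.join()
    # Here the "driver" reports completion. In the suspected scenario, the
    # resource manager would now be free to kill the container.
    return handler.disposed


if __name__ == "__main__":
    print("without waiting:", run_job(wait_for_dispose=False))
    print("with waiting:   ", run_job(wait_for_dispose=True))
```

If the hunch is right, the fix would be the equivalent of the join() call:
completion is only signalled once Dispose() has returned.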

Does anybody have any thoughts on what the issue might be?

Thanks for your help!

Best,
Rogan

[1] https://issues.apache.org/jira/browse/REEF-1778

Re: [REEF-1778] The IIMRUResultHandler Dispose() method isn't allowed to complete before the job finishes.

Posted by Markus Weimer <ma...@weimo.de>.
Thanks so much for reporting this in detail! To me, this looks like a bug
in REEF itself rather than in IMRU. I have posted some thoughts on the
JIRA about what the root cause is and how to address it.

Markus
