Posted to issues@mesos.apache.org by "James Peach (JIRA)" <ji...@apache.org> on 2018/08/31 16:47:00 UTC

[jira] [Comment Edited] (MESOS-9172) Fetcher deadlock with duplicated URIs.

    [ https://issues.apache.org/jira/browse/MESOS-9172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598998#comment-16598998 ] 

James Peach edited comment on MESOS-9172 at 8/31/18 4:46 PM:
-------------------------------------------------------------

| [r/68587|https://reviews.apache.org/r/68587] | Fixed fetcher deadlock with duplicate URIs. |
| [r/68586|https://reviews.apache.org/r/68586] | Add the output file to the hash on CommandInfo::URI. |


was (Author: jamespeach):
| [r/68587|https://reviews.apache.org/*r/68587] | Fixed fetcher deadlock with duplicate URIs. |
| [r/68586|https://reviews.apache.org/*r/68586] | Add the output file to the hash on CommandInfo::URI. |

> Fetcher deadlock with duplicated URIs.
> --------------------------------------
>
>                 Key: MESOS-9172
>                 URL: https://issues.apache.org/jira/browse/MESOS-9172
>             Project: Mesos
>          Issue Type: Bug
>          Components: fetcher
>            Reporter: James Peach
>            Assignee: James Peach
>            Priority: Major
>
> If the fetcher cache is empty and you launch a task that contains duplicate URIs, the fetcher deadlocks waiting for the futures in {{FetcherProcess::_fetch}}.
> What happens is that when the fetcher is setting up the initial batch of cache lookup futures in {{FetcherProcess::fetch}}, the duplicate URIs cause cache hits on the placeholder cache entries. This code assumes that there is already an operation in flight that will populate the cache entry. However, the cache is currently empty; the placeholder entry was created by the duplicate in the task's URIs.
> When we await the futures in {{FetcherProcess::_fetch}}, we end up waiting for the future that signals when the cache entry becomes populated, but that won't ever happen because we need to make progress on the current fetching batch in order to populate the cache entry. At this point we are live-locked.
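
The failure mode described above can be modelled in a few lines. This is only a rough sketch, not the Mesos fetcher code: {{stuckLookups}} and {{dedupe}} are hypothetical helpers, URIs are reduced to plain strings, and the cache is reduced to a set of placeholder keys. It counts the lookups that would end up waiting on a placeholder created earlier in the same batch, and shows that removing duplicates before the lookup leaves nothing to wait on. A real cache key would presumably cover more than the URI string (r/68586 adds the output file to the hash on CommandInfo::URI).

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model, not the Mesos implementation.
// Returns how many lookups in one batch would wait on a placeholder
// entry that only this same batch could ever populate.
std::size_t stuckLookups(const std::vector<std::string>& uris)
{
  std::set<std::string> cache;  // Starts empty; holds placeholder keys.
  std::size_t stuck = 0;

  for (const std::string& uri : uris) {
    // insert().second is false when the key is already present, which
    // models a "cache hit" on a placeholder created by an earlier
    // duplicate in this batch.
    if (!cache.insert(uri).second) {
      ++stuck;
    }
  }

  return stuck;
}

// Assumed mitigation for illustration: drop duplicates, keeping the
// first occurrence of each URI, before doing any cache lookups.
std::vector<std::string> dedupe(const std::vector<std::string>& uris)
{
  std::set<std::string> seen;
  std::vector<std::string> unique;

  for (const std::string& uri : uris) {
    if (seen.insert(uri).second) {
      unique.push_back(uri);
    }
  }

  return unique;
}
```

In this model any non-zero result from {{stuckLookups}} corresponds to a future that is awaited before the batch is allowed to download anything, so it can never complete.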



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)