Posted to issues@mesos.apache.org by "Shuai Lin (JIRA)" <ji...@apache.org> on 2016/02/04 14:10:39 UTC

[jira] [Commented] (MESOS-4585) mesos-fetcher LIBPROCESS_PORT set to 5051 URI fetch failure

    [ https://issues.apache.org/jira/browse/MESOS-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15132275#comment-15132275 ] 

Shuai Lin commented on MESOS-4585:
----------------------------------

Confirmed that this bug was introduced in 0.27:

OS: Ubuntu 14.04 64-bit VM inside VirtualBox

* Install Mesos 0.25 and Marathon 0.8.2, and download Hadoop 2.6.3.
* Start ZooKeeper, the Mesos master and slave, and Marathon. Start the HDFS NameNode and DataNode, and copy one local file into HDFS for later use.
* Create a Marathon task with an {{hdfs://}} URI; the task launches OK.
* Upgrade Mesos to 0.26 and kill the task in the Marathon web UI; the task relaunches OK.
* Upgrade Mesos to 0.27 and again kill the task in the Marathon web UI; this time the relaunched task fails with the following stderr:

{code}
F0204 13:05:30.878703 30497 process.cpp:892] Failed to initialize: Failed to bind on 0.0.0.0:5051: Address already in use: Address already in use [98]
{code}
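The fatal line means that something inside the fetcher tried to bind 0.0.0.0:5051 while the agent already owned that port; errno 98 is {{EADDRINUSE}} on Linux. The Python sketch below is not Mesos code: it only reproduces the same errno with two plain sockets. It lets the OS pick a free port instead of 5051 so it cannot disturb a running agent, and the function name {{second_bind_errno}} is made up for illustration.

```python
import errno
import socket


def second_bind_errno(host: str = "0.0.0.0") -> int:
    """Return the errno raised when a second socket binds a port that a
    listening socket already owns (0 if the bind unexpectedly succeeds)."""
    # Stand-in for the agent, which already listens on the port. Port 0 lets
    # the OS choose a free port so the sketch never collides with a real
    # agent on 5051.
    holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    holder.bind((host, 0))
    holder.listen(1)
    port = holder.getsockname()[1]

    # Stand-in for the fetcher, which tries to bind the very same port.
    intruder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        intruder.bind((host, port))
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        intruder.close()
        holder.close()


if __name__ == "__main__":
    code = second_bind_errno()
    # On Linux this prints "98 EADDRINUSE", the errno seen in the log above.
    print(code, errno.errorcode.get(code, "no error"))
```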

> mesos-fetcher LIBPROCESS_PORT set to 5051 URI fetch failure
> -----------------------------------------------------------
>
>                 Key: MESOS-4585
>                 URL: https://issues.apache.org/jira/browse/MESOS-4585
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.27.0
>            Reporter: Drew Robb
>            Assignee: Shuai Lin
>            Priority: Critical
>              Labels: fetcher
>         Attachments: hdfs-stderr.log
>
>
> When starting a task with an {{s3a://}} URI, the fetcher aborts with a fatal error when it tries to bind to the slave's port 5051. The file itself appears to be downloaded successfully, but because the error is fatal the task still fails. If the URI is changed to {{http://}}, the problem does not occur. The apparent root cause is that the mesos-fetcher process has {{LIBPROCESS_PORT=5051}} in its environment, as I found by running {{cat "/proc/`pgrep mesos-fetcher`/environ"}}.
> stderr from a failing task:
> {quote}
> I0203 00:11:55.815500  4964 fetcher.cpp:424] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/ede0e5bc-d7ac-4b9a-8d35-b210fa785db0-S0","items":[{"action":"BYPASS_CACHE","uri":{"cache":false,"executable":false,"extract":true,"value":"s3a:\/\/strava.mesos\/foo"}}],"sandbox_directory":"\/mnt\/mesos\/slaves\/ede0e5bc-d7ac-4b9a-8d35-b210fa785db0-S0\/frameworks\/fe927665-1516-46cf-94dd-6d2ca84007f1-0000\/executors\/uris-test.bc047306-ca0a-11e5-b742-e2162bf6108e\/runs\/24ebd807-b065-4776-a0bf-84bda4a82f01"}
> I0203 00:11:55.816830  4964 fetcher.cpp:379] Fetching URI 's3a://strava.mesos/foo'
> I0203 00:11:55.816846  4964 fetcher.cpp:250] Fetching directly into the sandbox directory
> I0203 00:11:55.816864  4964 fetcher.cpp:187] Fetching URI 's3a://strava.mesos/foo'
> I0203 00:11:56.191640  4964 fetcher.cpp:109] Downloading resource with Hadoop client from 's3a://strava.mesos/foo' to '/mnt/mesos/slaves/ede0e5bc-d7ac-4b9a-8d35-b210fa785db0-S0/frameworks/fe927665-1516-46cf-94dd-6d2ca84007f1-0000/executors/uris-test.bc047306-ca0a-11e5-b742-e2162bf6108e/runs/24ebd807-b065-4776-a0bf-84bda4a82f01/foo'
> F0203 00:11:56.192503  4964 process.cpp:892] Failed to initialize: Failed to bind on 0.0.0.0:5051: Address already in use: Address already in use [98]
> *** Check failure stack trace: ***
>     @     0x7f229ce50e7d  google::LogMessage::Fail()
>     @     0x7f229ce52c10  google::LogMessage::SendToLog()
>     @     0x7f229ce50a42  google::LogMessage::Flush()
>     @     0x7f229ce50c89  google::LogMessage::~LogMessage()
>     @     0x7f229ce51c32  google::ErrnoLogMessage::~ErrnoLogMessage()
>     @     0x7f229cdf16b9  process::initialize()
>     @     0x7f229cdf2f36  process::ProcessBase::ProcessBase()
>     @     0x7f229ce22875  process::reap()
>     @     0x7f229ce2ced7  process::subprocess()
>     @     0x7f229c50ab7b  HDFS::copyToLocal()
>     @           0x40f03e  download()
>     @           0x40b69f  main
>     @     0x7f229adc8a40  (unknown)
>     @           0x40cf59  _start
> Aborted (core dumped)
> {quote}
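The environment check from the description can be repeated without eyeballing raw {{cat}} output, because the entries in {{/proc/<pid>/environ}} are NUL-separated {{KEY=VALUE}} strings. The helper below is a hypothetical sketch ({{parse_environ}} and {{read_environ}} are illustrative names): it is Linux-only and needs permission to read the target's {{/proc}} entry. That file holds the environment the process was started with, which is exactly what the fetcher inherited.

```python
import sys
from typing import Dict


def parse_environ(raw: bytes) -> Dict[str, str]:
    """Parse the contents of /proc/<pid>/environ into a dict.

    Entries are NUL-separated "KEY=VALUE" strings, which is why plain `cat`
    prints them run together on one line.
    """
    env: Dict[str, str] = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # the file normally ends with a trailing NUL
        # Split on the first "=" only; values may themselves contain "=".
        key, sep, value = entry.partition(b"=")
        if sep:
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def read_environ(pid: int) -> Dict[str, str]:
    """Read another process's initial environment (Linux only; requires
    permission to read /proc/<pid>/environ, e.g. same user or root)."""
    with open(f"/proc/{pid}/environ", "rb") as fh:
        return parse_environ(fh.read())


if __name__ == "__main__":
    # Usage: python3 environ_check.py "$(pgrep mesos-fetcher)"
    if len(sys.argv) == 2:
        env = read_environ(int(sys.argv[1]))
        print("LIBPROCESS_PORT =", env.get("LIBPROCESS_PORT", "<unset>"))
```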

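The stack trace also shows why the bind happens so late: the fetcher only touches libprocess when {{HDFS::copyToLocal()}} spawns the Hadoop client through {{process::subprocess()}}, whose call to {{process::reap()}} constructs a {{ProcessBase}} and thereby triggers the one-time {{process::initialize()}}. The Python model below mimics that lazy, once-only initialization under stated assumptions: {{initialize}}, {{spawn_hadoop_client}} and {{reset}} are hypothetical names, not Mesos APIs, and the fallback to port 0 ("any free port") when {{LIBPROCESS_PORT}} is unset is an assumption about libprocess's default.

```python
import errno
import os
import socket
from typing import Mapping, Optional

# Hypothetical model of a runtime that, like libprocess, initializes itself
# once, on first use, binding the port named by LIBPROCESS_PORT (assumed to
# default to 0, i.e. "any free port", when the variable is unset).
_server: Optional[socket.socket] = None


def initialize(env: Mapping[str, str] = os.environ) -> socket.socket:
    global _server
    if _server is None:  # only the first call binds; later calls reuse it
        port = int(env.get("LIBPROCESS_PORT", "0"))
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind(("0.0.0.0", port))
        except OSError:
            sock.close()
            raise  # the real fetcher aborts at this point instead
        sock.listen(16)
        _server = sock
    return _server


def spawn_hadoop_client(env: Mapping[str, str] = os.environ) -> int:
    """Hypothetical stand-in for the copyToLocal -> subprocess -> reap path:
    its first use triggers initialize(); returns the port that was bound."""
    return initialize(env).getsockname()[1]


def reset() -> None:
    """Tear the model down so it can be initialized again."""
    global _server
    if _server is not None:
        _server.close()
        _server = None


if __name__ == "__main__":
    # Stand-in for the agent, which already owns the port.
    agent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    agent.bind(("0.0.0.0", 0))
    agent.listen(1)
    taken = str(agent.getsockname()[1])
    try:
        spawn_hadoop_client({"LIBPROCESS_PORT": taken})
    except OSError as exc:
        print("inherited agent port ->", errno.errorcode.get(exc.errno, exc.errno))
    # Without the inherited variable the model falls back to a free port.
    print("variable unset -> bound port", spawn_hadoop_client({}))
    reset()
    agent.close()
```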


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)