Posted to user@mesos.apache.org by "Briant, James" <Ja...@thermofisher.com> on 2016/05/10 16:54:10 UTC

Enable s3a for fetcher

I want to use s3a: urls in fetcher. I’m using dcos 1.7 which has hadoop 2.5 on its agents. This version has the necessary hadoop-aws and aws-sdk:

hadoop--afadb46fe64d0ee7ce23dbe769e44bfb0767a8b9]$ ls usr/share/hadoop/tools/lib/ | grep aws
aws-java-sdk-1.7.4.jar
hadoop-aws-2.5.0-cdh5.3.3.jar

What config/scripts do I need to hack to get these guys on the classpath so that "hadoop fs -copyToLocal” works?

Thanks,
Jamie 
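
(For reference, a minimal sketch of one way this can be wired up, assuming the jars live under the Hadoop tools directory shown above; HADOOP_HOME, the paths, and the bucket name are placeholders, not a verified DC/OS layout.)

# Sketch: expose the AWS support jars to the hadoop CLI, then fetch over s3a.
export HADOOP_HOME=/opt/hadoop                      # hypothetical install location on the agent
export HADOOP_CLASSPATH="$HADOOP_HOME/share/hadoop/tools/lib/*:$HADOOP_CLASSPATH"

# With the jars on the classpath, the usual copy command should work:
"$HADOOP_HOME/bin/hadoop" fs -copyToLocal s3a://my-bucket/my-artifact.tar.gz /tmp/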

Re: Enable s3a for fetcher

Posted by Ken Sipe <ke...@gmail.com>.
to Joseph’s point… hdfs and s3 challenges are dcos issues, not a mesos issue. We do however need Mesos to support custom protocols for the fetcher. At our current pace of releases, that sounds not too far away.

ken
> On May 10, 2016, at 2:20 PM, Joseph Wu <jo...@mesosphere.io> wrote:
> 
> Mesos does not explicitly support HDFS and S3.  Rather, Mesos will assume you have a hadoop binary and use it (blindly) for certain types of URIs.  If the hadoop binary is not present, the mesos-fetcher will fail to fetch your HDFS or S3 URIs.
> 
> Mesos does not ship/package hadoop, so these URIs are not expected to work out of the box (for plain Mesos distributions).  In all cases, the operator must preconfigure hadoop on each node (similar to how Docker in Mesos works).
> 
> Here's the epic tracking the modularization of the mesos-fetcher (I estimate it'll be done by 0.30):
> https://issues.apache.org/jira/browse/MESOS-3918
> 
> ^ Once done, it should be easier to plug in more fetchers, such as one for your use-case.
> 
> On Tue, May 10, 2016 at 11:21 AM, Briant, James <James.Briant@thermofisher.com> wrote:
> I’m happy to have default IAM role on the box that can read-only fetch from my s3 bucket. s3a gets the credentials from AWS instance metadata. It works.
> 
> If hadoop is gone, does that mean that hdfs: URIs don’t work either?
> 
> Are you saying dcos and mesos are diverging? Mesos explicitly supports hdfs and s3.
> 
> In the absence of S3, how do you propose I make large binaries available to my cluster, and only to my cluster, on AWS?
> 
> Jamie
> 
> From: Cody Maloney <cody@mesosphere.io>
> Reply-To: "user@mesos.apache.org" <user@mesos.apache.org>
> Date: Tuesday, May 10, 2016 at 10:58 AM
> To: "user@mesos.apache.org" <user@mesos.apache.org>
> Subject: Re: Enable s3a for fetcher
> 
> The s3 fetcher stuff inside of DC/OS is not supported. The `hadoop` binary has been entirely removed from DC/OS 1.8 already. There have been various proposals to make it so the mesos fetcher is much more pluggable / extensible (https://issues.apache.org/jira/browse/MESOS-2731 for instance).
> 
> Generally speaking people want a lot of different sorts of fetching, and there are all sorts of questions of how to properly get auth to the various chunks (if you're using s3a:// presumably you need to get credentials there somehow. Otherwise you could just use http://). Need to design / build that into Mesos and DC/OS to be able to use this stuff.
> 
> Cody
> 
> On Tue, May 10, 2016 at 9:55 AM Briant, James <James.Briant@thermofisher.com> wrote:
> I want to use s3a: urls in fetcher. I’m using dcos 1.7 which has hadoop 2.5 on its agents. This version has the necessary hadoop-aws and aws-sdk:
> 
> hadoop--afadb46fe64d0ee7ce23dbe769e44bfb0767a8b9]$ ls usr/share/hadoop/tools/lib/ | grep aws
> aws-java-sdk-1.7.4.jar
> hadoop-aws-2.5.0-cdh5.3.3.jar
> 
> What config/scripts do I need to hack to get these guys on the classpath so that "hadoop fs -copyToLocal” works?
> 
> Thanks,
> Jamie
> 


RE: Enable s3a for fetcher

Posted by Aaron Carey <ac...@ilm.com>.
We'd be very excited to see a pluggable mesos fetcher!


--

Aaron Carey
Production Engineer - Cloud Pipeline
Industrial Light & Magic
London
020 3751 9150

________________________________
From: Ken Sipe [kensipe@gmail.com]
Sent: 11 May 2016 08:40
To: user@mesos.apache.org
Subject: Re: Enable s3a for fetcher

Jamie,

I’m in Europe this week… so the timing of my responses is out of sync / delayed. There are 2 issues to work with here. The first is having a pluggable mesos fetcher… sounds like that is scheduled for 0.30. The other is what is available on dcos. Could you move that discussion to that mailing list? I will definitely work with you on getting this resolved.

ken
On May 10, 2016, at 3:45 PM, Briant, James <Ja...@thermofisher.com> wrote:

Ok. Thanks Joseph. I will figure out how to get a more recent hadoop onto my dcos agents then.

Jamie

From: Joseph Wu <jo...@mesosphere.io>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Tuesday, May 10, 2016 at 1:40 PM
To: user <us...@mesos.apache.org>
Subject: Re: Enable s3a for fetcher

I can't speak to what DCOS does or will do (you can ask on the associated mailing list: users@dcos.io).

We will be maintaining existing functionality for the fetcher, which means supporting the schemes:
* file
* http, https, ftp, ftps
* hdfs, hftp, s3, s3n  <--  These rely on hadoop.

And we will retain the --hadoop_home agent flag, which you can use to specify the hadoop binary.

Other schemes might work right now, if you hack around with your node setup.  But there's no guarantee that your hack will work between Mesos versions.  In future, we will associate a fetcher plugin for each scheme.  And you will be able to load custom fetcher plugins for additional schemes.
TLDR: no "nerfing" and less hackiness :)

On Tue, May 10, 2016 at 12:58 PM, Briant, James <Ja...@thermofisher.com> wrote:
This is the mesos latest documentation:

If the requested URI is based on some other protocol, then the fetcher tries to utilise a local Hadoop client and hence supports any protocol supported by the Hadoop client, e.g., HDFS, S3. See the slave configuration documentation <http://mesos.apache.org/documentation/latest/configuration/> for how to configure the slave with a path to the Hadoop client. [emphasis added]

What you are saying is that dcos simply won’t install hadoop on agents?

Next question then: will you be nerfing fetcher.cpp, or will I be able to install hadoop on the agents myself, such that mesos will recognize s3a?


From: Joseph Wu <jo...@mesosphere.io>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Tuesday, May 10, 2016 at 12:20 PM
To: user <us...@mesos.apache.org>

Subject: Re: Enable s3a for fetcher

Mesos does not explicitly support HDFS and S3.  Rather, Mesos will assume you have a hadoop binary and use it (blindly) for certain types of URIs.  If the hadoop binary is not present, the mesos-fetcher will fail to fetch your HDFS or S3 URIs.

Mesos does not ship/package hadoop, so these URIs are not expected to work out of the box (for plain Mesos distributions).  In all cases, the operator must preconfigure hadoop on each node (similar to how Docker in Mesos works).

Here's the epic tracking the modularization of the mesos-fetcher (I estimate it'll be done by 0.30):
https://issues.apache.org/jira/browse/MESOS-3918

^ Once done, it should be easier to plug in more fetchers, such as one for your use-case.

On Tue, May 10, 2016 at 11:21 AM, Briant, James <Ja...@thermofisher.com> wrote:
I’m happy to have default IAM role on the box that can read-only fetch from my s3 bucket. s3a gets the credentials from AWS instance metadata. It works.

If hadoop is gone, does that mean that hdfs: URIs don’t work either?

Are you saying dcos and mesos are diverging? Mesos explicitly supports hdfs and s3.

In the absence of S3, how do you propose I make large binaries available to my cluster, and only to my cluster, on AWS?

Jamie

From: Cody Maloney <co...@mesosphere.io>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Tuesday, May 10, 2016 at 10:58 AM
To: "user@mesos.apache.org" <us...@mesos.apache.org>
Subject: Re: Enable s3a for fetcher

The s3 fetcher stuff inside of DC/OS is not supported. The `hadoop` binary has been entirely removed from DC/OS 1.8 already. There have been various proposals to make it so the mesos fetcher is much more pluggable / extensible (https://issues.apache.org/jira/browse/MESOS-2731 for instance).

Generally speaking people want a lot of different sorts of fetching, and there are all sorts of questions of how to properly get auth to the various chunks (if you're using s3a:// presumably you need to get credentials there somehow. Otherwise you could just use http://). Need to design / build that into Mesos and DC/OS to be able to use this stuff.

Cody

On Tue, May 10, 2016 at 9:55 AM Briant, James <Ja...@thermofisher.com> wrote:
I want to use s3a: urls in fetcher. I’m using dcos 1.7 which has hadoop 2.5 on its agents. This version has the necessary hadoop-aws and aws-sdk:

hadoop--afadb46fe64d0ee7ce23dbe769e44bfb0767a8b9]$ ls usr/share/hadoop/tools/lib/ | grep aws
aws-java-sdk-1.7.4.jar
hadoop-aws-2.5.0-cdh5.3.3.jar

What config/scripts do I need to hack to get these guys on the classpath so that "hadoop fs -copyToLocal” works?

Thanks,
Jamie




Re: Enable s3a for fetcher

Posted by Ken Sipe <ke...@gmail.com>.
Jamie,

I’m in Europe this week… so the timing of my responses is out of sync / delayed. There are 2 issues to work with here. The first is having a pluggable mesos fetcher… sounds like that is scheduled for 0.30. The other is what is available on dcos. Could you move that discussion to that mailing list? I will definitely work with you on getting this resolved.

ken
> On May 10, 2016, at 3:45 PM, Briant, James <Ja...@thermofisher.com> wrote:
> 
> Ok. Thanks Joseph. I will figure out how to get a more recent hadoop onto my dcos agents then.
> 
> Jamie
> 
> From: Joseph Wu <joseph@mesosphere.io>
> Reply-To: "user@mesos.apache.org" <user@mesos.apache.org>
> Date: Tuesday, May 10, 2016 at 1:40 PM
> To: user <user@mesos.apache.org>
> Subject: Re: Enable s3a for fetcher
> 
> I can't speak to what DCOS does or will do (you can ask on the associated mailing list: users@dcos.io).
> 
> We will be maintaining existing functionality for the fetcher, which means supporting the schemes:
> * file
> * http, https, ftp, ftps
> * hdfs, hftp, s3, s3n  <--  These rely on hadoop.
> 
> And we will retain the --hadoop_home agent flag, which you can use to specify the hadoop binary.
> 
> Other schemes might work right now, if you hack around with your node setup.  But there's no guarantee that your hack will work between Mesos versions.  In future, we will associate a fetcher plugin for each scheme.  And you will be able to load custom fetcher plugins for additional schemes.
> TLDR: no "nerfing" and less hackiness :)
> 
> On Tue, May 10, 2016 at 12:58 PM, Briant, James <James.Briant@thermofisher.com> wrote:
>> This is the mesos latest documentation:
>> 
>> If the requested URI is based on some other protocol, then the fetcher tries to utilise a local Hadoop client and hence supports any protocol supported by the Hadoop client, e.g., HDFS, S3. See the slave configuration documentation <http://mesos.apache.org/documentation/latest/configuration/> for how to configure the slave with a path to the Hadoop client. [emphasis added]
>> 
>> What you are saying is that dcos simply won’t install hadoop on agents?
>> 
>> Next question then: will you be nerfing fetcher.cpp, or will I be able to install hadoop on the agents myself, such that mesos will recognize s3a?
>> 
>> 
>> From: Joseph Wu <joseph@mesosphere.io>
>> Reply-To: "user@mesos.apache.org" <user@mesos.apache.org>
>> Date: Tuesday, May 10, 2016 at 12:20 PM
>> To: user <user@mesos.apache.org>
>> 
>> Subject: Re: Enable s3a for fetcher
>> 
>> Mesos does not explicitly support HDFS and S3.  Rather, Mesos will assume you have a hadoop binary and use it (blindly) for certain types of URIs.  If the hadoop binary is not present, the mesos-fetcher will fail to fetch your HDFS or S3 URIs.
>> 
>> Mesos does not ship/package hadoop, so these URIs are not expected to work out of the box (for plain Mesos distributions).  In all cases, the operator must preconfigure hadoop on each node (similar to how Docker in Mesos works).
>> 
>> Here's the epic tracking the modularization of the mesos-fetcher (I estimate it'll be done by 0.30):
>> https://issues.apache.org/jira/browse/MESOS-3918
>> 
>> ^ Once done, it should be easier to plug in more fetchers, such as one for your use-case.
>> 
>> On Tue, May 10, 2016 at 11:21 AM, Briant, James <James.Briant@thermofisher.com> wrote:
>>> I’m happy to have default IAM role on the box that can read-only fetch from my s3 bucket. s3a gets the credentials from AWS instance metadata. It works.
>>> 
>>> If hadoop is gone, does that mean that hfds: URIs don’t work either?
>>> 
>>> Are you saying dcos and mesos are diverging? Mesos explicitly supports hdfs and s3.
>>> 
>>> In the absence of S3, how do you propose I make large binaries available to my cluster, and only to my cluster, on AWS?
>>> 
>>> Jamie
>>> 
>>> From: Cody Maloney <cody@mesosphere.io>
>>> Reply-To: "user@mesos.apache.org" <user@mesos.apache.org>
>>> Date: Tuesday, May 10, 2016 at 10:58 AM
>>> To: "user@mesos.apache.org" <user@mesos.apache.org>
>>> Subject: Re: Enable s3a for fetcher
>>> 
>>> The s3 fetcher stuff inside of DC/OS is not supported. The `hadoop` binary has been entirely removed from DC/OS 1.8 already. There have been various proposals to make it so the mesos fetcher is much more pluggable / extensible (https://issues.apache.org/jira/browse/MESOS-2731 for instance).
>>> 
>>> Generally speaking people want a lot of different sorts of fetching, and there are all sorts of questions of how to properly get auth to the various chunks (if you're using s3a:// presumably you need to get credentials there somehow. Otherwise you could just use http://). Need to design / build that into Mesos and DC/OS to be able to use this stuff.
>>> 
>>> Cody
>>> 
>>> On Tue, May 10, 2016 at 9:55 AM Briant, James <James.Briant@thermofisher.com> wrote:
>>>> I want to use s3a: urls in fetcher. I’m using dcos 1.7 which has hadoop 2.5 on its agents. This version has the necessary hadoop-aws and aws-sdk:
>>>> 
>>>> hadoop--afadb46fe64d0ee7ce23dbe769e44bfb0767a8b9]$ ls usr/share/hadoop/tools/lib/ | grep aws
>>>> aws-java-sdk-1.7.4.jar
>>>> hadoop-aws-2.5.0-cdh5.3.3.jar
>>>> 
>>>> What config/scripts do I need to hack to get these guys on the classpath so that "hadoop fs -copyToLocal” works?
>>>> 
>>>> Thanks,
>>>> Jamie
>> 
> 


Re: Enable s3a for fetcher

Posted by "Briant, James" <Ja...@thermofisher.com>.
Ok. Thanks Joseph. I will figure out how to get a more recent hadoop onto my dcos agents then.

Jamie

From: Joseph Wu <jo...@mesosphere.io>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Tuesday, May 10, 2016 at 1:40 PM
To: user <us...@mesos.apache.org>
Subject: Re: Enable s3a for fetcher

I can't speak to what DCOS does or will do (you can ask on the associated mailing list: users@dcos.io).

We will be maintaining existing functionality for the fetcher, which means supporting the schemes:
* file
* http, https, ftp, ftps
* hdfs, hftp, s3, s3n  <--  These rely on hadoop.

And we will retain the --hadoop_home agent flag, which you can use to specify the hadoop binary.

Other schemes might work right now, if you hack around with your node setup.  But there's no guarantee that your hack will work between Mesos versions.  In future, we will associate a fetcher plugin for each scheme.  And you will be able to load custom fetcher plugins for additional schemes.
TLDR: no "nerfing" and less hackiness :)

On Tue, May 10, 2016 at 12:58 PM, Briant, James <Ja...@thermofisher.com> wrote:
This is the mesos latest documentation:

If the requested URI is based on some other protocol, then the fetcher tries to utilise a local Hadoop client and hence supports any protocol supported by the Hadoop client, e.g., HDFS, S3. See the slave configuration documentation <http://mesos.apache.org/documentation/latest/configuration/> for how to configure the slave with a path to the Hadoop client. [emphasis added]

What you are saying is that dcos simply won’t install hadoop on agents?

Next question then: will you be nerfing fetcher.cpp, or will I be able to install hadoop on the agents myself, such that mesos will recognize s3a?


From: Joseph Wu <jo...@mesosphere.io>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Tuesday, May 10, 2016 at 12:20 PM
To: user <us...@mesos.apache.org>

Subject: Re: Enable s3a for fetcher

Mesos does not explicitly support HDFS and S3.  Rather, Mesos will assume you have a hadoop binary and use it (blindly) for certain types of URIs.  If the hadoop binary is not present, the mesos-fetcher will fail to fetch your HDFS or S3 URIs.

Mesos does not ship/package hadoop, so these URIs are not expected to work out of the box (for plain Mesos distributions).  In all cases, the operator must preconfigure hadoop on each node (similar to how Docker in Mesos works).

Here's the epic tracking the modularization of the mesos-fetcher (I estimate it'll be done by 0.30):
https://issues.apache.org/jira/browse/MESOS-3918

^ Once done, it should be easier to plug in more fetchers, such as one for your use-case.

On Tue, May 10, 2016 at 11:21 AM, Briant, James <Ja...@thermofisher.com> wrote:
I’m happy to have default IAM role on the box that can read-only fetch from my s3 bucket. s3a gets the credentials from AWS instance metadata. It works.

If hadoop is gone, does that mean that hdfs: URIs don’t work either?

Are you saying dcos and mesos are diverging? Mesos explicitly supports hdfs and s3.

In the absence of S3, how do you propose I make large binaries available to my cluster, and only to my cluster, on AWS?

Jamie

From: Cody Maloney <co...@mesosphere.io>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Tuesday, May 10, 2016 at 10:58 AM
To: "user@mesos.apache.org" <us...@mesos.apache.org>
Subject: Re: Enable s3a for fetcher

The s3 fetcher stuff inside of DC/OS is not supported. The `hadoop` binary has been entirely removed from DC/OS 1.8 already. There have been various proposals to make it so the mesos fetcher is much more pluggable / extensible (https://issues.apache.org/jira/browse/MESOS-2731 for instance).

Generally speaking people want a lot of different sorts of fetching, and there are all sorts of questions of how to properly get auth to the various chunks (if you're using s3a:// presumably you need to get credentials there somehow. Otherwise you could just use http://). Need to design / build that into Mesos and DC/OS to be able to use this stuff.

Cody

On Tue, May 10, 2016 at 9:55 AM Briant, James <Ja...@thermofisher.com> wrote:
I want to use s3a: urls in fetcher. I’m using dcos 1.7 which has hadoop 2.5 on its agents. This version has the necessary hadoop-aws and aws-sdk:

hadoop--afadb46fe64d0ee7ce23dbe769e44bfb0767a8b9]$ ls usr/share/hadoop/tools/lib/ | grep aws
aws-java-sdk-1.7.4.jar
hadoop-aws-2.5.0-cdh5.3.3.jar

What config/scripts do I need to hack to get these guys on the classpath so that "hadoop fs -copyToLocal” works?

Thanks,
Jamie



Re: Enable s3a for fetcher

Posted by Joseph Wu <jo...@mesosphere.io>.
I can't speak to what DCOS does or will do (you can ask on the associated
mailing list: users@dcos.io).

We will be maintaining existing functionality for the fetcher, which means
supporting the schemes:
* file
* http, https, ftp, ftps
* hdfs, hftp, s3, s3n  <--  These rely on hadoop.

And we will retain the --hadoop_home agent flag, which you can use to
specify the hadoop binary.

Other schemes might work right now, if you hack around with your node
setup.  But there's no guarantee that your hack will work between Mesos
versions.  In future, we will associate a fetcher plugin for each scheme.
And you will be able to load custom fetcher plugins for additional schemes.
TLDR: no "nerfing" and less hackiness :)
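
(A rough sketch of how the --hadoop_home setup looks on an agent today; the flag is the one mentioned above, while the master address and paths are only examples.)

# Sketch: point the agent at a local Hadoop client install so the fetcher
# can shell out to it for hdfs/s3/s3n (and potentially s3a) URIs.
mesos-slave --master=zk://master.example.com:2181/mesos \
            --work_dir=/var/lib/mesos \
            --hadoop_home=/opt/hadoop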

On Tue, May 10, 2016 at 12:58 PM, Briant, James <
James.Briant@thermofisher.com> wrote:

> This is the mesos latest documentation:
>
> If the requested URI is based on some other protocol, then *the fetcher
> tries to utilise a local Hadoop client* and *hence supports any protocol
> supported by the Hadoop client, e.g., HDFS, S3.* See the slave configuration
> documentation
> <http://mesos.apache.org/documentation/latest/configuration/> for how to
> configure the slave with a path to the Hadoop client. [emphasis added]
>
> What you are saying is that dcos simply won’t install hadoop on agents?
>
> Next question then: will you be nerfing fetcher.cpp, or will I be able to
> install hadoop on the agents myself, such that mesos will recognize s3a?
>
>
> From: Joseph Wu <jo...@mesosphere.io>
> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
> Date: Tuesday, May 10, 2016 at 12:20 PM
> To: user <us...@mesos.apache.org>
>
> Subject: Re: Enable s3a for fetcher
>
> Mesos does not explicitly support HDFS and S3.  Rather, Mesos will assume
> you have a hadoop binary and use it (blindly) for certain types of URIs.
> If the hadoop binary is not present, the mesos-fetcher will fail to fetch
> your HDFS or S3 URIs.
>
> Mesos does not ship/package hadoop, so these URIs are not expected to work
> out of the box (for plain Mesos distributions).  In all cases, the operator
> must preconfigure hadoop on each node (similar to how Docker in Mesos
> works).
>
> Here's the epic tracking the modularization of the mesos-fetcher (I
> estimate it'll be done by 0.30):
> https://issues.apache.org/jira/browse/MESOS-3918
>
> ^ Once done, it should be easier to plug in more fetchers, such as one for
> your use-case.
>
> On Tue, May 10, 2016 at 11:21 AM, Briant, James <
> James.Briant@thermofisher.com> wrote:
>
>> I’m happy to have default IAM role on the box that can read-only fetch
>> from my s3 bucket. s3a gets the credentials from AWS instance metadata. It
>> works.
>>
>> If hadoop is gone, does that mean that hdfs: URIs don’t work either?
>>
>> Are you saying dcos and mesos are diverging? Mesos explicitly supports
>> hdfs and s3.
>>
>> In the absence of S3, how do you propose I make large binaries available
>> to my cluster, and only to my cluster, on AWS?
>>
>> Jamie
>>
>> From: Cody Maloney <co...@mesosphere.io>
>> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
>> Date: Tuesday, May 10, 2016 at 10:58 AM
>> To: "user@mesos.apache.org" <us...@mesos.apache.org>
>> Subject: Re: Enable s3a for fetcher
>>
>> The s3 fetcher stuff inside of DC/OS is not supported. The `hadoop`
>> binary has been entirely removed from DC/OS 1.8 already. There have been
>> various proposals to make it so the mesos fetcher is much more pluggable /
>> extensible (https://issues.apache.org/jira/browse/MESOS-2731 for
>> instance).
>>
>> Generally speaking people want a lot of different sorts of fetching, and
>> there are all sorts of questions of how to properly get auth to the various
>> chunks (if you're using s3a:// presumably you need to get credentials there
>> somehow. Otherwise you could just use http://). Need to design / build
>> that into Mesos and DC/OS to be able to use this stuff.
>>
>> Cody
>>
>> On Tue, May 10, 2016 at 9:55 AM Briant, James <
>> James.Briant@thermofisher.com> wrote:
>>
>>> I want to use s3a: urls in fetcher. I’m using dcos 1.7 which has hadoop
>>> 2.5 on its agents. This version has the necessary hadoop-aws and aws-sdk:
>>>
>>> hadoop--afadb46fe64d0ee7ce23dbe769e44bfb0767a8b9]$ ls
>>> usr/share/hadoop/tools/lib/ | grep aws
>>> aws-java-sdk-1.7.4.jar
>>> hadoop-aws-2.5.0-cdh5.3.3.jar
>>>
>>> What config/scripts do I need to hack to get these guys on the classpath
>>> so that "hadoop fs -copyToLocal” works?
>>>
>>> Thanks,
>>> Jamie
>>
>>
>

Re: Enable s3a for fetcher

Posted by "Briant, James" <Ja...@thermofisher.com>.
This is the mesos latest documentation:

If the requested URI is based on some other protocol, then the fetcher tries to utilise a local Hadoop client and hence supports any protocol supported by the Hadoop client, e.g., HDFS, S3. See the slave configuration documentation <http://mesos.apache.org/documentation/latest/configuration/> for how to configure the slave with a path to the Hadoop client. [emphasis added]

What you are saying is that dcos simply won’t install hadoop on agents?

Next question then: will you be nerfing fetcher.cpp, or will I be able to install hadoop on the agents myself, such that mesos will recognize s3a?


From: Joseph Wu <jo...@mesosphere.io>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Tuesday, May 10, 2016 at 12:20 PM
To: user <us...@mesos.apache.org>
Subject: Re: Enable s3a for fetcher

Mesos does not explicitly support HDFS and S3.  Rather, Mesos will assume you have a hadoop binary and use it (blindly) for certain types of URIs.  If the hadoop binary is not present, the mesos-fetcher will fail to fetch your HDFS or S3 URIs.

Mesos does not ship/package hadoop, so these URIs are not expected to work out of the box (for plain Mesos distributions).  In all cases, the operator must preconfigure hadoop on each node (similar to how Docker in Mesos works).

Here's the epic tracking the modularization of the mesos-fetcher (I estimate it'll be done by 0.30):
https://issues.apache.org/jira/browse/MESOS-3918

^ Once done, it should be easier to plug in more fetchers, such as one for your use-case.

On Tue, May 10, 2016 at 11:21 AM, Briant, James <Ja...@thermofisher.com> wrote:
I’m happy to have default IAM role on the box that can read-only fetch from my s3 bucket. s3a gets the credentials from AWS instance metadata. It works.

If hadoop is gone, does that mean that hdfs: URIs don’t work either?

Are you saying dcos and mesos are diverging? Mesos explicitly supports hdfs and s3.

In the absence of S3, how do you propose I make large binaries available to my cluster, and only to my cluster, on AWS?

Jamie

From: Cody Maloney <co...@mesosphere.io>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Tuesday, May 10, 2016 at 10:58 AM
To: "user@mesos.apache.org" <us...@mesos.apache.org>
Subject: Re: Enable s3a for fetcher

The s3 fetcher stuff inside of DC/OS is not supported. The `hadoop` binary has been entirely removed from DC/OS 1.8 already. There have been various proposals to make it so the mesos fetcher is much more pluggable / extensible (https://issues.apache.org/jira/browse/MESOS-2731 for instance).

Generally speaking people want a lot of different sorts of fetching, and there are all sorts of questions of how to properly get auth to the various chunks (if you're using s3a:// presumably you need to get credentials there somehow. Otherwise you could just use http://). Need to design / build that into Mesos and DC/OS to be able to use this stuff.

Cody

On Tue, May 10, 2016 at 9:55 AM Briant, James <Ja...@thermofisher.com> wrote:
I want to use s3a: urls in fetcher. I’m using dcos 1.7 which has hadoop 2.5 on its agents. This version has the necessary hadoop-aws and aws-sdk:

hadoop--afadb46fe64d0ee7ce23dbe769e44bfb0767a8b9]$ ls usr/share/hadoop/tools/lib/ | grep aws
aws-java-sdk-1.7.4.jar
hadoop-aws-2.5.0-cdh5.3.3.jar

What config/scripts do I need to hack to get these guys on the classpath so that "hadoop fs -copyToLocal” works?

Thanks,
Jamie


Re: Enable s3a for fetcher

Posted by Joseph Wu <jo...@mesosphere.io>.
Mesos does not explicitly support HDFS and S3.  Rather, Mesos will assume
you have a hadoop binary and use it (blindly) for certain types of URIs.
If the hadoop binary is not present, the mesos-fetcher will fail to fetch
your HDFS or S3 URIs.

Mesos does not ship/package hadoop, so these URIs are not expected to work
out of the box (for plain Mesos distributions).  In all cases, the operator
must preconfigure hadoop on each node (similar to how Docker in Mesos
works).

Here's the epic tracking the modularization of the mesos-fetcher (I
estimate it'll be done by 0.30):
https://issues.apache.org/jira/browse/MESOS-3918

^ Once done, it should be easier to plug in more fetchers, such as one for
your use-case.
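
(Concretely, for a hadoop-backed URI the fetcher ends up running something like the following on the agent; this is a sketch of the behaviour, not the literal fetcher.cpp code, and the URI and destination are examples.)

# Roughly what the mesos-fetcher does for hdfs/s3/s3n URIs (sketch only):
hadoop fs -copyToLocal hdfs://namenode:8020/apps/my-app.tar.gz /path/to/task/sandbox/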

On Tue, May 10, 2016 at 11:21 AM, Briant, James <
James.Briant@thermofisher.com> wrote:

> I’m happy to have default IAM role on the box that can read-only fetch
> from my s3 bucket. s3a gets the credentials from AWS instance metadata. It
> works.
>
> If hadoop is gone, does that mean that hdfs: URIs don’t work either?
>
> Are you saying dcos and mesos are diverging? Mesos explicitly supports
> hdfs and s3.
>
> In the absence of S3, how do you propose I make large binaries available
> to my cluster, and only to my cluster, on AWS?
>
> Jamie
>
> From: Cody Maloney <co...@mesosphere.io>
> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
> Date: Tuesday, May 10, 2016 at 10:58 AM
> To: "user@mesos.apache.org" <us...@mesos.apache.org>
> Subject: Re: Enable s3a for fetcher
>
> The s3 fetcher stuff inside of DC/OS is not supported. The `hadoop` binary
> has been entirely removed from DC/OS 1.8 already. There have been various
> proposals to make it so the mesos fetcher is much more pluggable /
> extensible (https://issues.apache.org/jira/browse/MESOS-2731 for
> instance).
>
> Generally speaking people want a lot of different sorts of fetching, and
> there are all sorts of questions of how to properly get auth to the various
> chunks (if you're using s3a:// presumably you need to get credentials there
> somehow. Otherwise you could just use http://). Need to design / build
> that into Mesos and DC/OS to be able to use this stuff.
>
> Cody
>
> On Tue, May 10, 2016 at 9:55 AM Briant, James <
> James.Briant@thermofisher.com> wrote:
>
>> I want to use s3a: urls in fetcher. I’m using dcos 1.7 which has hadoop
>> 2.5 on its agents. This version has the necessary hadoop-aws and aws-sdk:
>>
>> hadoop--afadb46fe64d0ee7ce23dbe769e44bfb0767a8b9]$ ls
>> usr/share/hadoop/tools/lib/ | grep aws
>> aws-java-sdk-1.7.4.jar
>> hadoop-aws-2.5.0-cdh5.3.3.jar
>>
>> What config/scripts do I need to hack to get these guys on the classpath
>> so that "hadoop fs -copyToLocal” works?
>>
>> Thanks,
>> Jamie
>
>

Re: Enable s3a for fetcher

Posted by Ken Sipe <ke...@gmail.com>.
Jamie,

The general philosophy is that services should depend very little on the base image (some would say no dependency). There has been an HDFS client on the base image, which we have leveraged while we work on higher priorities. It was always our intent to remove it. Another example, and another enabler to this working, is that there is a Java JRE on the base. It would be a bad idea to get addicted to it :)

That said, it has always been our intention to support different protocols (such as retrieving artifacts from HDFS, which other services (such as Chronos) could leverage). It makes sense that we support s3 retrieval as well. It does mean that we need a pluggable way to hook in solutions to protocols other than http. We have had some discussion around it and have a design idea in place. At this point it is an issue of priority and timing.

ken
> On May 10, 2016, at 1:21 PM, Briant, James <Ja...@thermofisher.com> wrote:
> 
> I’m happy to have default IAM role on the box that can read-only fetch from my s3 bucket. s3a gets the credentials from AWS instance metadata. It works.
> 
> If hadoop is gone, does that mean that hdfs: URIs don’t work either?
> 
> Are you saying dcos and mesos are diverging? Mesos explicitly supports hdfs and s3.
> 
> In the absence of S3, how do you propose I make large binaries available to my cluster, and only to my cluster, on AWS?
> 
> Jamie
> 
> From: Cody Maloney <cody@mesosphere.io>
> Reply-To: "user@mesos.apache.org" <user@mesos.apache.org>
> Date: Tuesday, May 10, 2016 at 10:58 AM
> To: "user@mesos.apache.org" <user@mesos.apache.org>
> Subject: Re: Enable s3a for fetcher
> 
> The s3 fetcher stuff inside of DC/OS is not supported. The `hadoop` binary has been entirely removed from DC/OS 1.8 already. There have been various proposals to make it so the mesos fetcher is much more pluggable / extensible (https://issues.apache.org/jira/browse/MESOS-2731 for instance).
> 
> Generally speaking people want a lot of different sorts of fetching, and there are all sorts of questions of how to properly get auth to the various chunks (if you're using s3a:// presumably you need to get credentials there somehow. Otherwise you could just use http://). Need to design / build that into Mesos and DC/OS to be able to use this stuff.
> 
> Cody
> 
> On Tue, May 10, 2016 at 9:55 AM Briant, James <James.Briant@thermofisher.com> wrote:
>> I want to use s3a: urls in fetcher. I’m using dcos 1.7 which has hadoop 2.5 on its agents. This version has the necessary hadoop-aws and aws-sdk:
>> 
>> hadoop--afadb46fe64d0ee7ce23dbe769e44bfb0767a8b9]$ ls usr/share/hadoop/tools/lib/ | grep aws
>> aws-java-sdk-1.7.4.jar
>> hadoop-aws-2.5.0-cdh5.3.3.jar
>> 
>> What config/scripts do I need to hack to get these guys on the classpath so that "hadoop fs -copyToLocal” works?
>> 
>> Thanks,
>> Jamie


Re: Enable s3a for fetcher

Posted by "Briant, James" <Ja...@thermofisher.com>.
I’m happy to have default IAM role on the box that can read-only fetch from my s3 bucket. s3a gets the credentials from AWS instance metadata. It works.

If hadoop is gone, does that mean that hdfs: URIs don’t work either?

Are you saying dcos and mesos are diverging? Mesos explicitly supports hdfs and s3.

In the absence of S3, how do you propose I make large binaries available to my cluster, and only to my cluster, on AWS?

Jamie
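
(A sketch of what that looks like with instance-profile credentials: no keys are configured anywhere, fs.s3a.impl may need to be set explicitly on older Hadoop builds, and the bucket and paths are placeholders.)

# Sketch: fetch from a private bucket over s3a using the EC2 instance profile
# (IAM role) for credentials; nothing secret is written to the agent.
hadoop fs \
  -Dfs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
  -copyToLocal s3a://my-private-bucket/releases/app-1.0.tar.gz /tmp/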

From: Cody Maloney <co...@mesosphere.io>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Tuesday, May 10, 2016 at 10:58 AM
To: "user@mesos.apache.org" <us...@mesos.apache.org>
Subject: Re: Enable s3a for fetcher

The s3 fetcher stuff inside of DC/OS is not supported. The `hadoop` binary has been entirely removed from DC/OS 1.8 already. There have been various proposals to make it so the mesos fetcher is much more pluggable / extensible (https://issues.apache.org/jira/browse/MESOS-2731 for instance).

Generally speaking people want a lot of different sorts of fetching, and there are all sorts of questions of how to properly get auth to the various chunks (if you're using s3a:// presumably you need to get credentials there somehow. Otherwise you could just use http://). Need to design / build that into Mesos and DC/OS to be able to use this stuff.

Cody

On Tue, May 10, 2016 at 9:55 AM Briant, James <Ja...@thermofisher.com> wrote:
I want to use s3a: urls in fetcher. I’m using dcos 1.7 which has hadoop 2.5 on its agents. This version has the necessary hadoop-aws and aws-sdk:

hadoop--afadb46fe64d0ee7ce23dbe769e44bfb0767a8b9]$ ls usr/share/hadoop/tools/lib/ | grep aws
aws-java-sdk-1.7.4.jar
hadoop-aws-2.5.0-cdh5.3.3.jar

What config/scripts do I need to hack to get these guys on the classpath so that "hadoop fs -copyToLocal” works?

Thanks,
Jamie

Re: Enable s3a for fetcher

Posted by Cody Maloney <co...@mesosphere.io>.
The s3 fetcher stuff inside of DC/OS is not supported. The `hadoop` binary
has been entirely removed from DC/OS 1.8 already. There have been various
proposals to make it so the mesos fetcher is much more pluggable /
extensible (https://issues.apache.org/jira/browse/MESOS-2731 for instance).

Generally speaking people want a lot of different sorts of fetching, and
there are all sorts of questions of how to properly get auth to the various
chunks (if you're using s3a:// presumably you need to get credentials there
somehow. Otherwise you could just use http://). Need to design / build that
into Mesos and DC/OS to be able to use this stuff.

Cody
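
(One interim workaround that needs neither a hadoop client nor credentials on the agents, if a time-limited link is acceptable: presign the object out of band and hand the stock http(s) fetcher the resulting URL. Sketch only, assuming the AWS CLI is available; bucket and key are placeholders.)

# Sketch: generate a time-limited https URL for a private S3 object,
# then use that URL as an ordinary fetcher URI.
aws s3 presign s3://my-private-bucket/releases/app-1.0.tar.gz --expires-in 3600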

On Tue, May 10, 2016 at 9:55 AM Briant, James <Ja...@thermofisher.com>
wrote:

> I want to use s3a: urls in fetcher. I’m using dcos 1.7 which has hadoop
> 2.5 on its agents. This version has the necessary hadoop-aws and aws-sdk:
>
> hadoop--afadb46fe64d0ee7ce23dbe769e44bfb0767a8b9]$ ls
> usr/share/hadoop/tools/lib/ | grep aws
> aws-java-sdk-1.7.4.jar
> hadoop-aws-2.5.0-cdh5.3.3.jar
>
> What config/scripts do I need to hack to get these guys on the classpath
> so that "hadoop fs -copyToLocal” works?
>
> Thanks,
> Jamie