Posted to dev@mesos.apache.org by "Tom Arnfeld (JIRA)" <ji...@apache.org> on 2014/05/23 11:55:02 UTC

[jira] [Comment Edited] (MESOS-1405) Mesos fetcher does not support S3(n)

    [ https://issues.apache.org/jira/browse/MESOS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007010#comment-14007010 ] 

Tom Arnfeld edited comment on MESOS-1405 at 5/23/14 9:53 AM:
-------------------------------------------------------------

Review request: https://reviews.apache.org/r/21852/

*Before*

{code}
$ MESOS_WORK_DIRECTORY="/tmp/" MESOS_EXECUTOR_URIS="hdfs:///user/tom/test-fetch+0N" src/mesos-fetcher 
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0523 10:38:44.003656 1933525776 fetcher.cpp:73] Fetching URI 'hdfs:///user/tom/test-fetch'
I0523 10:38:44.004147 1933525776 fetcher.cpp:99] Downloading resource from 'hdfs:///user/tom/test-fetch' to '/tmp/test-fetch'
I0523 10:38:47.983763 1933525776 fetcher.cpp:236] Skipped extracting path '/tmp/test-fetch'
{code}

{code}
$ MESOS_WORK_DIRECTORY="/tmp/" MESOS_EXECUTOR_URIS="s3n://bucket-test/tom/test-fetch+0N" src/mesos-fetcher 
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0523 10:39:52.034631 1933525776 fetcher.cpp:73] Fetching URI 's3n://bucket-test/tom/test-fetch'
E0523 10:39:52.035181 1933525776 fetcher.cpp:142] A relative path was passed for the resource but the environment variable MESOS_FRAMEWORKS_HOME is not set. Please either specify this config option or avoid using a relative path
Failed to fetch: s3n://home.duedil.com/tom/test-fetch
{code}

Here we can see that the fetcher classifies the URI as a relative path (it tries to resolve the path on the local filesystem and, because I haven't set all of the environment variables, it throws an error).
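To make the failure mode concrete, here is a minimal C++ sketch of prefix-based URI classification. It is loosely modeled on the behaviour visible in the logs above, not on the actual code in fetcher.cpp or the patch in review 21852. `UriKind`, `classify` and both prefix lists are hypothetical. With only `hdfs://` and `hftp://` registered as Hadoop prefixes (an assumed "before" list), an `s3n://` URI matches nothing and falls through to the relative-path branch. Adding `s3://` and `s3n://` routes it to Hadoop instead.

```cpp
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical categories a fetcher might sort a URI into.
enum class UriKind { Hadoop, Network, LocalAbsolute, LocalRelative };

// True if `uri` begins with `prefix`.
bool startsWith(const std::string& uri, const std::string& prefix) {
  return uri.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical prefix-based classifier: only URIs whose prefix appears in
// `hadoopPrefixes` are delegated to the Hadoop client.
UriKind classify(const std::string& uri,
                 const std::vector<std::string>& hadoopPrefixes) {
  for (const std::string& prefix : hadoopPrefixes) {
    if (startsWith(uri, prefix)) {
      return UriKind::Hadoop;
    }
  }
  for (const char* prefix : {"http://", "https://", "ftp://", "ftps://"}) {
    if (startsWith(uri, prefix)) {
      return UriKind::Network;
    }
  }
  if (startsWith(uri, "/") || startsWith(uri, "file://")) {
    return UriKind::LocalAbsolute;
  }
  // Everything else, including an unregistered scheme such as "s3n://",
  // ends up here and would be resolved against a frameworks home directory.
  return UriKind::LocalRelative;
}

// Assumed Hadoop prefix lists before and after adding S3 support.
const std::vector<std::string> kHadoopPrefixesBefore = {"hdfs://", "hftp://"};
const std::vector<std::string> kHadoopPrefixesAfter = {"hdfs://", "hftp://",
                                                       "s3://", "s3n://"};
```

The catch with this style, as the comment goes on to note, is that every new scheme requires a code change and a rebuild.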

*After*

{code}
$ MESOS_WORK_DIRECTORY="/tmp/" MESOS_EXECUTOR_URIS="s3n://bucket-test/tom/test-fetch+0N" src/mesos-fetcher 
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0523 10:52:28.486734 1933525776 fetcher.cpp:73] Fetching URI 's3n://bucket-test/tom/test-fetch'
I0523 10:52:28.487210 1933525776 fetcher.cpp:102] Downloading resource from 's3n://bucket-test/tom/test-fetch' to '/tmp/test-fetch'
I0523 10:52:33.173795 1933525776 fetcher.cpp:239] Skipped extracting path '/tmp/test-fetch'
{code}

I'm not sure whether we should just incorporate this change into your work [~bernd-mesos], or whether it's something you've already done. This implementation also doesn't scale well: if we want to maintain good compatibility with the Hadoop FileSystem implementations, users shouldn't have to recompile Mesos just to pass their custom URI schemes through to Hadoop. For example, a user might be running GlusterFS instead of HDFS.
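One way to avoid recompiling for each scheme is to invert the rule. The fetcher keeps a small set of schemes it downloads itself and hands every other syntactically valid scheme to the Hadoop client. The Hadoop configuration then decides whether a FileSystem implementation is registered for that scheme. The C++ sketch here assumes that approach. `isValidScheme`, `schemeOf`, `shouldUseHadoop` and the native-scheme set are all hypothetical, and the `glusterfs://` URI in the checks is only illustrative.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

// RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
bool isValidScheme(const std::string& s) {
  if (s.empty() || !std::isalpha(static_cast<unsigned char>(s[0]))) {
    return false;
  }
  for (char c : s) {
    const unsigned char u = static_cast<unsigned char>(c);
    if (!std::isalnum(u) && c != '+' && c != '-' && c != '.') {
      return false;
    }
  }
  return true;
}

// Hypothetical helper: the lower-cased scheme of `uri` (the text before
// "://"), or "" if there is no syntactically valid scheme.
std::string schemeOf(const std::string& uri) {
  const std::string::size_type pos = uri.find("://");
  if (pos == std::string::npos) {
    return "";
  }
  std::string scheme = uri.substr(0, pos);
  if (!isValidScheme(scheme)) {
    return "";
  }
  std::transform(scheme.begin(), scheme.end(), scheme.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return scheme;
}

// Hypothetical policy: schemes the fetcher would download itself.
// Any other valid scheme is delegated to the Hadoop client.
bool shouldUseHadoop(const std::string& uri) {
  static const std::set<std::string> kNativeSchemes = {"http", "https", "ftp",
                                                       "ftps", "file"};
  const std::string scheme = schemeOf(uri);
  return !scheme.empty() && kNativeSchemes.count(scheme) == 0;
}
```

The trade-off is that a mistyped scheme now fails inside Hadoop instead of being treated as a local relative path. A configurable allow-list of Hadoop schemes would be a more conservative alternative.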


> Mesos fetcher does not support S3(n)
> ------------------------------------
>
>                 Key: MESOS-1405
>                 URL: https://issues.apache.org/jira/browse/MESOS-1405
>             Project: Mesos
>          Issue Type: Improvement
>    Affects Versions: 0.18.2
>            Reporter: Tom Arnfeld
>            Assignee: Tom Arnfeld
>            Priority: Minor
>
> The HDFS client supports both S3 and S3N. Details on the difference between the two can be found here: http://wiki.apache.org/hadoop/AmazonS3.
> Examples:
> s3://bucket/path.tar.gz <- S3 Block Store
> s3n://bucket/path.tar.gz <- S3 K/V Store
> Either we can simply pass these URIs through to the HDFS client (hdfs.cpp) and let Hadoop do the work, or we can integrate with S3 directly. The latter would require a way of managing S3 credentials, whereas the HDFS client simply picks up credentials from the Hadoop configuration under HADOOP_HOME.
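For the pass-through option, the fetcher only needs to shell out to the Hadoop CLI. Hadoop then resolves `s3://` or `s3n://` through its own FileSystem implementations and reads credentials from its configuration (for S3N, typically the `fs.s3n.awsAccessKeyId` and `fs.s3n.awsSecretAccessKey` properties in `core-site.xml`). The C++ sketch here only builds the command line for `hadoop fs -copyToLocal`. `shellQuote`, `copyToLocalCommand` and the `$HADOOP_HOME/bin/hadoop` layout are assumptions for illustration, not the code in hdfs.cpp.

```cpp
#include <string>

// Wraps `s` in single quotes for /bin/sh, escaping embedded single quotes
// as '\'' so the argument is passed through literally.
std::string shellQuote(const std::string& s) {
  std::string out = "'";
  for (char c : s) {
    if (c == '\'') {
      out += "'\\''";
    } else {
      out += c;
    }
  }
  out += "'";
  return out;
}

// Hypothetical: builds the shell command that delegates a download to the
// Hadoop CLI. An empty `hadoopHome` means "rely on `hadoop` being on PATH".
std::string copyToLocalCommand(const std::string& hadoopHome,
                               const std::string& from,
                               const std::string& to) {
  const std::string hadoop =
      hadoopHome.empty() ? "hadoop" : hadoopHome + "/bin/hadoop";
  return shellQuote(hadoop) + " fs -copyToLocal " + shellQuote(from) + " " +
         shellQuote(to);
}
```

The resulting string could be run with `std::system`. Quoting each argument keeps object keys that contain spaces or quotes intact.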



--
This message was sent by Atlassian JIRA
(v6.2#6252)