Posted to hdfs-user@hadoop.apache.org by Takenori Sato <ts...@cloudian.com> on 2014/09/29 01:29:14 UTC

Re: Re: Regarding HDFS and YARN support for S3

Hi,

You may want to check HADOOP-10400
<https://issues.apache.org/jira/browse/HADOOP-10400> for the overhaul of the
S3 filesystem, fixed in 2.6.

An S3 subclass of AbstractFileSystem was filed as HADOOP-10643
<https://issues.apache.org/jira/browse/HADOOP-10643>, but it was not
included in HADOOP-10400, though I left a comment
<https://issues.apache.org/jira/browse/HADOOP-10400?focusedCommentId=14104967&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14104967>.

I suggest not using S3 as the defaultFS, as explained in "Why you cannot
use S3 as a replacement for HDFS <https://wiki.apache.org/hadoop/AmazonS3>",
to avoid all of these issues.

The best practice is to use S3 as a supplement to Hadoop rather than a
replacement, in order to gain lifecycle management (expiration and tiering)
and a source/destination reachable over the internet.
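To make the gap discussed in this thread concrete: a filesystem scheme is wired up in core-site.xml with two bindings, one for the FileSystem API and one for the AbstractFileSystem API that FileContext uses. A minimal sketch for S3 follows; the AbstractFileSystem value is hypothetical, since no such class ships with Hadoop at this point (HADOOP-10643 was filed to add one):

```xml
<!-- FileSystem binding: this implementation exists in Hadoop -->
<property>
  <name>fs.s3.impl</name>
  <value>org.apache.hadoop.fs.s3.S3FileSystem</value>
</property>

<!-- AbstractFileSystem binding: hypothetical class name; no implementation
     ships, which is why FileContext-based clients fail for s3:// URIs -->
<property>
  <name>fs.AbstractFileSystem.s3.impl</name>
  <value>org.apache.hadoop.fs.s3.S3Fs</value>
</property>
```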

Thanks,
Takenori


On Sun, Sep 28, 2014 at 5:23 PM, Naganarasimha G R (Naga) <
garlanaganarasimha@huawei.com> wrote:

>  Hi Jay,
> Thanks a lot for replying; it clarifies most of my doubts, but some
> parts are still not clear.
> Some clarifications from my side :
> *| When you say "HDFS does not support fs.AbstractFileSystem.s3.impl"....
> That is true.  If your file system is configured using HDFS, then s3 urls
> will not be used, ever.*
> :) I don't think I am making this basic mistake. What we have done is
> configure *"viewfs://nsX"* as *"fs.defaultFS"*, and one of the mounts
> is S3, i.e. *"fs.viewfs.mounttable.nsX.link./uds"* points to
> *"s3://hadoop/test1/"*.
> So it fails to even create a YARNRunner instance when we run "./yarn jar",
> as there is no mapping for *"fs.AbstractFileSystem.s3.impl"*. But as per
> the code, even if we set *"fs.defaultFS"* to S3 directly, it will not
> work, since there is no S3 implementation of the AbstractFileSystem
> interface.
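The setup described above can be sketched in core-site.xml (all values taken from this thread) as:

```xml
<!-- ViewFs as the default filesystem, with one mount pointing at S3 -->
<property>
  <name>fs.defaultFS</name>
  <value>viewfs://nsX</value>
</property>
<property>
  <name>fs.viewfs.mounttable.nsX.link./uds</name>
  <value>s3://hadoop/test1/</value>
</property>
```

Resolving the /uds mount through FileContext then requires an fs.AbstractFileSystem.s3.impl binding, which is exactly the mapping reported missing here.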
>
>  These are my further queries
>
>    1. What is the purpose of the *AbstractFileSystem* and *FileSystem*
>    interfaces?
>    2. Does the default HDFS package support configuring S3? I see an S3
>    implementation of the *FileSystem* interface
>    (*org.apache.hadoop.fs.s3.S3FileSystem*) but none for
>    *AbstractFileSystem*, so I presume S3 is not fully supported. What is
>    the reason for not supporting both?
>    3. If I need to support Amazon S3, do I need to extend and implement
>    *AbstractFileSystem* and configure *"fs.AbstractFileSystem.s3.impl"*,
>    or is there something more I need to take care of?
>
>    Regards,
>
> Naga
>
>
>
> Huawei Technologies Co., Ltd.
> Phone:
> Fax:
> Mobile:  +91 9980040283
> Email: naganarasimhagr@huawei.com
> Huawei Technologies Co., Ltd.
> Bantian, Longgang District,Shenzhen 518129, P.R.China
> http://www.huawei.com
>
>
>    ------------------------------
> *From:* jay vyas [jayunit100.apache@gmail.com]
> *Sent:* Saturday, September 27, 2014 02:41
> *To:* common-user@hadoop.apache.org
> *Subject:* Re:
>
>      See https://wiki.apache.org/hadoop/HCFS/
>
> Yes, YARN is written to the FileSystem interface. It works on
> S3FileSystem, GlusterFileSystem, and any other HCFS.
>
>  We have run, and continue to run, the many tests in Apache Bigtop's test
> suite against our Hadoop clusters running on alternative file system
> implementations, and it works.
>
>  When you say "HDFS does not support fs.AbstractFileSystem.s3.impl"....
> That is true.  If your file system is configured using HDFS, then s3 urls
> will not be used, ever.
>
>  When you create a FileSystem object in Hadoop, it reads the URI (e.g.
> "glusterfs:///") and then finds the file system binding in your
> core-site.xml (e.g. fs.AbstractFileSystem.glusterfs.impl).
>
>  So the URI must have a corresponding entry in the core-site.xml.
>
>  As a reference implementation, you can see
> https://github.com/gluster/glusterfs-hadoop/blob/master/conf/core-site.xml
>
>
>
>
> On Fri, Sep 26, 2014 at 10:10 AM, Naganarasimha G R (Naga) <
> garlanaganarasimha@huawei.com> wrote:
>
>>   Hi All,
>>
>>  I have the following doubts on pluggable FileSystem and YARN:
>> 1. If all implementations should extend FileSystem, why is there a
>> parallel class, AbstractFileSystem, which ViewFS extends?
>> 2. Is YARN supposed to run on any pluggable
>> org.apache.hadoop.fs.FileSystem, such as S3?
>> If it is, then when submitting a job, on the client side
>> YARNRunner calls FileContext.getFileContext(this.conf),
>> which in turn calls FileContext.getAbstractFileSystem(), which throws
>> an exception for S3.
>> So I am not able to run a YARN job with ViewFS with S3 as a mount. And
>> based on the code, even if I configure only S3 it is going to fail.
>> 3. Why does HDFS not support "fs.AbstractFileSystem.s3.impl" with some
>> default class, similar to org.apache.hadoop.fs.s3.S3FileSystem?
>>
>>    Regards,
>>
>> Naga
>>
>>
>>
>> Huawei Technologies Co., Ltd.
>> Phone:
>> Fax:
>> Mobile:  +91 9980040283
>> Email: naganarasimhagr@huawei.com
>> Huawei Technologies Co., Ltd.
>> http://www.huawei.com
>>
>>
>>
>
>
> --
> jay vyas
>

Re: Re: Regarding HDFS and YARN support for S3

Posted by Takenori Sato <ts...@cloudian.com>.
Hi Naga,

> But what I don't understand is why there are 2 interfaces (maybe I am a
novice in HDFS and hence not able to completely correlate with the JIRAs
you gave).

A client program is encouraged to use the FileContext API instead of the
FileSystem API. Here's why
<http://www.slideshare.net/hadoopusergroup/file-context>. The whole
discussion is in HADOOP-6223 (New improved FileSystem interface for those
implementing new file systems).

Thanks,
Takenori

On Mon, Sep 29, 2014 at 11:27 PM, Naganarasimha G R (Naga) <
garlanaganarasimha@huawei.com> wrote:

>  Hi Takenori,
> Thanks for replying, but I still don't seem to get some concepts.
> I understand that we need to provide *"fs.AbstractFileSystem.s3.impl"* if
> we want to submit a job using "./yarn jar" with an S3 HCFS configured.
> But what I don't understand is why there are 2 interfaces (maybe I am a
> novice in HDFS and hence not able to completely correlate with the JIRAs
> you gave).
> If you could briefly explain the differences between FileSystem and
> AbstractFileSystem, it would be helpful.
>
>    Regards,
>
> Naga
>
>
>
> Huawei Technologies Co., Ltd.
> Phone:
> Fax:
> Mobile:  +91 9980040283
> Email: naganarasimhagr@huawei.com
> Huawei Technologies Co., Ltd.
> Bantian, Longgang District,Shenzhen 518129, P.R.China
> http://www.huawei.com
>
>


RE: Re: Regarding HDFS and YARN support for S3

Posted by "Naganarasimha G R (Naga)" <ga...@huawei.com>.
Hi Takenori,
Thanks for replying, but I still don't seem to be getting some of the concepts.
I understand that we need to provide "fs.AbstractFileSystem.s3.impl" if we want to submit a job using "./yarn jar" with an S3 HCFS configured.
But what I don't understand is why there are two interfaces (maybe I am a novice in HDFS and hence not able to completely correlate with the JIRAs you gave).
If you could briefly explain the differences between FileSystem and AbstractFileSystem, it would be helpful.
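For context, the two bindings being discussed would look roughly like the following in core-site.xml. This is only an illustrative sketch: "org.apache.hadoop.fs.s3.S3FileSystem" is the stock FileSystem implementation mentioned earlier in the thread, while the AbstractFileSystem value is a hypothetical placeholder, since (as the thread notes) Hadoop at the time shipped no AbstractFileSystem implementation for S3 (see HADOOP-10643).

```xml
<!-- Illustrative sketch only; verify class names against the Hadoop release in use. -->
<configuration>
  <!-- FileSystem binding: resolved by the older FileSystem API for s3:// URIs -->
  <property>
    <name>fs.s3.impl</name>
    <value>org.apache.hadoop.fs.s3.S3FileSystem</value>
  </property>
  <!-- AbstractFileSystem binding: resolved by the FileContext API, which is
       what YARNRunner uses at job submission. This mapping is the one that
       is missing for S3; the value below is a hypothetical placeholder. -->
  <property>
    <name>fs.AbstractFileSystem.s3.impl</name>
    <value>com.example.hadoop.fs.S3Fs</value>
  </property>
</configuration>
```

In short, FileSystem and AbstractFileSystem are two parallel plug-in points, each with its own configuration key, which is why a scheme can work through one API and still fail through the other.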


Regards,

Naga



Huawei Technologies Co., Ltd.
Phone:
Fax:
Mobile:  +91 9980040283
Email: naganarasimhagr@huawei.com<ma...@huawei.com>
Huawei Technologies Co., Ltd.
Bantian, Longgang District,Shenzhen 518129, P.R.China
http://www.huawei.com

From: Takenori Sato [tsato@cloudian.com]
Sent: Monday, September 29, 2014 07:29
To: user@hadoop.apache.org
Subject: Re: Re: Regarding HDFS and YARN support for S3

Hi,

You may want to check HADOOP-10400<https://issues.apache.org/jira/browse/HADOOP-10400> for the overhaul of S3 filesystem fixed in 2.6.

The subclass of AbstractFileSystem was filed as HADOOP-10643<https://issues.apache.org/jira/browse/HADOOP-10643>, but which was not included in HADOOP-10400 though I made a comment<https://issues.apache.org/jira/browse/HADOOP-10400?focusedCommentId=14104967&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14104967>.

I suggest not to use S3 as defaultFS as commented in "Why you cannot use S3 as a replacement for HDFS<https://wiki.apache.org/hadoop/AmazonS3>" to avoid all sorts of these issues.

The best practice is to use S3 as a supplementary solution to Hadoop in order to bring life cycle management(expiration and tiering), and source/destination over the internet.

Thanks,
Takenori


On Sun, Sep 28, 2014 at 5:23 PM, Naganarasimha G R (Naga) <ga...@huawei.com>> wrote:
Hi Jay,
Thanks a lot for replying and it clarifies most of it, but still some parts are not so clear .
Some clarifications from my side :
| When you say "HDFS does not support fs.AbstractFileSystem.s3.impl".... That is true.  If your file system is configured using HDFS, then s3 urls will not be used, ever.
:) i think i am not doing this basic mistake . What we have done is we have configured "viewfs://nsX" for "fs.defaultFS" and one of the mount is S3 i.e. "fs.viewfs.mounttable.nsX.link./uds" to "s3://hadoop/test1/".
So it fails to even create YARNRunner instance as there is no mapping for "fs.AbstractFileSystem.s3.impl" if run "./yarn jar". But as per the code even if set "fs.defaultFS" to s3 it will not work as there is no mapping for S3's impl of AbstractFileSystem interface.

These are my further queries

  1.  Whats the purpose of AbstractFileSystem and FileSystem interfaces?
  2.  Does HDFS default package(code) support configuration of S3 ? I see S3 implementation of FileSystem interface(org.apache.hadoop.fs.s3.S3FileSystem) but not for AbstractFileSystem !. So i presume it doesn't support S3 completely. Whats the reason for not supporting both ?
  3.  Suppose if i need to support Amazon S3 do i need to extend and implement AbstractFileSystem and configure  "fs.AbstractFileSystem.s3.impl" or some thing more than this i need to take care?

Regards,

Naga



Huawei Technologies Co., Ltd.
Phone:
Fax:
Mobile:  +91 9980040283<tel:%2B91%209980040283>
Email: naganarasimhagr@huawei.com<ma...@huawei.com>
Huawei Technologies Co., Ltd.
Bantian, Longgang District,Shenzhen 518129, P.R.China
http://www.huawei.com


________________________________
From: jay vyas [jayunit100.apache@gmail.com<ma...@gmail.com>]
Sent: Saturday, September 27, 2014 02:41
To: common-user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re:

See https://wiki.apache.org/hadoop/HCFS/

YES Yarn is written to the FileSystem interface.  It works on S3FileSystem and GlusterFileSystem and any other HCFS.

We have run , and continue to run, the many tests in apache bigtop's test suite against our hadoop clusters running on alternative file system implementations,
and it works.

When you say "HDFS does not support fs.AbstractFileSystem.s3.impl".... That is true.  If your file system is configured using HDFS, then s3 urls will not be used, ever.

When you create a FileSystem object in hadoop, it reads the uri (i.e. "glusterfs:///") and then finds the file system binding in your core-site.xml (i.e. fs.AbstractFileSystem.glusterfs.impl).

So the URI must have a corresponding entry in the core-site.xml.

As a reference implementation, you can see https://github.com/gluster/glusterfs-hadoop/blob/master/conf/core-site.xml




On Fri, Sep 26, 2014 at 10:10 AM, Naganarasimha G R (Naga) <ga...@huawei.com>> wrote:
Hi All,

I have following doubts on pluggable FileSystem and YARN
1. If all the implementations should extend FileSystem then why there is a parallel class AbstractFileSystem. which ViewFS extends ?
2. Is YARN supposed to run on any of the pluggable org.apache.hadoop.fs.FileSystem like s3 ?
if its suppose to run then when submitting a job in the client side  YARNRunner is calling FileContext.getFileContext(this.conf);
which is further calling FileContext.getAbstractFileSystem() which throws exception for S3.
So i am not able to run YARN job with ViewFS with S3 as mount. And based on the code even if i configure only S3 then also its going to fail.
3. HDFS does not support "fs.AbstractFileSystem.s3.impl" with some default class similar to org.apache.hadoop.fs.s3.S3FileSystem ?


Regards,

Naga



Huawei Technologies Co., Ltd.
Phone:
Fax:
Mobile:  +91 9980040283<tel:%2B91%209980040283>
Email: naganarasimhagr@huawei.com<ma...@huawei.com>
Huawei Technologies Co., Ltd.
http://www.huawei.com




--
jay vyas

