Posted to common-dev@hadoop.apache.org by Stephen Watt <sw...@redhat.com> on 2013/05/24 01:52:34 UTC

[DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

Hi Folks

Hadoop's pluggable filesystem architecture supports the ability to enable an alternate filesystem for use with Hadoop by writing a plugin for it. We now have several alternate filesystems that have Hadoop FileSystem plugins and because this isn't a very well understood topic, I've been working on a page on the project wiki to bring this all together - http://wiki.apache.org/hadoop/HCFS. At the same time, the Ambari project has been opening up Ambari to support any configured Hadoop FileSystem (as opposed to just HDFS) over at https://issues.apache.org/jira/browse/AMBARI-1817
 
My team (over at Red Hat) have been working on writing a Hadoop FileSystem plugin for the glusterfs filesystem and have been finding that some of the expected semantics of the operations within the Abstract FileSystem class are a little ambiguous. With that said, we've joined Steve Loughran in attempting to clarify these for both the Hadoop 1.0 and the Hadoop 2.0 FileSystem class over at https://issues.apache.org/jira/browse/HADOOP-9371 

It seems to me that once we had these semantics defined, it would be good for consistency of implementation if we could make sure they are well understood and properly implemented by the community of folks writing Hadoop FileSystem plugins. To that end, we might work to ensure that those semantics are tested within an exhaustive test framework that focuses on the abstract Hadoop FileSystem layer. Each FileSystem provider could run the tests to ensure their plugin implementation and behavior is consistent with the expectation. Perhaps a broader extension of https://issues.apache.org/jira/browse/HADOOP-9258.
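A minimal sketch of what such a shared test could look like, written once against an interface and run by each plugin. Everything here is hypothetical: `MiniFs`, `InMemoryFs` and `MiniFsContract` are illustrative names rather than Hadoop classes, and the semantics being checked (for example, that a non-recursive delete of a non-empty directory is rejected) are placeholders for whatever the clarified specification ends up saying.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, heavily reduced stand-in for the operations a plugin exposes.
// This is NOT Hadoop's FileSystem API; it only illustrates the testing pattern.
interface MiniFs {
    boolean mkdirs(String path);
    boolean exists(String path);
    boolean delete(String path, boolean recursive);
}

// Toy in-memory "plugin" used here as the implementation under test.
class InMemoryFs implements MiniFs {
    private final Set<String> dirs = new HashSet<>();

    InMemoryFs() {
        dirs.add("/");
    }

    public boolean mkdirs(String path) {
        // Create the directory together with any missing parents.
        StringBuilder current = new StringBuilder();
        for (String part : path.split("/")) {
            if (part.isEmpty()) {
                continue;
            }
            current.append('/').append(part);
            dirs.add(current.toString());
        }
        return true;
    }

    public boolean exists(String path) {
        return dirs.contains(path);
    }

    public boolean delete(String path, boolean recursive) {
        if (!dirs.contains(path)) {
            return false;
        }
        String prefix = path + "/";
        boolean hasChildren = dirs.stream().anyMatch(d -> d.startsWith(prefix));
        if (hasChildren && !recursive) {
            throw new IllegalStateException("directory not empty: " + path);
        }
        dirs.removeIf(d -> d.equals(path) || d.startsWith(prefix));
        return true;
    }
}

// The shared contract: every plugin passes its own implementation to verify().
class MiniFsContract {
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    // Returns the number of checks that passed; throws on the first failure.
    static int verify(MiniFs fs) {
        int passed = 0;
        check(fs.mkdirs("/a/b/c"), "mkdirs should succeed");
        passed++;
        check(fs.exists("/a/b"), "mkdirs should create missing parents");
        passed++;
        check(!fs.delete("/missing", false), "deleting a missing path should return false");
        passed++;
        boolean rejected = false;
        try {
            fs.delete("/a", false);
        } catch (IllegalStateException e) {
            rejected = true;
        }
        check(rejected, "non-recursive delete of a non-empty directory should be rejected");
        passed++;
        check(fs.delete("/a", true) && !fs.exists("/a/b/c"), "recursive delete should remove children");
        passed++;
        return passed;
    }
}
```

A provider would call `MiniFsContract.verify(new TheirFs())` from its own test suite; a real framework would hang the same checks off Hadoop's test infrastructure instead of a hand-rolled `check` helper.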

If folks are interested in these goals, I could host a workshop/discussion/hackday in Mountain View to get local people together (perhaps with a Google Hangout for the remote folks) to keep the ball rolling on the semantics discussion and test creation. As a side note, I think this could also turn out to be quite an effective means of introducing FileSystem vendors to the ASF and getting them contributing to these aspects of the project.

Regards
Steve Watt

Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

Posted by Roman Shaposhnik <rv...@apache.org>.

On Fri, May 24, 2013 at 5:08 PM, Konstantin Shvachko
<sh...@gmail.com> wrote:
> Makes sense, Steve.
> There are a couple of guys here at WANdisco who will be interested in
> joining.

Bigtop will also be interested in participating. Please keep us posted.

Thanks,
Roman.

[DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

Posted by Andrew Purtell <ap...@apache.org>.

There are teams at Intel who will be interested as well.

On May 29, 2013, at 8:49 PM, Roman Shaposhnik <rvs@apache.org> wrote:

> On Fri, May 24, 2013 at 5:08 PM, Konstantin Shvachko
> <shv.hadoop@gmail.com> wrote:
>> Makes sense, Steve.
>> There are a couple of guys here at WANdisco who will be interested in
>> joining.
>
> Bigtop will also be interested in participating. Please keep us posted.
>
> Thanks,
> Roman.

-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

Posted by Konstantin Shvachko <sh...@gmail.com>.

Makes sense, Steve.
There are a couple of guys here at WANdisco who will be interested in
joining.

Thanks,
--Konstantin

On Fri, May 24, 2013 at 10:15 AM, Milind Bhandarkar <
mbhandarkar@gopivotal.com> wrote:

> Thanks for the initiative, Steve.
>
> A few folks from Pivotal and our partners would be interested in joining
> the workshop/discussion.
>
> - milind
>
>
> ---
> Milind Bhandarkar
> Chief Scientist, Machine Learning Platforms,
> Pivotal
> +1-650-523-3858 (W)
> +1-408-666-8483 (C)

Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

Posted by Milind Bhandarkar <mb...@gopivotal.com>.

Thanks for the initiative, Steve.

A few folks from Pivotal and our partners would be interested in joining
the workshop/discussion.

- milind


---
Milind Bhandarkar
Chief Scientist, Machine Learning Platforms,
Pivotal
+1-650-523-3858 (W)
+1-408-666-8483 (C)


Hadoop FileSystem Validation Workshop/Meetup - Red Hat in Mountain View on June 25th

Posted by Stephen Watt <sw...@redhat.com>.

Hi Folks

For those of you interested, the day before the Hadoop Summit we have a face to face workshop/meetup on Hadoop FileSystem Validation at Red Hat in Mountain View on June 25th from 10am - 3pm (lunch provided). 

I need to make sure you all get visitor passes, and also to avoid exceeding the room capacity, so please sign up here - http://hadoop-fs.eventbrite.com/

Regards
Steve Watt

----- Original Message -----
From: "Andrew Wang" <an...@cloudera.com>
To: common-dev@hadoop.apache.org
Cc: "Milind Bhandarkar" <mb...@gopivotal.com>, "shv hadoop" <sh...@gmail.com>, "Steve Loughran" <st...@hortonworks.com>, "Kun Ling" <er...@gmail.com>, "Roman Shaposhnik" <sh...@gmail.com>, "Andrew Purtell" <ap...@apache.org>, cdouglas@apache.org, jayhawk@cs.ucsc.edu, "Sanjay Radia" <sa...@hortonworks.com>
Sent: Friday, June 14, 2013 1:32:38 PM
Subject: Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

Hey Steve,

I agree that it's confusing. FileSystem and FileContext are essentially two
parallel sets of interfaces for accessing filesystems in Hadoop.
FileContext splits the interface and shared code with AbstractFileSystem,
while FileSystem is all-in-one. If you're looking for the AFS equivalents
to DistributedFileSystem and LocalFileSystem, see Hdfs and LocalFs.

Realistically, FileSystem isn't going to be deprecated and removed any time
soon. There are lots of 3rd-party FileSystem implementations, and most apps
today use FileSystem (including many HDFS internals, like trash and the
shell).

When I read the wiki page, I figured that the mention of AFS was
essentially a typo, since everyone's been steaming ahead with FileSystem.
Standardizing FileSystem makes total sense to me, I just wanted to confirm
that plan.

Best,
Andrew


On Fri, Jun 14, 2013 at 9:38 AM, Stephen Watt <sw...@redhat.com> wrote:

> This is a good point, Andrew. The hangout was actually the first time I'd
> heard about the AbstractFileSystem class. I've been doing some further
> analysis on the source in Hadoop 2.0, and when I look at the Hadoop 2.0
> implementations of the DistributedFileSystem and LocalFileSystem classes,
> they extend the FileSystem class and not AbstractFileSystem. I would imagine
> that if the plan for Hadoop 2.0 is to build FileSystem implementations using
> AbstractFileSystem, then those two would use it, so I'm a bit confused.
>
> Perhaps I'm looking in the wrong place? Sanjay (or anyone else), could you
> clarify this for us?
>
> Regards
> Steve Watt
>
> ----- Original Message -----
> From: "Andrew Wang" <an...@cloudera.com>
> To: common-dev@hadoop.apache.org
> Cc: mbhandarkar@gopivotal.com, "shv hadoop" <sh...@gmail.com>,
> stevel@hortonworks.com, erlv5241@gmail.com, shaposhnik@gmail.com,
> apurtell@apache.org, cdouglas@apache.org, jayhawk@cs.ucsc.edu,
> sanjay@hortonworks.com
> Sent: Monday, June 10, 2013 5:14:16 PM
> Subject: Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop
> FileSystems + Workshop
>
> Thanks for the summary Steve, very useful.
>
> I'm wondering a bit about the point on testing AbstractFileSystem rather
> than FileSystem. While these are both wrappers for DFSClient, they're
> pretty different in terms of the APIs they expose. Furthermore, AFS is not
> actually a client-facing API; clients interact with an AFS through
> FileContext.
>
> I ask because I did some work trying to unify the symlink tests for both
> FileContext and FileSystem (HADOOP-9370 and HADOOP-9355). Subtle things
> like the default mkdir semantics are different; you can see some of the
> contortions in HADOOP-9370. I ultimately ended up just adhering to the
> FileContext-style behavior, but as a result I'm not really testing some
> parts of FileSystem.
>
> Are we going to end up with two different sets of validation tests? Or just
> choose one API over the other? FileSystem is supposed to eventually be
> deprecated in favor of FileContext (HADOOP-6446, filed in 2009), but actual
> uptake in practice has been slow.
>
> Best,
> Andrew
>
>
> On Mon, Jun 10, 2013 at 1:49 PM, Stephen Watt <sw...@redhat.com> wrote:
>
> > For those interested - I posted a recap of this morning's Google Hangout
> > on the Wiki Page at https://wiki.apache.org/hadoop/HCFS/Progress
> >
> > On Jun 5, 2013, at 8:14 PM, Stephen Watt wrote:
> >
> > > Hi Folks
> > >
> > > Per Roman's recommendation I've created a Wiki Page for organizing the
> > > work and managing the logistics -
> > > https://wiki.apache.org/hadoop/HCFS/Progress
> > >
> > > I'd like to propose a Google Hangout at 9am PST on Monday June 10th to
> > > get together and discuss the initiative. Please respond back to me if
> > > you're interested or would like to propose a different time. I'll update
> > > our Wiki page with the logistics.
> > >
> > > Regards
> > > Steve Watt
> > >
> > > ----- Original Message -----
> > > From: "Roman Shaposhnik" <sh...@gmail.com>
> > > To: "Stephen Watt" <sw...@redhat.com>
> > > Cc: common-dev@hadoop.apache.org, mbhandarkar@gopivotal.com, "shv
> > > hadoop" <sh...@gmail.com>, stevel@hortonworks.com, erlv5241@gmail.com,
> > > apurtell@apache.org
> > > Sent: Friday, May 31, 2013 5:28:58 PM
> > > Subject: Re: [DISCUSS] Ensuring Consistent Behavior for Alternative
> > > Hadoop FileSystems + Workshop
> > >
> > > On Fri, May 31, 2013 at 1:00 PM, Stephen Watt <sw...@redhat.com> wrote:
> > >> What is the protocol for organizing the logistics and collaborating? I
> > >> am loath to flood common-dev with "does this time work for you?" emails
> > >> from the interested parties. Do we create a high level JIRA ticket and
> > >> collaborate and post comments and G+ meetup times on that? Another option
> > >> might be the Wiki; I'd be happy to be responsible for tracking progress on
> > >> https://wiki.apache.org/hadoop/HCFS/Progress until we are able to break
> > >> initiatives down into more granular JIRA tickets.
> > >
> > > I'd go with a wiki page and perhaps http://www.doodle.com/
> > >
> > >> After we've had a few G+ hangouts, for those that would like to meet
> > >> face to face, I have also made an all day reservation for a meeting room
> > >> that can hold up to 20 people at our Red Hat Office in Castro Street,
> > >> Mountain View on Tuesday June 25th (the day before Hadoop Summit and a
> > >> short drive away). We don't have to use the whole day, but it gives us some
> > >> flexibility around the availability of interested parties. I was thinking
> > >> something along the lines of 10am - 3pm. We are happy to cater lunch.
> > >
> > > That also would be very much appreciated!
> > >
> > > Thanks,
> > > Roman.
> >
>

Re: Hadoop 2.0 - org.apache.hadoop.fs.Hdfs vs. DistributedFileSystem?

Posted by Eli Collins <el...@cloudera.com>.

Hey Steve,

That's correct, see HADOOP-6223 for the history.  However, per Andrew
I don't think it's realistic to expect people to migrate off
FileSystem for a while (I filed HADOOP-6446 well over three years
ago).

The unfortunate consequence of the earlier decision to have parallel
interfaces, rather than transitioning one over time, is that people
effectively end up implementing multiple backends: one that gets used
by clients of FileSystem, and one for clients of FileContext.
Implementing in only one place significantly limits adoption of a
feature or file system, because it can't be adopted in practice unless
it's available to both old and new clients (for example, this is why
symlinks are getting backported to FileSystem from FileContext).
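One way to soften the "two backends" cost is to write the implementation once and expose it to the other API through a thin adapter. The sketch below illustrates only that pattern. `LegacyFs`, `BackendSpi`, `InMemoryLegacyFs` and `DelegatingBackend` are hypothetical stand-ins, not the real Hadoop classes: `LegacyFs` plays the role of the all-in-one FileSystem and `BackendSpi` the role of AbstractFileSystem. Hadoop itself appears to ship an adapter in this spirit (`DelegateToFileSystem`), but its actual interface is considerably larger than what is shown here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the all-in-one, FileSystem-style API.
abstract class LegacyFs {
    abstract void write(String path, String data);
    abstract String read(String path);
}

// Hypothetical stand-in for the AbstractFileSystem-style provider interface.
abstract class BackendSpi {
    abstract void put(String path, String data);
    abstract String get(String path);
}

// The single real implementation, written once against the legacy-style API.
class InMemoryLegacyFs extends LegacyFs {
    private final Map<String, String> files = new HashMap<>();

    void write(String path, String data) {
        files.put(path, data);
    }

    String read(String path) {
        return files.get(path);
    }
}

// Thin adapter that exposes the same store through the newer interface,
// so clients of either API see one backend.
class DelegatingBackend extends BackendSpi {
    private final LegacyFs delegate;

    DelegatingBackend(LegacyFs delegate) {
        this.delegate = delegate;
    }

    void put(String path, String data) {
        delegate.write(path, data);
    }

    String get(String path) {
        return delegate.read(path);
    }
}
```

The adapter only helps where the two APIs agree on semantics; where they differ (the default mkdir behavior mentioned elsewhere in this thread is one example), the adapter has to pick a behavior explicitly.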

Thanks,
Eli

On Tue, Jun 18, 2013 at 11:15 AM, Stephen Watt <sw...@redhat.com> wrote:
> Hi Folks
>
> My understanding is that from Hadoop 2.0 onwards the AbstractFileSystem is now the strategic class to extend for writing Hadoop FileSystem plugins. This is a departure from previous versions where one would extend the FileSystem class. This seems to be reinforced by the core-default.xml for Hadoop 2.0 in the Apache docs (http://hadoop.apache.org/docs/r2.0.2-alpha/hadoop-project-dist/hadoop-common/core-default.xml), which shows fs.AbstractFileSystem.hdfs.impl being set to org.apache.hadoop.fs.Hdfs
>
> Is my assertion correct? Do we have community consensus around this? i.e. Beyond the Apache distro, are the commercial distros (Intel, Hortonworks, Cloudera, WANdisco, EMC Pivotal, etc.) using org.apache.hadoop.fs.Hdfs as their filesystem plugin for HDFS? What does one lose by using the DistributedFileSystem class instead of the Hdfs class?
>
> Regards
> Steve Watt

Hadoop 2.0 - org.apache.hadoop.fs.Hdfs vs. DistributedFileSystem?

Posted by Stephen Watt <sw...@redhat.com>.

Hi Folks

My understanding is that from Hadoop 2.0 onwards the AbstractFileSystem is now the strategic class to extend for writing Hadoop FileSystem plugins. This is a departure from previous versions where one would extend the FileSystem class. This seems to be reinforced by the core-default.xml for Hadoop 2.0 in the Apache docs (http://hadoop.apache.org/docs/r2.0.2-alpha/hadoop-project-dist/hadoop-common/core-default.xml), which shows fs.AbstractFileSystem.hdfs.impl being set to org.apache.hadoop.fs.Hdfs

Is my assertion correct? Do we have community consensus around this? i.e. Beyond the Apache distro, are the commercial distros (Intel, Hortonworks, Cloudera, WANdisco, EMC Pivotal, etc.) using org.apache.hadoop.fs.Hdfs as their filesystem plugin for HDFS? What does one lose by using the DistributedFileSystem class instead of the Hdfs class?
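For readers unfamiliar with how a URI scheme maps to an implementation class, here is a small self-contained sketch of the two key-naming conventions in play. `SchemeLookup` and its methods are hypothetical helpers, not Hadoop code. The `fs.AbstractFileSystem.hdfs.impl` entry is the one quoted above; the `fs.hdfs.impl` entry and its `org.apache.hadoop.hdfs.DistributedFileSystem` value are an assumption based on the long-standing `fs.<scheme>.impl` convention and may not appear literally in every release's defaults.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: shows only how the two families of keys are named.
class SchemeLookup {
    // Key for the FileSystem-style implementation of a scheme.
    static String fileSystemKey(String scheme) {
        return "fs." + scheme + ".impl";
    }

    // Key for the AbstractFileSystem-style implementation of a scheme.
    static String abstractFileSystemKey(String scheme) {
        return "fs.AbstractFileSystem." + scheme + ".impl";
    }

    // A stand-in for the loaded configuration.
    static Map<String, String> sampleConfig() {
        Map<String, String> conf = new HashMap<>();
        // Assumed entry, following the fs.<scheme>.impl convention.
        conf.put("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
        // Entry quoted from core-default.xml in the message above.
        conf.put("fs.AbstractFileSystem.hdfs.impl", "org.apache.hadoop.fs.Hdfs");
        return conf;
    }

    // Resolve the class name a FileSystem-style client would load.
    static String forFileSystemClients(Map<String, String> conf, String scheme) {
        return conf.get(fileSystemKey(scheme));
    }

    // Resolve the class name a FileContext-style client would load.
    static String forFileContextClients(Map<String, String> conf, String scheme) {
        return conf.get(abstractFileSystemKey(scheme));
    }
}
```

The point of the sketch is that the two keys are independent: setting one does not register a plugin for clients that go through the other API.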

Regards
Steve Watt

Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

Posted by Andrew Wang <an...@cloudera.com>.

Hey Steve,

I agree that it's confusing. FileSystem and FileContext are essentially two
parallel sets of interfaces for accessing filesystems in Hadoop.
FileContext splits the interface and shared code with AbstractFileSystem,
while FileSystem is all-in-one. If you're looking for the AFS equivalents
to DistributedFileSystem and LocalFileSystem, see Hdfs and LocalFs.

Realistically, FileSystem isn't going to be deprecated and removed any time
soon. There are lots of 3rd-party FileSystem implementations, and most apps
today use FileSystem (including many HDFS internals, like trash and the
shell).

When I read the wiki page, I figured that the mention of AFS was
essentially a typo, since everyone's been steaming ahead with FileSystem.
Standardizing FileSystem makes total sense to me; I just wanted to confirm
that plan.

Best,
Andrew


On Fri, Jun 14, 2013 at 9:38 AM, Stephen Watt <sw...@redhat.com> wrote:

> This is a good point Andrew. The hangout was actually the first time I'd
> heard about the AbstractFileSystem class. I've been doing some further
> analysis on the source in Hadoop 2.0, and when I look at the Hadoop 2.0
> implementations of the DistributedFileSystem and LocalFileSystem classes, they
> extend the FileSystem class and not AbstractFileSystem. I would imagine if
> the plan for Hadoop 2.0 is to build FileSystem implementations using the
> AbstractFileSystem, then those two would use it, so I'm a bit confused.
>
> Perhaps I'm looking in the wrong place? Sanjay (or anyone else), could you
> clarify this for us?
>
> Regards
> Steve Watt
>

Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

Posted by Stephen Watt <sw...@redhat.com>.

This is a good point, Andrew. The hangout was actually the first time I'd heard about the AbstractFileSystem class. I've been doing some further analysis on the source in Hadoop 2.0, and when I look at the Hadoop 2.0 implementations of the DistributedFileSystem and LocalFileSystem classes, they extend the FileSystem class and not AbstractFileSystem. I would imagine that if the plan for Hadoop 2.0 is to build FileSystem implementations using AbstractFileSystem, then those two would use it, so I'm a bit confused.

Perhaps I'm looking in the wrong place? Sanjay (or anyone else), could you clarify this for us?

Regards
Steve Watt

----- Original Message -----
From: "Andrew Wang" <an...@cloudera.com>
To: common-dev@hadoop.apache.org
Cc: mbhandarkar@gopivotal.com, "shv hadoop" <sh...@gmail.com>, stevel@hortonworks.com, erlv5241@gmail.com, shaposhnik@gmail.com, apurtell@apache.org, cdouglas@apache.org, jayhawk@cs.ucsc.edu, sanjay@hortonworks.com
Sent: Monday, June 10, 2013 5:14:16 PM
Subject: Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

Thanks for the summary Steve, very useful.

I'm wondering a bit about the point on testing AbstractFileSystem rather
than FileSystem. While these are both wrappers for DFSClient, they're
pretty different in terms of the APIs they expose. Furthermore, AFS is not
actually a client-facing API; clients interact with an AFS through
FileContext.

I ask because I did some work trying to unify the symlink tests for both
FileContext and FileSystem (HADOOP-9370 and HADOOP-9355). Subtle things
like the default mkdir semantics are different; you can see some of the
contortions in HADOOP-9370. I ultimately ended up just adhering to the
FileContext-style behavior, but as a result I'm not really testing some
parts of FileSystem.

Are we going to end up with two different sets of validation tests? Or just
choose one API over the other? FileSystem is supposed to eventually be
deprecated in favor of FileContext (HADOOP-6446, filed in 2009), but actual
uptake in practice has been slow.

Best,
Andrew


Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

Posted by Andrew Wang <an...@cloudera.com>.

Thanks for the summary Steve, very useful.

I'm wondering a bit about the point on testing AbstractFileSystem rather
than FileSystem. While these are both wrappers for DFSClient, they're
pretty different in terms of the APIs they expose. Furthermore, AFS is not
actually a client-facing API; clients interact with an AFS through
FileContext.

I ask because I did some work trying to unify the symlink tests for both
FileContext and FileSystem (HADOOP-9370 and HADOOP-9355). Subtle things
like the default mkdir semantics are different; you can see some of the
contortions in HADOOP-9370. I ultimately ended up just adhering to the
FileContext-style behavior, but as a result I'm not really testing some
parts of FileSystem.
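
A toy model of that mkdir difference, in plain Java with no Hadoop dependency. It assumes the usual reading that FileSystem's mkdirs behaves like `mkdir -p`, while FileContext's mkdir takes an explicit createParent flag and refuses to create a directory under a missing parent when the flag is false. The class, its in-memory directory set, and the restriction to absolute, already-normalized paths are all simplifications made up for illustration.

```java
import java.io.FileNotFoundException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy namespace; it only tracks which directories exist.
final class MkdirSemanticsSketch {
    private final Set<String> dirs = new HashSet<>();

    MkdirSemanticsSketch() {
        dirs.add("/");  // the root always exists
    }

    boolean exists(String path) {
        return dirs.contains(path);
    }

    // Parent of an absolute, normalized path; the parent of "/x" and of "/" is "/".
    private static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "/" : path.substring(0, slash);
    }

    // FileSystem-style: always creates any missing ancestors, like `mkdir -p`.
    void mkdirs(String path) {
        if (dirs.contains(path)) {
            return;
        }
        mkdirs(parentOf(path));
        dirs.add(path);
    }

    // FileContext-style: the caller decides whether missing parents may be created.
    void mkdir(String path, boolean createParent) throws FileNotFoundException {
        String parent = parentOf(path);
        if (!exists(parent)) {
            if (!createParent) {
                throw new FileNotFoundException("Parent does not exist: " + parent);
            }
            mkdirs(parent);
        }
        dirs.add(path);
    }

    public static void main(String[] args) throws FileNotFoundException {
        MkdirSemanticsSketch fsStyle = new MkdirSemanticsSketch();
        fsStyle.mkdirs("/a/b/c");
        System.out.println("mkdirs created parent: " + fsStyle.exists("/a/b"));

        MkdirSemanticsSketch fcStyle = new MkdirSemanticsSketch();
        try {
            fcStyle.mkdir("/a/b/c", false);
        } catch (FileNotFoundException e) {
            System.out.println("mkdir without createParent rejected: " + e.getMessage());
        }
        fcStyle.mkdir("/a/b/c", true);
        System.out.println("mkdir with createParent succeeded: " + fcStyle.exists("/a/b/c"));
    }
}
```

A test written against one style has to pick a value for the flag, which is one reason a single suite cannot exercise both defaults without some adapter in between.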

Are we going to end up with two different sets of validation tests? Or just
choose one API over the other? FileSystem is supposed to eventually be
deprecated in favor of FileContext (HADOOP-6446, filed in 2009), but actual
uptake in practice has been slow.

Best,
Andrew

On Mon, Jun 10, 2013 at 1:49 PM, Stephen Watt <sw...@redhat.com> wrote:

> For those interested - I posted a recap of this morning's Google Hangout on
> the Wiki Page at https://wiki.apache.org/hadoop/HCFS/Progress
>

Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

Posted by Stephen Watt <sw...@redhat.com>.

For those interested - I posted a recap of this morning's Google Hangout on the Wiki Page at https://wiki.apache.org/hadoop/HCFS/Progress

On Jun 5, 2013, at 8:14 PM, Stephen Watt wrote:

> Hi Folks
> 
> Per Roman's recommendation I've created a Wiki Page for organizing the work and managing the logistics - https://wiki.apache.org/hadoop/HCFS/Progress
> 
> I'd like to propose a Google Hangout at 9am PST on Monday June 10th to get together and discuss the initiative. Please respond back to me if you're interested or would like to propose a different time. I'll update our Wiki page with the logistics.
> 
> Regards
> Steve Watt
> 

Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

Posted by sanjay Radia <sa...@hortonworks.com>.

I plan to attend.
A 9:30 time is a little better for me.

sanjay

On Jun 5, 2013, at 8:14 PM, Stephen Watt wrote:

> Hi Folks
> 
> Per Roman's recommendation I've created a Wiki Page for organizing the work and managing the logistics - https://wiki.apache.org/hadoop/HCFS/Progress
> 
> I'd like to propose a Google Hangout at 9am PST on Monday June 10th to get together and discuss the initiative. Please respond back to me if you're interested or would like to propose a different time. I'll update our Wiki page with the logistics.
> 
> Regards
> Steve Watt
> 

Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

Posted by Andrew Purtell <ap...@apache.org>.

The proposed time (9am PST Monday June 10th) is good for me.


On Thu, Jun 6, 2013 at 5:14 AM, Stephen Watt <sw...@redhat.com> wrote:

> Hi Folks
>
> Per Roman's recommendation I've created a Wiki Page for organizing the
> work and managing the logistics -
> https://wiki.apache.org/hadoop/HCFS/Progress
>
> I'd like to propose a Google Hangout at 9am PST on Monday June 10th to get
> together and discuss the initiative. Please respond back to me if you're
> interested or would like to propose a different time. I'll update our Wiki
> page with the logistics.
>
> Regards
> Steve Watt
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

Posted by Stephen Watt <sw...@redhat.com>.

Hi Folks

Per Roman's recommendation I've created a Wiki Page for organizing the work and managing the logistics - https://wiki.apache.org/hadoop/HCFS/Progress

I'd like to propose a Google Hangout at 9am PST on Monday June 10th to get together and discuss the initiative. Please respond back to me if you're interested or would like to propose a different time. I'll update our Wiki page with the logistics.

Regards
Steve Watt

----- Original Message -----
From: "Roman Shaposhnik" <sh...@gmail.com>
To: "Stephen Watt" <sw...@redhat.com>
Cc: common-dev@hadoop.apache.org, mbhandarkar@gopivotal.com, "shv hadoop" <sh...@gmail.com>, stevel@hortonworks.com, erlv5241@gmail.com, apurtell@apache.org
Sent: Friday, May 31, 2013 5:28:58 PM
Subject: Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

On Fri, May 31, 2013 at 1:00 PM, Stephen Watt <sw...@redhat.com> wrote:
> What is the protocol for organizing the logistics and collaborating? I am loath to flood common-dev with "does this time work for you?" emails from the interested parties. Do we create a high-level JIRA ticket and collaborate and post comments and G+ meetup times on that? Another option might be the Wiki; I'd be happy to be responsible for tracking progress on https://wiki.apache.org/hadoop/HCFS/Progress until we are able to break initiatives down into more granular JIRA tickets.

I'd go with a wiki page and perhaps http://www.doodle.com/

> After we've had a few G+ hangouts, for those that would like to meet face to face, I have also made an all day reservation for a meeting room that can hold up to 20 people at our Red Hat Office in Castro Street, Mountain View on Tuesday June 25th (the day before Hadoop Summit and a short drive away). We don't have to use the whole day, but it gives us some flexibility around the availability of interested parties. I was thinking something along the lines of 10am - 3pm. We are happy to cater lunch.

That also would be very much appreciated!

Thanks,
Roman.

Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

Posted by Roman Shaposhnik <sh...@gmail.com>.

On Fri, May 31, 2013 at 1:00 PM, Stephen Watt <sw...@redhat.com> wrote:
> What is the protocol for organizing the logistics and collaborating? I am loath to flood common-dev with "does this time work for you?" emails from the interested parties. Do we create a high-level JIRA ticket and collaborate and post comments and G+ meetup times on that? Another option might be the Wiki; I'd be happy to be responsible for tracking progress on https://wiki.apache.org/hadoop/HCFS/Progress until we are able to break initiatives down into more granular JIRA tickets.

I'd go with a wiki page and perhaps http://www.doodle.com/

> After we've had a few G+ hangouts, for those that would like to meet face to face, I have also made an all day reservation for a meeting room that can hold up to 20 people at our Red Hat Office in Castro Street, Mountain View on Tuesday June 25th (the day before Hadoop Summit and a short drive away). We don't have to use the whole day, but it gives us some flexibility around the availability of interested parties. I was thinking something along the lines of 10am - 3pm. We are happy to cater lunch.

That also would be very much appreciated!

Thanks,
Roman.

Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

Posted by Stephen Watt <sw...@redhat.com>.

Hi Folks

I am grateful for the interest and for so many responses (the interested parties who responded are on CC).

I like Steve Loughran's idea of having a few G+ hangouts first to get to some consensus on how to organize the work, as well as to hear his thoughts about leveraging the Hadoop FileSystem tests he's already developed for the Swift object store. I am also keen to present and discuss the work we (Red Hat) have done on our view of the current state of filesystem semantics and their test coverage, to check whether the community at least shares that point of view, which I think would be a good starting point.

What is the protocol for organizing the logistics and collaborating? I am loath to flood common-dev with "does this time work for you?" emails from the interested parties. Do we create a high-level JIRA ticket and collaborate and post comments and G+ meetup times on that? Another option might be the Wiki; I'd be happy to be responsible for tracking progress on https://wiki.apache.org/hadoop/HCFS/Progress until we are able to break initiatives down into more granular JIRA tickets.

After we've had a few G+ hangouts, for those that would like to meet face to face, I have also made an all-day reservation for a meeting room that can hold up to 20 people at our Red Hat office on Castro Street, Mountain View, on Tuesday June 25th (the day before Hadoop Summit and a short drive away). We don't have to use the whole day, but it gives us some flexibility around the availability of interested parties. I was thinking something along the lines of 10am - 3pm. We are happy to cater lunch.

Regards
Steve Watt

----- Original Message -----
From: "Steve Loughran" <st...@hortonworks.com>
To: common-dev@hadoop.apache.org
Sent: Friday, May 24, 2013 3:47:04 PM
Subject: Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

On 24 May 2013 00:52, Stephen Watt <sw...@redhat.com> wrote:

> Hi Folks
>
> Hadoop's pluggable filesystem architecture supports the ability to enable
> an alternate filesystem for use with Hadoop by writing a plugin for it. We
> now have several alternate filesystems that have Hadoop FileSystem plugins
> and because this isn't a very well understood topic, I've been working on a
> page on the project wiki to bring this all together -
> http://wiki.apache.org/hadoop/HCFS. At the same time, the Ambari project
> has been opening up Ambari to support any configured Hadoop FileSystem (as
> opposed to just HDFS) over at
> https://issues.apache.org/jira/browse/AMBARI-1817
>
> My team (over at Red Hat) have been working on writing a Hadoop FileSystem
> plugin for the glusterfs filesystem and have been finding that some of the
> expected semantics of the operations within the Abstract FileSystem class
> are a little ambiguous. With that said, we've joined Steve Loughran in
> attempting to clarify these for both the Hadoop 1.0 and the Hadoop 2.0
> FileSystem class over at https://issues.apache.org/jira/browse/HADOOP-9371
>
> It seems to me that once we had these semantics defined, it would be good
> for consistency of implementation if we could make sure they are well
> understood and properly implemented by the community of folks writing
> Hadoop FileSystem plugins. To that end, we might work to ensure that those
> semantics are tested within an exhaustive test framework that focuses on
> the abstract Hadoop FileSystem layer. Each FileSystem provider could run
> the tests to ensure their plugin implementation and behavior is consistent
> with the expectation. Perhaps a broader extension of
> https://issues.apache.org/jira/browse/HADOOP-9258.
>
>
I have a plan for starting those tests, pulling up the Swift ones when they
are checked in. Big tests that do scale, and that verify the assumptions
that MR, HBase, etc. make, are where we are weakest. The de facto definition
of FS semantics is the apps, and it's them that currently find the problems
(e.g. MAPREDUCE-5264).

> If folks are interested in these goals, I could host a
> workshop/discussion/hackday in Mountain View to get local people together
> (perhaps a Google Hangout for the remote folks) to keep the ball rolling on
> the semantics discussion and test creation. As a side note, I think this
> could also turn out to be quite an effective means of introducing FileSystem
> vendors to the ASF and getting them contributing to these aspects of the
> project.
>
>
Can we start with some G+ hangouts to get to know each other and have some
broader participation (myself, the others working on Swift, people who have
done S3 (Tom, some of the Amazon folk), etc.)? Then when a workshop is
held, it's got some clearer objectives, e.g. "how do we test this". I would
want the FS semantics to be locked down in some online discussions/JIRA rather
than come back after a night's sleep to discover they had been defined with
tests.

-steve

Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

Posted by Steve Loughran <st...@hortonworks.com>.

On 24 May 2013 00:52, Stephen Watt <sw...@redhat.com> wrote:

> Hi Folks
>
> Hadoop's pluggable filesystem architecture supports the ability to enable
> an alternate filesystem for use with Hadoop by writing a plugin for it. We
> now have several alternate filesystems that have Hadoop FileSystem plugins
> and because this isn't a very well understood topic, I've been working on a
> page on the project wiki to bring this all together -
> http://wiki.apache.org/hadoop/HCFS. At the same time, the Ambari project
> has been opening up Ambari to support any configured Hadoop FileSystem (as
> opposed to just HDFS) over at
> https://issues.apache.org/jira/browse/AMBARI-1817
>
> My team (over at Red Hat) have been working on writing a Hadoop FileSystem
> plugin for the glusterfs filesystem and have been finding that some of the
> expected semantics of the operations within the Abstract FileSystem class
> are a little ambiguous. With that said, we've joined Steve Loughran in
> attempting to clarify these for both the Hadoop 1.0 and the Hadoop 2.0
> FileSystem class over at https://issues.apache.org/jira/browse/HADOOP-9371
>
> It seems to me that once we had these semantics defined, it would be good
> for consistency of implementation if we could make sure they are well
> understood and properly implemented by the community of folks writing
> Hadoop FileSystem plugins. To that end, we might work to ensure that those
> semantics are tested within an exhaustive test framework that focuses on
> the abstract Hadoop FileSystem layer. Each FileSystem provider could run
> the tests to ensure their plugin implementation and behavior is consistent
> with the expectation. Perhaps a broader extension of
> https://issues.apache.org/jira/browse/HADOOP-9258.
>
>
I have a plan for starting those tests, pulling up the Swift ones when they
are checked in. Big tests that do scale, and that verify the assumptions
that MR, HBase, etc. make, are where we are weakest. The de facto definition
of FS semantics is the apps, and it's them that currently find the problems
(e.g. MAPREDUCE-5264).
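
As a rough illustration of the shared-test idea under discussion, here is a small sketch in plain Java of one set of checks that any provider can be run against. Everything in it is hypothetical: the ToyFs interface, the two in-memory providers, and in particular the expectations themselves (for example, that deleting a missing path returns false), which are placeholders rather than agreed semantics.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ContractCheckSketch {
    // Minimal surface the checks need; a stand-in for a real filesystem binding.
    interface ToyFs {
        boolean mkdirs(String path);
        boolean exists(String path);
        boolean delete(String path);
    }

    // Runs every check against one provider and returns the names of the checks that failed.
    static List<String> runContract(ToyFs fs) {
        List<String> failures = new ArrayList<>();
        String dir = "/contract-test/dir";
        if (!fs.mkdirs(dir) || !fs.exists(dir)) failures.add("mkdirs-then-exists");
        if (!fs.mkdirs(dir)) failures.add("mkdirs-is-idempotent");
        if (!fs.delete(dir) || fs.exists(dir)) failures.add("delete-then-missing");
        if (fs.delete(dir)) failures.add("delete-missing-returns-false");
        return failures;
    }

    // Provider that matches the placeholder expectations.
    static class InMemoryFs implements ToyFs {
        private final Set<String> dirs = new HashSet<>();
        public boolean mkdirs(String path) { dirs.add(path); return true; }
        public boolean exists(String path) { return dirs.contains(path); }
        public boolean delete(String path) { return dirs.remove(path); }
    }

    // Provider that deviates in one place: delete() reports success even for missing paths.
    static class LenientDeleteFs extends InMemoryFs {
        @Override public boolean delete(String path) { super.delete(path); return true; }
    }

    public static void main(String[] args) {
        System.out.println("InMemoryFs failures: " + runContract(new InMemoryFs()));
        System.out.println("LenientDeleteFs failures: " + runContract(new LenientDeleteFs()));
    }
}
```

The point of the shape is that the checks live in one place and each provider only supplies its binding, so a divergence like the lenient delete shows up as a named failure rather than as an application bug later.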


> If folks are interested in these goals, I could host a
> workshop/discussion/hackday in Mountain View to get local people together
> (perhaps a Google Hangout for the remote folks) to keep the ball rolling on
> the semantics discussion and test creation. As a side note, I think this
> could also turn out to be quite an effective means of introducing FileSystem
> vendors to the ASF and getting them contributing to these aspects of the
> project.
>
>
Can we start with some G+ hangouts to get to know each other and have some
broader participation (myself, the others working on Swift, people who have
done S3 (Tom, some of the Amazon folk), etc.)? Then when a workshop is
held, it's got some clearer objectives, e.g. "how do we test this". I would
want the FS semantics to be locked down in some online discussions/JIRA rather
than come back after a night's sleep to discover they had been defined with
tests.

-steve

Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

Posted by Brandon Li <br...@hortonworks.com>.

Hi Kun,

In case you are looking for NFS support for HDFS, this JIRA might
interest you: HDFS-4750.

Thanks,
Brandon Li


On Thu, May 23, 2013 at 6:43 PM, Kun Ling <lk...@gmail.com> wrote:

> Hi Stephen Watt,
>     I am a new developer trying to add NFS-like FileSystem support for
> Hadoop, and I also have some confusion about the FileSystem semantics.
>
>    Since I live in East Asia, I'd like to attend via Google Hangout if
> possible.
>
>    Thanks.
>
>     +1 Kun Ling
>
>
> yours,
> Kun Ling
>
>
> --
> http://www.lingcc.com
>

Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

Posted by Kun Ling <lk...@gmail.com>.

Hi Stephen Watt,
    I am a new developer trying to add NFS-like FileSystem support for
Hadoop, and I also have some confusion about the FileSystem semantics.

   Since I live in East Asia, I'd like to attend via Google Hangout if
possible.

   Thanks.

    +1 Kun Ling


yours,
Kun Ling





-- 
http://www.lingcc.com