Posted to common-dev@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2013/05/21 16:19:18 UTC

[jira] [Created] (HADOOP-9583) test-patch gives +1 despite build failure when running tests

Jason Lowe created HADOOP-9583:
----------------------------------

             Summary: test-patch gives +1 despite build failure when running tests
                 Key: HADOOP-9583
                 URL: https://issues.apache.org/jira/browse/HADOOP-9583
             Project: Hadoop Common
          Issue Type: Bug
    Affects Versions: 3.0.0
            Reporter: Jason Lowe
            Priority: Critical


I've seen a couple of checkins recently where tests have timed out, resulting in a Maven build failure, yet test-patch reports an overall +1 on the patch. This encourages committing patches that subsequently break builds.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Re: Where should we host Hadoop FileSystem plugins for 3rd Party FileSystems?

Posted by Tim St Clair <ts...@redhat.com>.
From what I have seen, that is consistent with the modifications in the code.

E.g., KFS has been removed, and the QFS shim is now hosted externally on GitHub.

Cheers,
Tim

----- Original Message -----
> From: "Stephen Watt" <sw...@redhat.com>
> To: common-dev@hadoop.apache.org
> Sent: Tuesday, May 21, 2013 11:49:34 AM
> Subject: Where should we host Hadoop FileSystem plugins for 3rd Party FileSystems?
> 
> Hi Folks
> 
> My name is Steve Watt and I am presently working on enabling glusterfs to be
> used as a Hadoop FileSystem. Most of the work thus far has involved
> developing a Hadoop FileSystem plugin for glusterfs. I'm getting to the
> point where the plugin is becoming stable and I've been trying to understand
> where the right place is to host/manage/version it.
> 
> Steve Loughran was kind enough to point out a few past threads in the
> community
> (http://lucene.472066.n3.nabble.com/Need-to-add-fs-shim-to-use-QFS-td4012118.html)
> that show a disposition to move away from Hadoop Common containing client
> code (plugins) for 3rd party FileSystems. This makes sense and allows the
> filesystem plugin developer more autonomy as well as reducing Hadoop Common's
> dependence on 3rd Party libraries. I'm easy either way. I just wanted to
> verify that the community's preference is still to have client code for 3rd
> Party FileSystems hosted and managed outside of Hadoop Common before I take
> that direction.
> 
> Regards
> Steve Watt
> 

Re: [DISCUSS] - Committing client code to 3rd Party FileSystems within Hadoop Common

Posted by Steve Loughran <st...@hortonworks.com>.
On 24 May 2013 01:28, Colin McCabe <cm...@alumni.cmu.edu> wrote:

> You might try looking at what KosmosFS (KFS) did.  They have some code in
> org/apache/hadoop/fs which calls their own Java shim.
>
> This way, the shim code in hadoop-common gets updated whenever FileSystem
> changes, but there is no requirement to install KFS before building Hadoop.
>

Actually, we were backing away from bundling that in there, the main issue
being the inability to run regression tests; it was code coming out of the
ASF marked as "part of hadoop", but we never knew what it did.

The S3 blobstore is in there; I think at some point it would be good to pull
it out of the core hadoop-common JAR and put it into its own
hadoop-tools/hadoop-aws JAR/project, so that its dependencies (JetS3t) would
be isolated from the main project, keeping transitive POM bloat down and
allowing it to be a separate installable item.

S3 & Swift are testable; you just need money and/or donated cluster time
from the service providers, and the same would hold for Google Cloud Storage,
etc. They are on the net and I can test them from my laptop, even though
latency and bandwidth issues surface there (and, on some of the Swift
services, throttling of side-effecting operations, such as a recursive delete
of a very large directory). That remote testing, therefore, helps me find
such pains before they hit the field.


> You might also try asking Steve Loughran, since he did some great work
> recently to try to nail down the exact semantics of FileSystem and
> FileContext and improve the related unit tests (see HADOOP-9258 and related
> JIRAs.)
>
>
yeah, though I haven't written those tests yet. The plan is to pull most of
the HADOOP-8545 tests up, use Andrew Wang's wrapper code to make them work
with FileContext too, then add some class which every FS would implement: a
class that would provide a factory for FileSystem/FileContext implementations
and a Conf instance that declares FS capabilities (has-umask,
rmdir-root-test-safe-to-run, is-case-sensitive, max-path, max-filename, ...).
The (subclassed) test can use these values to skip tests and tune aspects.
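
(As a rough sketch of that per-FS class, purely to show the shape of the
idea; the class name and capability keys below are hypothetical, not an
actual API:)

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    // Each filesystem would subclass this and hand it to the contract tests.
    public abstract class FSContractBinding {

      // Factory for the FileSystem under test.
      public abstract FileSystem createFileSystem() throws IOException;

      // Capabilities the contract tests consult before running a case.
      public Configuration getCapabilities() {
        Configuration caps = new Configuration(false);
        caps.setBoolean("contract.has-umask", true);
        caps.setBoolean("contract.rmdir-root-test-safe-to-run", false);
        caps.setBoolean("contract.is-case-sensitive", true);
        caps.setInt("contract.max-path", 8000);
        caps.setInt("contract.max-filename", 255);
        return caps;
      }
    }

A subclassed test could then call something like
Assume.assumeTrue(getCapabilities().getBoolean("contract.is-case-sensitive",
false)) before a case the store can't support.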

I want to get the swift stuff in before the beta (it's not going to have any
regressions, after all), get feedback on that, and, once the code is checked
in, start on pulling up the tests.

-Steve

Re: [DISCUSS] - Committing client code to 3rd Party FileSystems within Hadoop Common

Posted by Colin McCabe <cm...@alumni.cmu.edu>.
You might try looking at what KosmosFS (KFS) did.  They have some code in
org/apache/hadoop/fs which calls their own Java shim.

This way, the shim code in hadoop-common gets updated whenever FileSystem
changes, but there is no requirement to install KFS before building Hadoop.
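
(To make the shim pattern concrete, here is a small illustrative sketch; the
names are hypothetical and deliberately not the real KFS classes. The piece
living in hadoop-common is a thin adapter that turns the external client's
streams into Hadoop's stream types, so only that adapter needs touching when
the FileSystem side changes:)

    import java.io.IOException;
    import java.io.InputStream;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSInputStream;

    // Hypothetical adapter: wraps a plain InputStream handed back by an
    // external (non-ASF) client library into Hadoop's FSDataInputStream.
    public final class ShimStreams {
      private ShimStreams() {}

      public static FSDataInputStream wrap(final InputStream in) throws IOException {
        return new FSDataInputStream(new FSInputStream() {
          private long pos;
          @Override public int read() throws IOException {
            int b = in.read();
            if (b >= 0) { pos++; }
            return b;
          }
          @Override public long getPos() { return pos; }
          @Override public void seek(long newPos) throws IOException {
            throw new IOException("seek not supported in this sketch");
          }
          @Override public boolean seekToNewSource(long target) { return false; }
        });
      }
    }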

You might also try asking Steve Loughran, since he did some great work
recently to try to nail down the exact semantics of FileSystem and
FileContext and improve the related unit tests (see HADOOP-9258 and related
JIRAs.)

best,
Colin



On Thu, May 23, 2013 at 2:52 PM, Stephen Watt <sw...@redhat.com> wrote:

> Thanks for responding Harsh.
>
> I agree. Hadoop Common does do a good job of maintaining a stable and
> public FS and FS Context API. The pro for maintaining client libraries
> outside of Hadoop Common is that the release owner of the library has much
> more autonomy and agility in maintaining the library. From the glusterfs
> plugin perspective, I concur with this. In contrast, if my library was
> managed inside of Hadoop Common, I'd have to spend the time to earn
> committer status to have an equivalent amount of autonomy and agility,
> which is overkill for someone just wanting to maintain 400 lines of code.
>
> I ruminated a bit about one con, which is that, because it doesn't get
> shipped with Hadoop Common, it might be harder for the Hadoop user
> community to find out about it and obtain it. However, if you consider the
> LZO codec, the fact that it's not bundled certainly doesn't hamper its
> adoption.
>
> You mentioned testing. I don't think regression across Hadoop releases is
> as big of an issue, because (based on my understanding) you really just
> have two FileSystem interfaces (abstract classes) to worry about with
> respect to compliance, namely the FileSystem interface for Hadoop 1.0 and
> the FileSystem interface for Hadoop 2.0. However, this is a broader topic
> that I also want to discuss, so I'll tee it up in a separate thread.
>
> Regards
> Steve Watt
>
>
> ----- Original Message -----
> From: "Harsh J" <ha...@cloudera.com>
> To: common-dev@hadoop.apache.org
> Sent: Thursday, May 23, 2013 1:37:30 PM
> Subject: Re: [DISCUSS] - Committing client code to 3rd Party FileSystems
> within Hadoop Common
>
> I think we do a fairly good job of maintaining a stable and public
> FileSystem and FileContext API that lets third-party plugins exist outside
> of Apache Hadoop but still work well across versions.
>
> The question of testing pops up, though, specifically that of testing
> against trunk to catch regressions across various implementations; but it
> would be a lot of work for us to also maintain glusterfs dependencies and
> mechanisms as part of trunk.
>
> We do provide trunk build snapshot artifacts publicly for downstream
> projects to test against, which I think may help cover the continuous
> testing concerns, if there are those.
>
> Right now, I don't think the S3 FS we maintain really works all that well.
> I also recall, per recent conversations on the lists, that AMZN has started
> shipping their own library for a better implementation rather than
> perfecting the implementation we have here (correct me if I am wrong, but I
> think the changes were not all contributed back). I see some work going on
> for OpenStack's Swift, for which I think Steve also raised a similar
> discussion here: http://search-hadoop.com/m/W1S5h2SrxlG, but I don't recall
> if the conversation proceeded at the time.
>
> What's your perspective as the releaser though? Would you not find
> maintaining this outside easier, especially in terms of maintaining your
> code for quicker releases, for both bug fixes and features - also given
> that you can CI it against Apache Hadoop trunk at the same time?
>
>
> On Thu, May 23, 2013 at 11:47 PM, Stephen Watt <sw...@redhat.com> wrote:
>
> > (Resending - I think the first time I sent this out it got lost within all
> > the ByLaws voting)
> >
> > Hi Folks
> >
> > My name is Steve Watt and I am presently working on enabling glusterfs to
> > be used as a Hadoop FileSystem. Most of the work thus far has involved
> > developing a Hadoop FileSystem plugin for glusterfs. I'm getting to the
> > point where the plugin is becoming stable and I've been trying to
> > understand where the right place is to host/manage/version it.
> >
> > Steve Loughran was kind enough to point out a few past threads in the
> > community (such as
> > http://lucene.472066.n3.nabble.com/Need-to-add-fs-shim-to-use-QFS-td4012118.html)
> > that show a project disposition to move away from Hadoop Common containing
> > client code (plugins) for 3rd party FileSystems. This makes sense and
> > allows the filesystem plugin developer more autonomy as well as reducing
> > Hadoop Common's dependence on 3rd Party libraries.
> >
> > Before I embark down that path, can the PMC/Committers verify that the
> > preference is still to have client code for 3rd Party FileSystems hosted
> > and managed outside of Hadoop Common?
> >
> > Regards
> > Steve Watt
> >
>
>
>
> --
> Harsh J
>

Re: [DISCUSS] - Committing client code to 3rd Party FileSystems within Hadoop Common

Posted by Stephen Watt <sw...@redhat.com>.
Thanks for responding Harsh. 

I agree. Hadoop Common does do a good job of maintaining a stable and public FS and FS Context API. The pro for maintaining client libraries outside of Hadoop Common is that the release owner of the library has much more autonomy and agility in maintaining the library. From the glusterfs plugin perspective, I concur with this. In contrast, if my library was managed inside of Hadoop Common, I'd have to spend the time to earn committer status to have an equivalent amount of autonomy and agility, which is overkill for someone just wanting to maintain 400 lines of code.

I ruminated a bit about one con, which is that, because it doesn't get shipped with Hadoop Common, it might be harder for the Hadoop user community to find out about it and obtain it. However, if you consider the LZO codec, the fact that it's not bundled certainly doesn't hamper its adoption.

You mentioned testing. I don't think regression across Hadoop releases is as big of an issue, because (based on my understanding) you really just have two FileSystem interfaces (abstract classes) to worry about with respect to compliance, namely the FileSystem interface for Hadoop 1.0 and the FileSystem interface for Hadoop 2.0. However, this is a broader topic that I also want to discuss, so I'll tee it up in a separate thread.

Regards
Steve Watt


----- Original Message -----
From: "Harsh J" <ha...@cloudera.com>
To: common-dev@hadoop.apache.org
Sent: Thursday, May 23, 2013 1:37:30 PM
Subject: Re: [DISCUSS] - Committing client code to 3rd Party FileSystems within Hadoop Common

I think we do a fairly good job of maintaining a stable and public FileSystem
and FileContext API that lets third-party plugins exist outside of Apache
Hadoop but still work well across versions.

The question of testing pops up, though, specifically that of testing against
trunk to catch regressions across various implementations; but it would be a
lot of work for us to also maintain glusterfs dependencies and mechanisms as
part of trunk.

We do provide trunk build snapshot artifacts publicly for downstream
projects to test against, which I think may help cover the continuous
testing concerns, if there are those.

Right now, I don't think the S3 FS we maintain really works all that well.
I also recall, per recent conversations on the lists, that AMZN has started
shipping their own library for a better implementation rather than
perfecting the implementation we have here (correct me if I am wrong, but I
think the changes were not all contributed back). I see some work going on
for OpenStack's Swift, for which I think Steve also raised a similar
discussion here: http://search-hadoop.com/m/W1S5h2SrxlG, but I don't recall
if the conversation proceeded at the time.

What's your perspective as the releaser though? Would you not find
maintaining this outside easier, especially in terms of maintaining your
code for quicker releases, for both bug fixes and features - also given
that you can CI it against Apache Hadoop trunk at the same time?


On Thu, May 23, 2013 at 11:47 PM, Stephen Watt <sw...@redhat.com> wrote:

> (Resending - I think the first time I sent this out it got lost within all
> the ByLaws voting)
>
> Hi Folks
>
> My name is Steve Watt and I am presently working on enabling glusterfs to
> be used as a Hadoop FileSystem. Most of the work thus far has involved
> developing a Hadoop FileSystem plugin for glusterfs. I'm getting to the
> point where the plugin is becoming stable and I've been trying to
> understand where the right place is to host/manage/version it.
>
> Steve Loughran was kind enough to point out a few past threads in the
> community (such as
> http://lucene.472066.n3.nabble.com/Need-to-add-fs-shim-to-use-QFS-td4012118.html)
> that show a project disposition to move away from Hadoop Common containing
> client code (plugins) for 3rd party FileSystems. This makes sense and
> allows the filesystem plugin developer more autonomy as well as reducing
> Hadoop Common's dependence on 3rd Party libraries.
>
> Before I embark down that path, can the PMC/Committers verify that the
> preference is still to have client code for 3rd Party FileSystems hosted
> and managed outside of Hadoop Common?
>
> Regards
> Steve Watt
>



-- 
Harsh J

Re: [DISCUSS] - Committing client code to 3rd Party FileSystems within Hadoop Common

Posted by Harsh J <ha...@cloudera.com>.
I think we do a fairly good job of maintaining a stable and public FileSystem
and FileContext API that lets third-party plugins exist outside of Apache
Hadoop but still work well across versions.

The question of testing pops up, though, specifically that of testing against
trunk to catch regressions across various implementations; but it would be a
lot of work for us to also maintain glusterfs dependencies and mechanisms as
part of trunk.

We do provide trunk build snapshot artifacts publicly for downstream
projects to test against, which I think may help cover the continuous
testing concerns, if there are those.

Right now, I don't think the S3 FS we maintain really works all that well.
I also recall, per recent conversations on the lists, that AMZN has started
shipping their own library for a better implementation rather than
perfecting the implementation we have here (correct me if I am wrong, but I
think the changes were not all contributed back). I see some work going on
for OpenStack's Swift, for which I think Steve also raised a similar
discussion here: http://search-hadoop.com/m/W1S5h2SrxlG, but I don't recall
if the conversation proceeded at the time.

What's your perspective as the releaser though? Would you not find
maintaining this outside easier, especially in terms of maintaining your
code for quicker releases, for both bug fixes and features - also given
that you can CI it against Apache Hadoop trunk at the same time?


On Thu, May 23, 2013 at 11:47 PM, Stephen Watt <sw...@redhat.com> wrote:

> (Resending - I think the first time I sent this out it got lost within all
> the ByLaws voting)
>
> Hi Folks
>
> My name is Steve Watt and I am presently working on enabling glusterfs to
> be used as a Hadoop FileSystem. Most of the work thus far has involved
> developing a Hadoop FileSystem plugin for glusterfs. I'm getting to the
> point where the plugin is becoming stable and I've been trying to
> understand where the right place is to host/manage/version it.
>
> Steve Loughran was kind enough to point out a few past threads in the
> community (such as
> http://lucene.472066.n3.nabble.com/Need-to-add-fs-shim-to-use-QFS-td4012118.html)
> that show a project disposition to move away from Hadoop Common containing
> client code (plugins) for 3rd party FileSystems. This makes sense and
> allows the filesystem plugin developer more autonomy as well as reducing
> Hadoop Common's dependence on 3rd Party libraries.
>
> Before I embark down that path, can the PMC/Committers verify that the
> preference is still to have client code for 3rd Party FileSystems hosted
> and managed outside of Hadoop Common?
>
> Regards
> Steve Watt
>



-- 
Harsh J

[DISCUSS] - Committing client code to 3rd Party FileSystems within Hadoop Common

Posted by Stephen Watt <sw...@redhat.com>.
(Resending - I think the first time I sent this out it got lost within all the ByLaws voting)

Hi Folks

My name is Steve Watt and I am presently working on enabling glusterfs to be used as a Hadoop FileSystem. Most of the work thus far has involved developing a Hadoop FileSystem plugin for glusterfs. I'm getting to the point where the plugin is becoming stable and I've been trying to understand where the right place is to host/manage/version it. 
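
(For context on what the plugin actually is, mechanically: a FileSystem
subclass that Hadoop loads by configuration, so hosting it outside the
project amounts to a jar on the classpath plus a config key. A minimal
usage sketch; the scheme and class name are placeholders rather than the
actual glusterfs plugin:)

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PluginWiringExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Map the URI scheme to the externally hosted plugin class
        // (placeholder name; the plugin jar must be on the classpath).
        conf.set("fs.glusterfs.impl",
                 "org.apache.hadoop.fs.glusterfs.GlusterFileSystem");
        FileSystem fs = FileSystem.get(URI.create("glusterfs:///"), conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
          System.out.println(status.getPath());
        }
      }
    }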

Steve Loughran was kind enough to point out a few past threads in the community (such as http://lucene.472066.n3.nabble.com/Need-to-add-fs-shim-to-use-QFS-td4012118.html) that show a project disposition to move away from Hadoop Common containing client code (plugins) for 3rd party FileSystems. This makes sense and allows the filesystem plugin developer more autonomy as well as reducing Hadoop Common's dependence on 3rd Party libraries.

Before I embark down that path, can the PMC/Committers verify that the preference is still to have client code for 3rd Party FileSystems hosted and managed outside of Hadoop Common?

Regards
Steve Watt

Where should we host Hadoop FileSystem plugins for 3rd Party FileSystems?

Posted by Stephen Watt <sw...@redhat.com>.
Hi Folks

My name is Steve Watt and I am presently working on enabling glusterfs to be used as a Hadoop FileSystem. Most of the work thus far has involved developing a Hadoop FileSystem plugin for glusterfs. I'm getting to the point where the plugin is becoming stable and I've been trying to understand where the right place is to host/manage/version it. 

Steve Loughran was kind enough to point out a few past threads in the community (http://lucene.472066.n3.nabble.com/Need-to-add-fs-shim-to-use-QFS-td4012118.html) that show a disposition to move away from Hadoop Common containing client code (plugins) for 3rd party FileSystems. This makes sense and allows the filesystem plugin developer more autonomy as well as reducing Hadoop Common's dependence on 3rd Party libraries. I'm easy either way. I just wanted to verify that the community's preference is still to have client code for 3rd Party FileSystems hosted and managed outside of Hadoop Common before I take that direction.

Regards
Steve Watt