Posted to hdfs-dev@hadoop.apache.org by Jay Vyas <ja...@gmail.com> on 2014/03/05 20:07:57 UTC

FileSystem and FileContext Janitor, at your service !

Hi HCFS Community :)

This is Jay... Some of you know me. I hack on a broad range of file
system and Hadoop ecosystem interoperability stuff. I just wanted to
introduce myself and let you folks know I'm going to be working to help
clean up the existing unit testing frameworks for the FileSystem and
FileContext APIs. I've listed some bullets below.

- Bytecode-inspection-based code coverage for file system APIs with a tool
such as Cobertura.

- HADOOP-9361 points out that there are many different types of file
systems.

- Creating mock file systems that emulate different FS semantics (atomic
directory creation, eventual consistency, strict consistency, POSIX
compliance, append support, etc.) and can be used to validate API tests.

Is anyone interested in the above issues, or does anyone have opinions on
how or where I should get started?

Our end goal is to have a more transparent and portable set of test APIs
for Hadoop file system implementors across the board, so that we can
all test our individual implementations confidently.

So, anywhere I can lend a hand, let me know. I think this effort will
require all of us in the file system community to join forces, and it will
benefit us all immensely in the long run as well.

-- 
Jay Vyas
http://jayunit100.blogspot.com

Re: FileSystem and FileContext Janitor, at your service !

Posted by Jay Vyas <ja...@gmail.com>.
I think that is the purpose of the Bigtop smoke tests, not the filesystem
smoke tests, right?


On Thu, Mar 6, 2014 at 12:51 PM, Steve Loughran <st...@hortonworks.com> wrote:

> I was thinking that to test YARN-hosted apps like MapReduce, we need to see
> how they handle filesystems with different consistency/atomicity models, and
> YARN (even MiniYARNCluster) forks things off.
>
> If the MR commit logic is isolated, that could be tested in the JUnit JVM.
> But for other applications (Tez, for example), it's probably too complex to
> mock.



-- 
Jay Vyas
http://jayunit100.blogspot.com

Re: FileSystem and FileContext Janitor, at your service !

Posted by Steve Loughran <st...@hortonworks.com>.
I was thinking that to test YARN-hosted apps like MapReduce, we need to see
how they handle filesystems with different consistency/atomicity models, and
YARN (even MiniYARNCluster) forks things off.

If the MR commit logic is isolated, that could be tested in the JUnit JVM.
But for other applications (Tez, for example), it's probably too complex to
mock.




On 6 March 2014 16:17, Jay Vyas <ja...@gmail.com> wrote:

> Steve, you mentioned:
>
> >> but to test YARN it has to be visible across processes.
>
> What do you mean by "test YARN"? I think for the FileSystem APIs unit
> testing, we don't care about YARN, do we?

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: FileSystem and FileContext Janitor, at your service !

Posted by Jay Vyas <ja...@gmail.com>.
Steve, you mentioned:

>> but to test YARN it has to be visible across processes.

What do you mean by "test YARN"? I think for the FileSystem APIs unit
testing, we don't care about YARN, do we?





On Thu, Mar 6, 2014 at 6:02 AM, Steve Loughran <st...@hortonworks.com> wrote:

> That's an interesting thought: adding some inconsistency semantics on top
> of an existing FS to emulate blobstore behaviour. How would you do this?
> An in-memory RAM FS could do some of this, but to test YARN it has to be
> visible across processes. We'd really need an in-RAM simulation of the
> semantics that also offered an RPC API of some form.



-- 
Jay Vyas
http://jayunit100.blogspot.com

Re: FileSystem and FileContext Janitor, at your service !

Posted by Steve Loughran <st...@hortonworks.com>.
On 5 March 2014 19:07, Jay Vyas <ja...@gmail.com> wrote:

> Hi HCFS Community :)
>
> This is Jay... Some of you know me. I hack on a broad range of file
> system and Hadoop ecosystem interoperability stuff. I just wanted to
> introduce myself and let you folks know I'm going to be working to help
> clean up the existing unit testing frameworks for the FileSystem and
> FileContext APIs. I've listed some bullets below.
>
> - Bytecode-inspection-based code coverage for file system APIs with a tool
> such as Cobertura.
>
> - HADOOP-9361 points out that there are many different types of file
> systems.
>
>
It adds a lot more structure to the tests, with an XML declaration of each
FS (in the -test JAR).

It's pretty much complete except for some discrepancies between file:// and
hdfs:// that I need to fix in file://:
- handling of mkdirs if the destination exists and is a file (currently:
  returns 0)
- seek() on a closed stream. Currently appears to work, at least on OS X.
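
The first discrepancy above (mkdirs when the destination already exists as a file) can be illustrated outside Hadoop. What follows is only a sketch using plain JDK calls, not the Hadoop FileSystem API and not the HADOOP-9361 tests; the class and method names are hypothetical. It shows the two outcomes a contract test has to choose between: a quiet failure value (java.io.File.mkdirs) or an exception (java.nio.file.Files.createDirectories).

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration class; not part of Hadoop or HADOOP-9361.
class MkdirsOverFileDemo {

    // java.io.File.mkdirs() reports failure only through its boolean result.
    static boolean legacyMkdirs(Path existingFile) {
        return existingFile.toFile().mkdirs();
    }

    // Files.createDirectories() throws if the path exists but is not a directory.
    // Returns true when that exception was observed.
    static boolean nioMkdirsThrows(Path existingFile) throws IOException {
        try {
            Files.createDirectories(existingFile);
            return false;
        } catch (FileAlreadyExistsException expected) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // Create a regular file, then try to "mkdirs" on top of it both ways.
        Path file = Files.createTempFile("mkdirs-demo", ".txt");
        try {
            System.out.println("File.mkdirs returned: " + legacyMkdirs(file));
            System.out.println("createDirectories threw: " + nioMkdirsThrows(file));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

A contract test would pin down one of these behaviours and require every filesystem to match it, rather than leaving the result implementation-defined.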


> - Creating mock file systems that emulate different FS semantics (atomic
> directory creation, eventual consistency, strict consistency, POSIX
> compliance, append support, etc.) and can be used to validate API tests.
>

That's an interesting thought: adding some inconsistency semantics on top
of an existing FS to emulate blobstore behaviour. How would you do this?
An in-memory RAM FS could do some of this, but to test YARN it has to be
visible across processes. We'd really need an in-RAM simulation of the
semantics that also offered an RPC API of some form.
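
One way the "inconsistency semantics" idea could look, as a rough single-JVM sketch: an in-memory store where reads by exact path are immediate but listings lag behind writes. Everything here (the class name LaggingListingStore, the delay parameter, the injected clock) is an assumption made for illustration, not an existing Hadoop class. It also does nothing about the cross-process visibility or RPC requirement raised above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.LongSupplier;

// Hypothetical in-memory store whose listings are eventually consistent.
class LaggingListingStore {

    // A stored object plus the time it first became known to the store.
    private static final class Blob {
        final byte[] data;
        final long createdAt;

        Blob(byte[] data, long createdAt) {
            this.data = data;
            this.createdAt = createdAt;
        }
    }

    private final Map<String, Blob> blobs = new TreeMap<>();
    private final LongSupplier clock;       // injected so tests can control time
    private final long listingDelayMillis;  // how long new paths stay hidden from list()

    LaggingListingStore(LongSupplier clock, long listingDelayMillis) {
        this.clock = clock;
        this.listingDelayMillis = listingDelayMillis;
    }

    // Writes are stored immediately; an overwrite keeps the original creation time.
    synchronized void put(String path, byte[] data) {
        Blob old = blobs.get(path);
        long createdAt = (old == null) ? clock.getAsLong() : old.createdAt;
        blobs.put(path, new Blob(data.clone(), createdAt));
    }

    // Reads by exact path see the latest write straight away.
    synchronized byte[] get(String path) {
        Blob blob = blobs.get(path);
        return blob == null ? null : blob.data.clone();
    }

    // Listings only include paths created at least listingDelayMillis ago.
    synchronized List<String> list(String prefix) {
        long now = clock.getAsLong();
        List<String> visible = new ArrayList<>();
        for (Map.Entry<String, Blob> e : blobs.entrySet()) {
            boolean oldEnough = now - e.getValue().createdAt >= listingDelayMillis;
            if (e.getKey().startsWith(prefix) && oldEnough) {
                visible.add(e.getKey());
            }
        }
        return visible;
    }
}
```

Injecting the clock keeps tests deterministic: a test advances time by hand instead of sleeping, and can assert that a freshly written path is readable but missing from the listing.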



>
> Is anyone interested in the above issues, or does anyone have opinions on
> how or where I should get started?
>
> Our end goal is to have a more transparent and portable set of test APIs
> for Hadoop file system implementors across the board, so that we can
> all test our individual implementations confidently.
>
> So, anywhere I can lend a hand, let me know. I think this effort will
> require all of us in the file system community to join forces, and it will
> benefit us all immensely in the long run as well.
>
>
I should do another HADOOP-9361 patch once I get those final quirks in
file:// sorted out so that it is consistent with HDFS.
1. HDFS is, and continues to be, the definition of the semantics of all
filesystem interfaces.
2. It'd be good if we understood more about which accidental features of the
FS applications depend on. E.g. does anything rely on mkdirs() being atomic?
On 0x00 being a valid char in a filename? How do programs fail when blocksize
is too small (try setting it to 1 and see how Pig reacts)? How much code
depends on close() being near-instantaneous and never failing? Blobstores
do their write at that point, and can break both of these assumptions, which
is something a mock FS could add atop file://.
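
The close() point can be sketched with nothing but the JDK. Below is a hypothetical wrapper stream (the name UploadOnCloseStream and its parameters are assumptions for illustration, not Hadoop code) that buffers all writes and only pushes them to the destination in close(), optionally after a delay and optionally failing. That is roughly the behaviour a mock FS layered on file:// would need to expose.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical stream that defers the real write to close(), blobstore-style.
class UploadOnCloseStream extends OutputStream {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final OutputStream destination;
    private final long closeDelayMillis;  // simulated upload latency
    private final boolean failOnClose;    // simulate an upload failure
    private boolean closed;

    UploadOnCloseStream(OutputStream destination, long closeDelayMillis, boolean failOnClose) {
        this.destination = destination;
        this.closeDelayMillis = closeDelayMillis;
        this.failOnClose = failOnClose;
    }

    // Writes only touch the local buffer; nothing reaches the destination yet.
    @Override
    public void write(int b) throws IOException {
        if (closed) {
            throw new IOException("stream is closed");
        }
        buffer.write(b);
    }

    // All the real work (and all the risk) happens here.
    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        try {
            Thread.sleep(closeDelayMillis);
            if (failOnClose) {
                throw new IOException("simulated upload failure in close()");
            }
            buffer.writeTo(destination);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted during simulated upload", e);
        } finally {
            destination.close();
        }
    }
}
```

Code that ignores exceptions from close(), or assumes data is durable after write() returns, would misbehave against a stream like this even though it works against a local file.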
