Posted to hdfs-dev@hadoop.apache.org by Jay Vyas <ja...@gmail.com> on 2014/03/06 17:37:04 UTC

In-Memory Reference FS implementations

As part of HADOOP-9361, I'm envisioning this.

1) We create in-memory FS implementations of different reference
FileSystems, each of which specifies appropriate tests, and passes those
tests, i.e.

   InMemStrictlyConsistentFS (i.e. hdfs)
   InMemEventuallyConsistentFS (blob stores)
   InMemMinimalFS (a very minimal-guarantee FS, for maybe

The beauty of this is that it gives us simple, easily testable reference
implementations that we can base our complex real-world filesystem unit
tests on.
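
To make "specifies appropriate tests, and passes those tests" concrete, here
is a rough sketch (plain JUnit; class and test names are hypothetical, and the
local FS stands in for the in-memory one) of the kind of create-then-list
consistency check an InMemStrictlyConsistentFS would be expected to pass:

  import static org.junit.Assert.assertTrue;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.junit.Test;

  public class StrictConsistencyReferenceTest {
    @Test
    public void createdFileIsImmediatelyListed() throws Exception {
      FileSystem fs = FileSystem.getLocal(new Configuration());
      Path dir = new Path("target/reffs-test");
      Path file = new Path(dir, "part-00000");
      fs.mkdirs(dir);
      fs.create(file, true).close();
      // A strictly consistent FS must expose the new entry immediately;
      // an eventually consistent blobstore may not.
      boolean seen = false;
      for (FileStatus st : fs.listStatus(dir)) {
        seen |= st.getPath().getName().equals("part-00000");
      }
      assertTrue("newly created file missing from listing", seen);
    }
  }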

2) Then, downstream vendors can just "pick" which of these filesystems
they are closest to, and modify their particular filesystem to declare
semantics using the matching FS as a template.



-- 
Jay Vyas
http://jayunit100.blogspot.com

Re: In-Memory Reference FS implementations

Posted by Steve Loughran <st...@hortonworks.com>.
EMR's S3 does extra things, which is why Netflix used injection tricks to
add theirs on top.

For blobstores, the key use cases are:

   1. general source of low-rate-of-change artifacts
   2. input for analysis jobs
   3. output from them
   4. chained operations
   5. storage of data to outlive the EMR cluster

#1 isn't a problem assuming the velocity of the artifacts is pretty low.

#2 -OK for data written "a while" earlier, provided there isn't an ongoing
partition.

#3 -speculation relies on atomic rename that fails if dest dir exists.
Blobstores don't have this and do rename as
   (i) check
  (ii) create root path
 (iii) copy of individual items below path
  (iv) delete of source.

The race between (i) and (ii) exists; if the object store doesn't even do
create consistency (e.g. AWS S3 US-East, but not the other regions [1]),
this means there's a risk of two committing reducers mixing outputs (the
risk is low, as it requires both processes to commit simultaneously).
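
To make the race window explicit, here is a sketch of the check/copy/delete
sequence above; the class and method names are illustrative, not the code of
any particular blobstore client:

  import java.io.IOException;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.FileUtil;
  import org.apache.hadoop.fs.Path;

  // Non-atomic emulation of rename(src, dest) via check/copy/delete.
  class CopyRename {
    static boolean renameByCopy(FileSystem fs, Path src, Path dest)
        throws IOException {
      if (fs.exists(dest)) {                       // (i) check
        return false;
      }
      // ---- race window: a second committer can pass the same check here ----
      fs.mkdirs(dest);                             // (ii) create root path
      for (FileStatus st : fs.listStatus(src)) {   // (iii) copy items below src
        FileUtil.copy(fs, st.getPath(), fs,
            new Path(dest, st.getPath().getName()),
            false /* deleteSource */, fs.getConf());
      }
      return fs.delete(src, true);                 // (iv) delete source
    }
  }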

#4 is trouble -anything waiting for one MR job to finish may start when it
finishes, but when job #2 kicks off and does any of the dir/path listing
methods, it may get an incomplete list of children -and hence an incomplete
list of output files.
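
One crude mitigation sketch -assuming (and this is an assumption, not an
existing Hadoop feature) that the upstream job publishes how many output
files it wrote- is for job #2 to poll until the listing catches up:

  import java.io.IOException;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  // Poll the directory listing until the expected number of files appears.
  class WaitForListing {
    static void awaitFileCount(FileSystem fs, Path dir, int expected,
        long timeoutMillis) throws IOException, InterruptedException {
      long deadline = System.currentTimeMillis() + timeoutMillis;
      while (fs.listStatus(dir).length < expected) {
        if (System.currentTimeMillis() > deadline) {
          throw new IOException("Listing of " + dir + " still incomplete");
        }
        Thread.sleep(1000);   // give the eventually consistent store time
      }
    }
  }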

That's the trouble. If people follow the best practice -HDFS for
intermediate work, S3 for final output- all is well. Netflix use S3 as the
output of all work, so they can schedule analytics on any Hadoop cluster
they have, and at the scale they run at they hit this problem. Other people
may have hit it too -just not noticed.

" I fear that a lot of applications are not ready for eventual
consistency, and may never be"

Exactly: I have code that uses HDFS to co-ordinate, and it will never work
on an object store that doesn't have atomic/consistent ops.

"leading to the feeling that Hadoop on S3 is buggy"

https://issues.apache.org/jira/browse/HADOOP-9577  -filed by someone @amazon


-Steve

HADOOP-9565 says "add a marker":
https://issues.apache.org/jira/browse/HADOOP-9565

HADOOP-10373 goes further and says "move the s3 & s3n code into
hadoop-tools/hadoop-aws".

https://issues.apache.org/jira/browse/HADOOP-10373

This will make it possible to swap in versions compiled against the same
Hadoop release, without having to build your own hadoop JARs

steve

(who learned too much about object stores  and the FileSystem class while
doing the swift:// coding)


[1] http://aws.amazon.com/s3/faqs/


On 6 March 2014 18:47, Colin McCabe <cm...@alumni.cmu.edu> wrote:

> NetFlix's Apache-licensed S3mper system provides consistency for an
> S3-backed store.
> http://techblog.netflix.com/2014/01/s3mper-consistency-in-cloud.html
>
> It would be nice to see this or something like it integrated with
> Hadoop.  I fear that a lot of applications are not ready for eventual
> consistency, and may never be, leading to the feeling that Hadoop on
> S3 is buggy.
>
> Colin
>
> On Thu, Mar 6, 2014 at 10:42 AM, Jay Vyas <ja...@gmail.com> wrote:
> > do you consider that native S3 FS  a real "reference implementation" for
> > blob stores? or just something that , by mere chance, we are able to use
> as
> > a ref. impl.
>


Re: In-Memory Reference FS implementations

Posted by Steve Loughran <st...@hortonworks.com>.
On 7 March 2014 01:35, Jay Vyas <ja...@gmail.com> wrote:

> Thanks steve.  So i guess the conclusion is
>
> 1) Wait on HADOOP-9361.
>

"help" with is a better plan. I really don't look at it that often. I'll
try and get it ready to review this weekend


> 2) There definitively cannot be a strict contract for a single HCFS, based
> on your examples shown.
>
HDFS effectively defines the behaviour, though it is a de facto
specification. All the tests and docs we can write can formalise the
behaviour, and show other filesystems (including file://) where they are
inconsistent. Failure modes are intractable -it is everyone's right to fail
differently.

What worries me is that there are some unintentional behaviours
-epiphenomena- that we aren't aware of, but which downstream code depends
on.

mkdirs() being atomic is an example -nobody made a decision to do that, it
just fell out of holding locks efficiently in the NN. Does anything depend
on it? Hopefully not -as if it does, that's something HDFS needs to keep
forever -so there'd better be a test for it.
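
Something like the following (name and approach purely illustrative; a real
test would sit on a MiniDFSCluster, not file://) is the sort of test that
would pin that behaviour down:

  import static org.junit.Assert.assertTrue;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.junit.Test;

  public class ConcurrentMkdirsTest {
    @Test
    public void racingMkdirsBothSucceed() throws Exception {
      final FileSystem fs = FileSystem.getLocal(new Configuration());
      final Path dir = new Path("target/mkdirs-race/a/b/c");
      Runnable race = new Runnable() {
        public void run() {
          try {
            fs.mkdirs(dir);   // neither caller should see a failure
          } catch (Exception e) {
            throw new RuntimeException(e);
          }
        }
      };
      Thread t1 = new Thread(race);
      Thread t2 = new Thread(race);
      t1.start(); t2.start();
      t1.join(); t2.join();
      assertTrue(fs.getFileStatus(dir).isDirectory());
    }
  }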

Any others? I don't know -and that worries me. Filename, dir size and file
size assumptions are things that caused problems in the Swift object store;
code also assumes that rmdir is O(1) and not O(n), so teardown of tests on
directories with many small files caused timeouts against throttled Swift
endpoints.



> In the meantime ill audit existing test coverage, and let me know if i can
> lend a hand in the cleanup process.
>

That'll be great


Re: In-Memory Reference FS implementations

Posted by Jay Vyas <ja...@gmail.com>.
Thanks Steve. So I guess the conclusion is

1) Wait on HADOOP-9361.

2) There definitively cannot be a strict contract for a single HCFS, based
on your examples shown.

In the meantime I'll audit existing test coverage; let me know if I can
lend a hand in the cleanup process.




On Thu, Mar 6, 2014 at 4:01 PM, Steve Loughran <st...@hortonworks.com> wrote:

> Lets get the HADOOP-9361 stuff in (it lives alongside
> FileSystemContractBaseTest) and you can work off that.
>
>
> On 6 March 2014 18:57, Jay Vyas <ja...@gmail.com> wrote:
>
> > Thanks Colin: that's a good example of why we want To unify the hcfs test
> > profile.  So how can  hcfs implementations use current hadoop-common
> tests?
> >
> > In mind there are three ways.
> >
> > - one solution is to manually cobble together and copy tests , running
> > them one by one and seeing which ones apply to their fs.  this is what I
> > think we do now (extending base contract, main operations tests,
> overriding
> > some methods, ..).
> >
>
> Yes it is. Start there.
>
>
> >
> > - another solution is that all hadoop filesystems should conform to one
> > exact contract.  Is that a pipe dream? Or is it possible?
> >
>
>
> No as the nativeFS and hadoop FS
> -throw different exceptions
> -raise exceptions on seek past end of file at different times (HDFS: on
> seek, file:// on read)
> -have different illegal filenames (hdfs 2.3+ ".snapshot"). NTFS: "COM1" to
> COM9, unless you use the \\.\ unicode prefix
> -have different limits on dir size, depth, filename length
> -have different case sensitivity
>
> None of these are explicitly in the FileSystem and FileContract APIs, and
> nor can they be.
>
>
>
> >
> > - a third solution. Is that we could use a declarative API where file
> > system implementations declare which tests or groups of tests they don't
> > want to run.   That is basically hadoop-9361
> >
> >
> it does more,
>
> 1.it lets filesystems declare strict vs lax exceptions. Strict: detailed
> exceptions, like EOFException. Lax: IOException.
> 2. by declaring behaviours in an XML file in each filesystems -test.jar,
> downstream tests in, say, bigtop, can read in the same details
>
>
> > - The third approach could be complimented by barebones, simple in-memory
> > curated reference implementations that exemplify distilled filesystems
> with
> > certain salient properties (I.e. Non atomic mkdirs)
> >
> >
> > > On Mar 6, 2014, at 1:47 PM, Colin McCabe <cm...@alumni.cmu.edu>
> wrote:
> > >
> > > NetFlix's Apache-licensed S3mper system provides consistency for an
> > > S3-backed store.
> > > http://techblog.netflix.com/2014/01/s3mper-consistency-in-cloud.html
> > >
> > > It would be nice to see this or something like it integrated with
> > > Hadoop.  I fear that a lot of applications are not ready for eventual
> > > consistency, and may never be, leading to the feeling that Hadoop on
> > > S3 is buggy.
> > >
> > > Colin
> > >
> > >> On Thu, Mar 6, 2014 at 10:42 AM, Jay Vyas <ja...@gmail.com>
> wrote:
> > >> do you consider that native S3 FS  a real "reference implementation"
> for
> > >> blob stores? or just something that , by mere chance, we are able to
> > use as
> > >> a ref. impl.
> >
>
>



-- 
Jay Vyas
http://jayunit100.blogspot.com

Re: In-Memory Reference FS implementations

Posted by Steve Loughran <st...@hortonworks.com>.
Lets get the HADOOP-9361 stuff in (it lives alongside
FileSystemContractBaseTest) and you can work off that.


On 6 March 2014 18:57, Jay Vyas <ja...@gmail.com> wrote:

> Thanks Colin: that's a good example of why we want To unify the hcfs test
> profile.  So how can  hcfs implementations use current hadoop-common tests?
>
> In mind there are three ways.
>
> - one solution is to manually cobble together and copy tests , running
> them one by one and seeing which ones apply to their fs.  this is what I
> think we do now (extending base contract, main operations tests, overriding
> some methods, ..).
>

Yes it is. Start there.


>
> - another solution is that all hadoop filesystems should conform to one
> exact contract.  Is that a pipe dream? Or is it possible?
>


No, as the native FS and Hadoop FS
-throw different exceptions
-raise exceptions on seek past end of file at different times (HDFS: on
seek, file:// on read)
-have different illegal filenames (HDFS 2.3+: ".snapshot"; NTFS: "COM1" to
"COM9", unless you use the \\.\ Unicode prefix)
-have different limits on dir size, depth, filename length
-have different case sensitivity

None of these are explicitly in the FileSystem and FileContract APIs, nor
can they be.
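
As an illustration of one divergence -where the seek-past-EOF failure
surfaces- here is a sketch probe (not an actual Hadoop test; the comments
just restate the list above):

  import java.io.IOException;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  // Reports whether a filesystem fails on the seek or on the later read.
  class SeekPastEofProbe {
    static String whereDoesItFail(FileSystem fs, Path emptyFile)
        throws IOException {
      FSDataInputStream in = fs.open(emptyFile);
      try {
        try {
          in.seek(1024);       // HDFS: reported to fail here
        } catch (IOException e) {
          return "seek";
        }
        try {
          in.read();           // file://: reported to fail here
        } catch (IOException e) {
          return "read";
        }
        return "neither";
      } finally {
        in.close();
      }
    }
  }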



>
> - a third solution. Is that we could use a declarative API where file
> system implementations declare which tests or groups of tests they don't
> want to run.   That is basically hadoop-9361
>
>
It does more:

1. it lets filesystems declare strict vs lax exceptions. Strict: detailed
exceptions, like EOFException. Lax: IOException.
2. by declaring behaviours in an XML file in each filesystem's -test.jar,
downstream tests in, say, Bigtop, can read in the same details (see the
sketch below).
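
For example, a downstream test might consume such a declaration roughly like
this -the option key and XML resource name are made up for illustration, and
the real HADOOP-9361 names may differ:

  import static org.junit.Assume.assumeTrue;

  import org.apache.hadoop.conf.Configuration;
  import org.junit.Test;

  public class RenameContractTest {
    @Test
    public void renameIntoExistingDirFails() throws Exception {
      Configuration contract = new Configuration(false);
      contract.addResource("contract/myfs-contract.xml");  // hypothetical file
      // Skip, rather than fail, when the FS declares it lacks the behaviour.
      assumeTrue(
          contract.getBoolean("fs.contract.supports-atomic-rename", false));
      // ... the actual rename assertions would go here ...
    }
  }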


> - The third approach could be complimented by barebones, simple in-memory
> curated reference implementations that exemplify distilled filesystems with
> certain salient properties (I.e. Non atomic mkdirs)
>
>
> > On Mar 6, 2014, at 1:47 PM, Colin McCabe <cm...@alumni.cmu.edu> wrote:
> >
> > NetFlix's Apache-licensed S3mper system provides consistency for an
> > S3-backed store.
> > http://techblog.netflix.com/2014/01/s3mper-consistency-in-cloud.html
> >
> > It would be nice to see this or something like it integrated with
> > Hadoop.  I fear that a lot of applications are not ready for eventual
> > consistency, and may never be, leading to the feeling that Hadoop on
> > S3 is buggy.
> >
> > Colin
> >
> >> On Thu, Mar 6, 2014 at 10:42 AM, Jay Vyas <ja...@gmail.com> wrote:
> >> do you consider that native S3 FS  a real "reference implementation" for
> >> blob stores? or just something that , by mere chance, we are able to
> use as
> >> a ref. impl.
>


Re: In-Memory Reference FS implementations

Posted by Jay Vyas <ja...@gmail.com>.
Thanks Colin: that's a good example of why we want to unify the HCFS test profile. So how can HCFS implementations use the current hadoop-common tests?

In my mind there are three ways.

- one solution is to manually cobble together and copy tests, running them one by one and seeing which ones apply to their FS. This is what I think we do now (extending the base contract and main operations tests, overriding some methods, ...).

- another solution is that all Hadoop filesystems should conform to one exact contract. Is that a pipe dream? Or is it possible?

- a third solution is that we could use a declarative API where filesystem implementations declare which tests or groups of tests they don't want to run. That is basically HADOOP-9361.

- The third approach could be complemented by barebones, simple in-memory curated reference implementations that exemplify distilled filesystems with certain salient properties (i.e. non-atomic mkdirs).

 
> On Mar 6, 2014, at 1:47 PM, Colin McCabe <cm...@alumni.cmu.edu> wrote:
> 
> NetFlix's Apache-licensed S3mper system provides consistency for an
> S3-backed store.
> http://techblog.netflix.com/2014/01/s3mper-consistency-in-cloud.html
> 
> It would be nice to see this or something like it integrated with
> Hadoop.  I fear that a lot of applications are not ready for eventual
> consistency, and may never be, leading to the feeling that Hadoop on
> S3 is buggy.
> 
> Colin
> 
>> On Thu, Mar 6, 2014 at 10:42 AM, Jay Vyas <ja...@gmail.com> wrote:
>> do you consider that native S3 FS  a real "reference implementation" for
>> blob stores? or just something that , by mere chance, we are able to use as
>> a ref. impl.

Re: In-Memory Reference FS implementations

Posted by Colin McCabe <cm...@alumni.cmu.edu>.
NetFlix's Apache-licensed S3mper system provides consistency for an
S3-backed store.
http://techblog.netflix.com/2014/01/s3mper-consistency-in-cloud.html

It would be nice to see this or something like it integrated with
Hadoop.  I fear that a lot of applications are not ready for eventual
consistency, and may never be, leading to the feeling that Hadoop on
S3 is buggy.

Colin

On Thu, Mar 6, 2014 at 10:42 AM, Jay Vyas <ja...@gmail.com> wrote:
> do you consider that native S3 FS  a real "reference implementation" for
> blob stores? or just something that , by mere chance, we are able to use as
> a ref. impl.

Re: In-Memory Reference FS implementations

Posted by Jay Vyas <ja...@gmail.com>.
Do you consider the native S3 FS a real "reference implementation" for
blob stores? Or just something that, by mere chance, we are able to use as
a ref. impl.?

Re: In-Memory Reference FS implementations

Posted by Steve Loughran <st...@hortonworks.com>.
On 6 March 2014 16:37, Jay Vyas <ja...@gmail.com> wrote:

> As part of HADOOP-9361, im visioning this.
>
> 1) - We create In Memory FS implementation of different Reference
> FileSystems, each of which specifies appropriate tests , and passes those
> tests , i.e.
>
>    InMemStrictlyConsistentFS (i.e. hdfs)
>

HDFS is the filesystem semantics expected by applications -indeed, it is
actually stricter than NFS in terms of its consistency model.

MiniDFSCluster implements this today -and provides the RPC needed for
forked apps to access it.
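
A minimal sketch of standing one up in a test (the surrounding class is
illustrative; MiniDFSCluster itself is the real test fixture):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hdfs.MiniDFSCluster;

  public class MiniClusterExample {
    public static void main(String[] args) throws Exception {
      MiniDFSCluster cluster =
          new MiniDFSCluster.Builder(new Configuration())
              .numDataNodes(1)
              .build();
      try {
        FileSystem fs = cluster.getFileSystem();
        fs.mkdirs(new Path("/example"));
        // the NN RPC endpoint that forked processes can be pointed at
        System.out.println("NameNode at " + fs.getUri());
      } finally {
        cluster.shutdown();
      }
    }
  }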

For example, here's a test that uses YARN to bring up a forked process
bonded to HDFS mini cluster -a process that then starts HBase instances
talking to HDFS

https://github.com/hortonworks/hoya/blob/develop/hoya-core/src/test/groovy/org/apache/hoya/yarn/cluster/live/TestHBaseMasterOnHDFS.groovy



   InMemEventuallyConsistentFS (blob stores)
>    InMemMinmalFS (a very minimal gaurantee FS, for maybe
>
> The beauty of this is - it gives us simple, easily testable reference
> implementations that we can base our complex real world file system unit
> tests off of.
>
>
I can see the merits of the Blobstore one, so as to demonstrate its
failings.

Thinking about it, we are mostly there already, because there's a mock impl
of the org.apache.hadoop.fs.s3native.NativeFileSystemStore interface used
behind the s3n:// class

hadoop-trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/s3native/InMemoryNativeFileSystemStore.java

We could enhance this to give it lower guarantees (AWS US-East: no
guarantees; US-West: create consistency), and allow a period of time before
new actions become visible, where actions are: create, delete, overwrite.

We could also allow its methods to take time and maybe fail, so emulating
the storeFile() operation, amongst others. Failure simulation would be
nice.
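
The core of the "delay before new actions become visible" idea could look
something like this -a plain data structure, not wired into
InMemoryNativeFileSystemStore, with illustrative names:

  import java.util.Map;
  import java.util.Set;
  import java.util.TreeMap;
  import java.util.TreeSet;

  // Only keys whose visibility delay has elapsed appear in listings,
  // emulating a store without list-after-create consistency.
  class EventuallyVisibleKeys {
    private final long visibilityDelayMs;
    private final Map<String, Long> createTimes = new TreeMap<String, Long>();

    EventuallyVisibleKeys(long visibilityDelayMs) {
      this.visibilityDelayMs = visibilityDelayMs;
    }

    void put(String key) {
      createTimes.put(key, System.currentTimeMillis());
    }

    Set<String> list() {
      long now = System.currentTimeMillis();
      Set<String> visible = new TreeSet<String>();
      for (Map.Entry<String, Long> e : createTimes.entrySet()) {
        if (now - e.getValue() >= visibilityDelayMs) {
          visible.add(e.getKey());
        }
      }
      return visible;
    }
  }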




> 2) Then, downstream vendors can just "pick" which of these file systems
> they are most close to, and modify their particular file system to declare
> semantics using the matching FS as a template.
>
>
they get to implement an FS that works like HDFS. If the semantics << HDFS,
well, that's not a filesystem, irrespective of what methods it implements.


The blobstore marker interface is intended to cover that, to warn that
"this is not a real filesystem" -a marker applications can use to assert
that it isn't a "FileSystem" by the standard definition of one -and that
all guarantees are lost.
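
A purely hypothetical sketch of how an application might use such a marker
-the Blobstore interface here is a stand-in for whatever HADOOP-9565 finally
ships:

  import org.apache.hadoop.fs.FileSystem;

  class OutputSafetyCheck {
    // Stand-in for the proposed marker; a blobstore FS would implement it.
    interface Blobstore { }

    static void requireRealFilesystem(FileSystem fs) {
      if (fs instanceof Blobstore) {
        throw new IllegalStateException(fs.getUri()
            + " is a blobstore: rename/commit guarantees do not hold");
      }
    }
  }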
