Posted to hdfs-dev@hadoop.apache.org by Jeff Dost <jd...@ucsd.edu> on 2013/08/07 19:59:48 UTC

Feature request to provide DFSInputStream subclassing mechanism

Hello,

We work in a software development team at the UCSD CMS Tier2 Center.  We 
would like to propose a mechanism to allow one to subclass the 
DFSInputStream in a clean way from an external package.  First I'd like 
to give some motivation on why and then will proceed with the details.

We have a 3 Petabyte Hadoop cluster we maintain for the LHC experiment 
at CERN.  There are other T2 centers worldwide that contain mirrors of 
the same data we host.  We are working on an extension to Hadoop so that, 
when a file is being read and no replicas of a block are available, we use 
an external interface to retrieve that block of the 
file from another data center.  The external interface is necessary 
because not all T2 centers involved in CMS are running a Hadoop cluster 
as their storage backend.

In order to implement this functionality, we need to subclass the 
DFSInputStream and override the read method, so we can catch 
IOExceptions that occur on client reads at the block level.

The basic steps required:
1. Invent a new URI scheme for the customized "FileSystem" in core-site.xml:
   <property>
     <name>fs.foofs.impl</name>
     <value>my.package.FooFileSystem</value>
     <description>My Extended FileSystem for foofs: uris.</description>
   </property>

2. Write new classes included in the external package that subclass the 
following:
FooFileSystem subclasses DistributedFileSystem
FooFSClient subclasses DFSClient
FooFSInputStream subclasses DFSInputStream
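
For orientation, a minimal sketch of what the FooFSInputStream part of step 2
could look like.  This is schematic: it only compiles with the widened access
modifiers from the attached patch (against stock Hadoop the DFSInputStream
constructor is not reachable from an external package), and the constructor
signature shown is illustrative rather than the exact one.

import java.io.IOException;
import org.apache.hadoop.hdfs.DFSClient;
import org.apache.hadoop.hdfs.DFSInputStream;

// Schematic sketch: assumes the attached patch makes the DFSInputStream
// constructor accessible; real signatures differ between Hadoop versions.
public class FooFSInputStream extends DFSInputStream {

  public FooFSInputStream(DFSClient client, String src, boolean verifyChecksum)
      throws IOException {
    super(client, src, verifyChecksum);
  }

  @Override
  public synchronized int read(byte[] buf, int off, int len) throws IOException {
    try {
      return super.read(buf, off, len);
    } catch (IOException e) {
      // No replica reachable: fall back to the external (xrootd) interface
      // and fill the caller's buffer from there instead of failing the read.
      return readBlockViaExternalInterface(buf, off, len);
    }
  }

  // Placeholder for the external fetch (the real path goes through JNI to the
  // xrootd client, as described elsewhere in the thread).
  private int readBlockViaExternalInterface(byte[] buf, int off, int len)
      throws IOException {
    throw new IOException("external fetch not shown in this sketch");
  }
}

FooFileSystem and FooFSClient would similarly override only the spots where the
client and the input stream get constructed, so that opening a foofs:// path
ends up returning a FooFSInputStream.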

Now any client commands that explicitly use the foofs:// scheme in paths 
to access the hadoop cluster can open files with a customized 
InputStream that extends functionality of the default hadoop client 
DFSInputStream.  In order to make this happen for our use case, we had 
to change some access modifiers in the DistributedFileSystem, DFSClient, 
and DFSInputStream classes provided by Hadoop.  In addition, we had to 
comment out the check in the namenode code that only allows for URI 
schemes of the form "hdfs://".
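
For illustration, opening a file through the new scheme from a Java client
could then look like the following; the namenode host, port, and file path
below are hypothetical.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FoofsReadExample {
  public static void main(String[] args) throws Exception {
    // Picks up core-site.xml, including the fs.foofs.impl mapping above.
    Configuration conf = new Configuration();
    FileSystem fs =
        FileSystem.get(URI.create("foofs://namenode.example:8020/"), conf);
    try (FSDataInputStream in = fs.open(new Path("/store/user/somefile"))) {
      byte[] buf = new byte[64 * 1024];
      int n = in.read(buf);  // served by the customized input stream underneath
      System.out.println("read " + n + " bytes");
    }
  }
}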

Attached is a patch file we apply to hadoop.  Note that we derived this 
patch by modding the Cloudera release hadoop-2.0.0-cdh4.1.1 which can be 
found at:
http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.1.1.tar.gz

We would greatly appreciate any advice on whether or not this approach 
sounds reasonable, and whether you would consider accepting these 
modifications into the official Hadoop code base.

Thank you,
Jeff, Alja & Matevz
UCSD Physics

Re: Feature request to provide DFSInputStream subclassing mechanism

Posted by Jeff Dost <jd...@ucsd.edu>.
Hi Steve,

Thanks again for your in depth replies, we found your comments quite 
useful.  A few responses inline:

On 8/9/13 10:31 AM, Steve Loughran wrote:
> On 8 August 2013 21:51, Matevz Tadel <mt...@ucsd.edu> wrote:
>
>> We already do fallback to xrootd on open failures from our application
>> framework ... the job gets redirected to xrootd proxy which downloads the
>> whole file and serves data as the job asks for it. The changes described by
>> Jeff are an optimization for cases when a single block becomes unavailable.
>>
>>
> Like others said, there's work on heterogeneous storage. Maybe you could make
> sure that there is some handling there for block unavailability events -then
> you can hook into that to handle it.
We discussed this a bit more and agree: the best way forward is to 
carefully watch the progress of this development, and we will look into 
how it will fit our needs.
>> Thanks also for the comments below, Jeff will appreciate them more as he
>> actually went through all the HDFS code and did all the changes in the
>> patch, I did the JNI + Xrootd part.
>>
>> One more question ... if we'd converge on an acceptable solution (I find
>> it a bit hard at the moment), how long would it take for the changes to be
>> released and what release cycle would it end up in?
>>
> all changes go into trunk; if it were to go into the 2.x line then it would
> be 2.3 at the earliest; 2.1 is going through its beta right now. If you
> work against 2.1 and report bugs now, that would help the beta and make it
> easier for you to have a private fork of 2.1.x with the extensions
>
In the long term this sounds like a reasonable approach.  For the short 
term, we will continue to work with OSG to include the patches needed to 
support our use case, as their upcoming release will be based on the 2.0 
branch we have already been experimenting with.  We will take into 
careful consideration the requirements you outlined that would make our 
patches reasonable candidates for inclusion in the official Hadoop trunk.

Jeff Dost
UCSD Physics


Re: Feature request to provide DFSInputStream subclassing mechanism

Posted by Steve Loughran <st...@hortonworks.com>.
On 8 August 2013 21:51, Matevz Tadel <mt...@ucsd.edu> wrote:

> Hi Steve,
>
> Thank you very much for the reality check! Some more answers inline ...
>
>
> On 8/8/13 1:30 PM, Steve Loughran wrote:
>
>> On 7 August 2013 10:59, Jeff Dost <jd...@ucsd.edu> wrote:
>>
>>  Hello,
>>>
>>> We work in a software development team at the UCSD CMS Tier2 Center.  We
>>> would like to propose a mechanism to allow one to subclass the
>>> DFSInputStream in a clean way from an external package.  First I'd like
>>> to
>>> give some motivation on why and then will proceed with the details.
>>>
>>> We have a 3 Petabyte Hadoop cluster we maintain for the LHC experiment at
>>> CERN.  There are other T2 centers worldwide that contain mirrors of the
>>> same data we host.  We are working on an extension to Hadoop that, on
>>> reading a file, if it is found that there are no available replicas of a
>>> block, we use an external interface to retrieve this block of the file
>>> from
>>> another data center.  The external interface is necessary because not all
>>> T2 centers involved in CMS are running a Hadoop cluster as their storage
>>> backend.
>>>
>>>
>> You are relying on all these T2 sites being HDFS clusters with exactly the
>> same block sizing of a named file, so that you can pick up a copy of a
>> block from elsewhere?
>>
>
> No, it's not just HDFS. But we have a common access method for all sites,
> XRootd (http://xrootd.slac.stanford.edu/),
> and that's what our implementation of input stream falls back to using.
>

OK: so any block access issue triggers fallback.


>
> Also, HDFS block sizes are not the same ... it's in the hands of T2 admins.
>
>
>  Why not just handle a failing FileNotFoundException on an open() and
>> either
>> relay to another site or escalate to T1/T0/Castor? This would be far
>> easier
>> to implement with a front end wrapper for any FileSystem. What you are
>> proposing seems far more brittle in both code and execution.
>>
>
> We already do fallback to xrootd on open failures from our application
> framework ... the job gets redirected to xrootd proxy which downloads the
> whole file and serves data as the job asks for it. The changes described by
> Jeff are an optimization for cases when a single block becomes unavailable.
>
>
Like others said, there's work on heterogeneous storage. Maybe you could make
sure that there is some handling there for block unavailability events -then
you can hook into that to handle it.


> We have a lot of data that is replicated on several sites and also
> available on tape at Fermilab. Popularity of datasets (a couple 100 TB)
> varies quite a lot and what we would like to achieve is to be able to
> reduce replication down to one for files that not many people care for at
> the moment. This should give us about 1 PB of diskspace on every T2 center
> and allow us to be more proactive with data movement by simply observing
> the job queues.
>
>
>  In order to implement this functionality, we need to subclass the
>>> DFSInputStream and override the read method, so we can catch IOExceptions
>>> that occur on client reads at the block level.
>>>
>>>
>> Specific IOEs related to missing blocks, or any IOE -as that could swallow
>> other problems?
>>
>
> We were looking at that but couldn't quite figure out what else can go
> wrong ... so the plan was to log all the exceptions and see what we can do
> about each of them.
>

network problems, security. If there isn't a specific IOE subclass for
block-not-found, there ought to be.
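
As a sketch of that distinction (BlockMissingException is the existing class in
org.apache.hadoop.hdfs mentioned elsewhere in the thread; the surrounding method
is illustrative only), a narrow catch would look like this, letting every other
IOException (network, security, etc.) propagate unchanged:

import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.hdfs.BlockMissingException;

// Illustrative only: redirect just the no-replica case to an external fetch.
public final class NarrowBlockRetry {
  static int readWithExternalFallback(InputStream in, byte[] buf, int off, int len)
      throws IOException {
    try {
      return in.read(buf, off, len);
    } catch (BlockMissingException e) {
      // Only missing-block errors take the fallback path; anything else rethrows.
      return fetchFromExternalSource(buf, off, len);
    }
  }

  private static int fetchFromExternalSource(byte[] buf, int off, int len)
      throws IOException {
    throw new IOException("external fetch not shown in this sketch");
  }
}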


>
> Thanks also for the comments below, Jeff will appreciate them more as he
> actually went through all the HDFS code and did all the changes in the
> patch, I did the JNI + Xrootd part.
>
> One more question ... if we'd converge on an acceptable solution (I find
> it a bit hard at the moment), how long would it take for the changes to be
> released and what release cycle would it end up in?
>

all changes go into trunk; if it were to go into the 2.x line then it would
be 2.3 at the earliest; 2.1 is going through its beta right now. If you
work against 2.1 and report bugs now, that would help the beta and make it
easier for you to have a private fork of 2.1.x with the extensions

Re: Feature request to provide DFSInputStream subclassing mechanism

Posted by Matevz Tadel <mt...@ucsd.edu>.
Hi Steve,

Thank you very much for the reality check! Some more answers inline ...

On 8/8/13 1:30 PM, Steve Loughran wrote:
> On 7 August 2013 10:59, Jeff Dost <jd...@ucsd.edu> wrote:
>
>> Hello,
>>
>> We work in a software development team at the UCSD CMS Tier2 Center.  We
>> would like to propose a mechanism to allow one to subclass the
>> DFSInputStream in a clean way from an external package.  First I'd like to
>> give some motivation on why and then will proceed with the details.
>>
>> We have a 3 Petabyte Hadoop cluster we maintain for the LHC experiment at
>> CERN.  There are other T2 centers worldwide that contain mirrors of the
>> same data we host.  We are working on an extension to Hadoop that, on
>> reading a file, if it is found that there are no available replicas of a
>> block, we use an external interface to retrieve this block of the file from
>> another data center.  The external interface is necessary because not all
>> T2 centers involved in CMS are running a Hadoop cluster as their storage
>> backend.
>>
>
> You are relying on all these T2 sites being HDFS clusters with exactly the
> same block sizing of a named file, so that you can pick up a copy of a
> block from elsewhere?

No, it's not just HDFS. But we have a common access method for all sites, XRootd 
(http://xrootd.slac.stanford.edu/), and that's what our implementation of input 
stream falls back to using.

Also, HDFS block sizes are not the same ... it's in the hands of T2 admins.

> Why not just handle a failing FileNotFoundException on an open() and either
> relay to another site or escalate to T1/T0/Castor? This would be far easier
> to implement with a front end wrapper for any FileSystem. What you are
> proposing seems far more brittle in both code and execution.

We already do fallback to xrootd on open failures from our application framework 
... the job gets redirected to xrootd proxy which downloads the whole file and 
serves data as the job asks for it. The changes described by Jeff are an 
optimization for cases when a single block becomes unavailable.

We have a lot of data that is replicated on several sites and also available on 
tape at Fermilab. Popularity of datasets (a couple 100 TB) varies quite a lot 
and what we would like to achieve is to be able to reduce replication down to 
one for files that not many people care for at the moment. This should give us 
about 1 PB of diskspace on every T2 center and allow us to be more proactive 
with data movement by simply observing the job queues.

>> In order to implement this functionality, we need to subclass the
>> DFSInputStream and override the read method, so we can catch IOExceptions
>> that occur on client reads at the block level.
>>
>
> Specific IOEs related to missing blocks, or any IOE -as that could swallow
> other problems?

We were looking at that but couldn't quite figure out what else can go wrong ... 
so the plan was to log all the exceptions and see what we can do about each of them.

Thanks also for the comments below, Jeff will appreciate them more as he 
actually went through all the HDFS code and did all the changes in the patch, I 
did the JNI + Xrootd part.

One more question ... if we'd converge on an acceptable solution (I find it a 
bit hard at the moment), how long would it take for the changes to be released 
and what release cycle would it end up in?

Best,
Matevz


>> The basic steps required:
>> 1. Invent a new URI scheme for the customized "FileSystem" in
>> core-site.xml:
>>    <property>
>>      <name>fs.foofs.impl</name>
>>      <value>my.package.FooFileSystem</value>
>>      <description>My Extended FileSystem for foofs: uris.</description>
>>    </property>
>>
>
> Assuming you only used the FS driver everywhere, then you could just
> overwrite the hdfs declaration
>
>
>>
>> 2. Write new classes included in the external package that subclass the
>> following:
>> FooFileSystem subclasses DistributedFileSystem
>> FooFSClient subclasses DFSClient
>> FooFSInputStream subclasses DFSInputStream
>>
>> Now any client commands that explicitly use the foofs:// scheme in paths
>> to access the hadoop cluster can open files with a customized InputStream
>> that extends functionality of the default hadoop client DFSInputStream.  In
>> order to make this happen for our use case, we had to change some access
>> modifiers in the DistributedFileSystem, DFSClient, and DFSInputStream
>> classes provided by Hadoop.
>
>
> As well as marking such things as unstable/internal, it's critical that such
> things don't expose any sync problems (which bool closed may). Better to
> use protected accessors.
>
>
>> In addition, we had to comment out the check in the namenode code that
>> only allows for URI schemes of the form "hdfs://".
>>
>
> That's there for a reason and filtering it would break. The more elegant
> design would be for DistributedFileSystem to have a non-final
> getSchemaName() method used in the validator; you'd just override it
>
>
>>
>> Attached is a patch file we apply to hadoop.
>
>
> Patches and reviews should be done via JIRA
>
>
>
>> Note that we derived this patch by modding the Cloudera release
>> hadoop-2.0.0-cdh4.1.1 which can be found at:
>> http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.1.1.tar.gz
>>
>>
> And they have to be done against hadoop trunk in SVN, then potentially
> backported to earlier ASF releases. Vendors are free to cherry pick
> anything.
>
>
>> We would greatly appreciate any advise on whether or not this approach
>> sounds reasonable, and if you would consider accepting these modifications
>> into the official Hadoop code base.
>>
>
> The changes look somewhat minor, but there are issues above -everyone is
> worried about stability of HDFS and anything that would change behaviour
> (which commenting out the check very much does) would be rejected.
>
> For a patch to be considered
>
>     1. it must not change the semantics of existing APIs, here DFS client
>     code
>     2. it mustn't expose fields that could generate concurrency problems
>     3. It has to be considered the right approach
>     4. It has to come with tests
>
> These are things that could be covered in the JIRA
>


Re: Feature request to provide DFSInputStream subclassing mechanism

Posted by Steve Loughran <st...@hortonworks.com>.
On 7 August 2013 10:59, Jeff Dost <jd...@ucsd.edu> wrote:

> Hello,
>
> We work in a software development team at the UCSD CMS Tier2 Center.  We
> would like to propose a mechanism to allow one to subclass the
> DFSInputStream in a clean way from an external package.  First I'd like to
> give some motivation on why and then will proceed with the details.
>
> We have a 3 Petabyte Hadoop cluster we maintain for the LHC experiment at
> CERN.  There are other T2 centers worldwide that contain mirrors of the
> same data we host.  We are working on an extension to Hadoop that, on
> reading a file, if it is found that there are no available replicas of a
> block, we use an external interface to retrieve this block of the file from
> another data center.  The external interface is necessary because not all
> T2 centers involved in CMS are running a Hadoop cluster as their storage
> backend.
>

You are relying on all these T2 sites being HDFS clusters with exactly the
same block sizing of a named file, so that you can pick up a copy of a
block from elsewhere?

Why not just handle a failing FileNotFoundException on an open() and either
relay to another site or escalate to T1/T0/Castor? This would be far easier
to implement with a front end wrapper for any FileSystem. What you are
proposing seems far more brittle in both code and execution.
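
For reference, a minimal sketch of that kind of front-end wrapper using the
public FilterFileSystem class; the fallback FileSystem (e.g. one pointing at
another site) and how it is constructed are assumptions, and only the open()
path is shown.

import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FilterFileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: relay a failed open() to another FileSystem instead of failing.
public class FallbackOnOpenFileSystem extends FilterFileSystem {
  private final FileSystem fallback;  // e.g. a remote-site FileSystem (assumed)

  public FallbackOnOpenFileSystem(FileSystem primary, FileSystem fallback) {
    super(primary);
    this.fallback = fallback;
  }

  @Override
  public FSDataInputStream open(Path f, int bufferSize) throws IOException {
    try {
      return super.open(f, bufferSize);
    } catch (FileNotFoundException e) {
      // Relay/escalate: try the same path on the other site.
      return fallback.open(f, bufferSize);
    }
  }
}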


>
> In order to implement this functionality, we need to subclass the
> DFSInputStream and override the read method, so we can catch IOExceptions
> that occur on client reads at the block level.
>

Specific IOEs related to missing blocks, or any IOE -as that could swallow
other problems?


>
> The basic steps required:
> 1. Invent a new URI scheme for the customized "FileSystem" in
> core-site.xml:
>   <property>
>     <name>fs.foofs.impl</name>
>     <value>my.package.FooFileSystem</value>
>     <description>My Extended FileSystem for foofs: uris.</description>
>   </property>
>

Assuming you only used the FS driver everywhere, then you could just
overwrite the hdfs declaration
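
For illustration, that would amount to something along these lines in
core-site.xml, reusing the fs.<scheme>.impl convention from step 1 of the
proposal (class name taken from the earlier example):

   <property>
     <name>fs.hdfs.impl</name>
     <value>my.package.FooFileSystem</value>
     <description>Use the extended FileSystem for ordinary hdfs: uris.</description>
   </property>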


>
> 2. Write new classes included in the external package that subclass the
> following:
> FooFileSystem subclasses DistributedFileSystem
> FooFSClient subclasses DFSClient
> FooFSInputStream subclasses DFSInputStream
>
> Now any client commands that explicitly use the foofs:// scheme in paths
> to access the hadoop cluster can open files with a customized InputStream
> that extends functionality of the default hadoop client DFSInputStream.  In
> order to make this happen for our use case, we had to change some access
> modifiers in the DistributedFileSystem, DFSClient, and DFSInputStream
> classes provided by Hadoop.


As well as marking such things as unstable/internal, it's critical that such
things don't expose any sync problems (which bool closed may). Better to
use protected accessors.
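
For illustration, the pattern being suggested, shown on a made-up class (none
of these names exist in Hadoop): subclasses read the state through a protected,
synchronized accessor rather than through a field whose modifier was simply
widened.

// Illustrative only; not Hadoop code.
public class StreamStateExample {
  private boolean closed;  // the field itself stays private

  protected synchronized boolean isClosed() {  // subclasses go through this
    return closed;
  }

  protected synchronized void markClosed() {
    closed = true;
  }
}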


> In addition, we had to comment out the check in the namenode code that
> only allows for URI schemes of the form "hdfs://".
>

That's there for a reason and filtering it would break. The more elegant
design would be for DistributedFileSystem to have a non-final
getSchemaName() method used in the validator; you'd just override it
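
A hypothetical sketch of that design; these classes and method names do not
exist in Hadoop and only illustrate validating against an overridable scheme
instead of a hard-coded "hdfs".

import java.net.URI;

// Hypothetical only: stands in for DistributedFileSystem's URI check.
class SchemeValidatingFileSystem {
  protected String getSchemeName() {  // non-final, so subclasses can override it
    return "hdfs";
  }

  protected void checkScheme(URI uri) {
    if (!getSchemeName().equalsIgnoreCase(uri.getScheme())) {
      throw new IllegalArgumentException(
          uri + " does not use the expected " + getSchemeName() + ":// scheme");
    }
  }
}

// A FooFileSystem-style subclass would then only swap the accepted scheme:
class FoofsSchemeFileSystem extends SchemeValidatingFileSystem {
  @Override
  protected String getSchemeName() {
    return "foofs";
  }
}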


>
> Attached is a patch file we apply to hadoop.


Patches and reviews should be done via JIRA



> Note that we derived this patch by modding the Cloudera release
> hadoop-2.0.0-cdh4.1.1 which can be found at:
> http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.1.1.tar.gz
>
>
And they have to be done against hadoop trunk in SVN, then potentially
backported to earlier ASF releases. Vendors are free to cherry pick
anything.


> We would greatly appreciate any advise on whether or not this approach
> sounds reasonable, and if you would consider accepting these modifications
> into the official Hadoop code base.
>

The changes look somewhat minor, but there are issues above -everyone is
worried about stability of HDFS and anything that would change behaviour
(which commenting out the check very much does) would be rejected.

For a patch to be considered

   1. it must not change the semantics of existing APIs, here DFS client
   code
   2. it mustn't expose fields that could generate concurrency problems
   3. It has to be considered the right approach
   4. It has to come with tests

These are things that could be covered in the JIRA

Re: Feature request to provide DFSInputStream subclassing mechanism

Posted by Todd Lipcon <to...@cloudera.com>.
Hi Jeff,

Do you need to subclass or could you simply wrap? Generally composition as
opposed to inheritance is a much safer way of integrating software written
by different parties, since inheritance exposes all the implementation
details, which are subject to change.
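
For illustration, a minimal sketch of the wrapping alternative: hold a fallback
FileSystem (an assumption here, e.g. one going through the xrootd proxy), and on
a failed read re-open the same path there, seek to the current offset and retry.
A production version would also implement Seekable and PositionedReadable so it
could be handed back to callers as an FSDataInputStream.

import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FailoverInputStream extends InputStream {
  private FSDataInputStream in;         // primary (HDFS) stream
  private final FileSystem fallbackFs;  // assumed fallback FileSystem
  private final Path path;

  public FailoverInputStream(FSDataInputStream in, FileSystem fallbackFs, Path path) {
    this.in = in;
    this.fallbackFs = fallbackFs;
    this.path = path;
  }

  @Override
  public int read(byte[] buf, int off, int len) throws IOException {
    try {
      return in.read(buf, off, len);
    } catch (IOException e) {
      long pos = in.getPos();                    // offset where the read failed
      FSDataInputStream retry = fallbackFs.open(path);
      retry.seek(pos);                           // resume at the same position
      in = retry;                                // switch over, then retry once
      return in.read(buf, off, len);
    }
  }

  @Override
  public int read() throws IOException {
    byte[] one = new byte[1];
    int n = read(one, 0, 1);
    return n < 0 ? -1 : (one[0] & 0xff);
  }

  @Override
  public void close() throws IOException {
    in.close();
  }
}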

-Todd

On Wed, Aug 7, 2013 at 10:59 AM, Jeff Dost <jd...@ucsd.edu> wrote:

> Hello,
>
> We work in a software development team at the UCSD CMS Tier2 Center.  We
> would like to propose a mechanism to allow one to subclass the
> DFSInputStream in a clean way from an external package.  First I'd like to
> give some motivation on why and then will proceed with the details.
>
> We have a 3 Petabyte Hadoop cluster we maintain for the LHC experiment at
> CERN.  There are other T2 centers worldwide that contain mirrors of the
> same data we host.  We are working on an extension to Hadoop that, on
> reading a file, if it is found that there are no available replicas of a
> block, we use an external interface to retrieve this block of the file from
> another data center.  The external interface is necessary because not all
> T2 centers involved in CMS are running a Hadoop cluster as their storage
> backend.
>
> In order to implement this functionality, we need to subclass the
> DFSInputStream and override the read method, so we can catch IOExceptions
> that occur on client reads at the block level.
>
> The basic steps required:
> 1. Invent a new URI scheme for the customized "FileSystem" in
> core-site.xml:
>   <property>
>     <name>fs.foofs.impl</name>
>     <value>my.package.FooFileSystem</value>
>     <description>My Extended FileSystem for foofs: uris.</description>
>   </property>
>
> 2. Write new classes included in the external package that subclass the
> following:
> FooFileSystem subclasses DistributedFileSystem
> FooFSClient subclasses DFSClient
> FooFSInputStream subclasses DFSInputStream
>
> Now any client commands that explicitly use the foofs:// scheme in paths
> to access the hadoop cluster can open files with a customized InputStream
> that extends functionality of the default hadoop client DFSInputStream.  In
> order to make this happen for our use case, we had to change some access
> modifiers in the DistributedFileSystem, DFSClient, and DFSInputStream
> classes provided by Hadoop.  In addition, we had to comment out the check
> in the namenode code that only allows for URI schemes of the form "hdfs://".
>
> Attached is a patch file we apply to hadoop.  Note that we derived this
> patch by modding the Cloudera release hadoop-2.0.0-cdh4.1.1 which can be
> found at:
> http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.1.1.tar.gz
>
> We would greatly appreciate any advise on whether or not this approach
> sounds reasonable, and if you would consider accepting these modifications
> into the official Hadoop code base.
>
> Thank you,
> Jeff, Alja & Matevz
> UCSD Physics
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Feature request to provide DFSInputStream subclassing mechanism

Posted by Suresh Srinivas <su...@hortonworks.com>.
This is being targeted for release 2.3.

The 2.1.x release stream is for stabilizing. When it reaches stability, 2.2 GA
will be released. The current features in development will make it to 2.3,
including HDFS-2832.


On Thu, Aug 8, 2013 at 2:04 PM, Matevz Tadel <mt...@ucsd.edu> wrote:

> Thanks Colin, I subscribed to HDFS-2832 so that I can follow the
> development there. I assume this is targeting release 2.1.
>
> Best,
> Matevz
>
>
>
> On 08/08/13 12:10, Colin McCabe wrote:
>
>> There is work underway to decouple the block layer and the namespace
>> layer of HDFS from each other.  Once this is done, block behaviors
>> like the one you describe will be easy to implement.  It's a use case
>> very similar to the hierarchical storage management (HSM) use case
>> that we've discussed before.  Check out HDFS-2832.  Hopefully there
>> will be a design document posted soon.
>>
>> cheers,
>> Colin
>>
>>
>> On Thu, Aug 8, 2013 at 11:52 AM, Matevz Tadel <mt...@ucsd.edu> wrote:
>>
>>> Hi everybody,
>>>
>>> I'm jumping in as Jeff is away due to an unexpected annoyance involving
>>> Californian wildlife.
>>>
>>>
>>> On 8/7/13 7:47 PM, Andrew Wang wrote:
>>>
>>>>
>>>> Blocks are supposed to be an internal abstraction within HDFS, and
>>>> aren't
>>>> an
>>>> inherent part of FileSystem (the user-visible class used to access all
>>>> Hadoop
>>>> filesystems).
>>>>
>>>
>>>
>>> Yes, but it's a really useful abstraction :) Do you really believe the
>>> blocks could be abandoned in the next couple of years? I mean, it's such
>>> a
>>> simple and effective solution ...
>>>
>>>
>>>  Is it possible to instead deal with files and offsets? On a read
>>>> failure,
>>>> you
>>>> could open a stream to the same file on the backup filesystem, seek to
>>>> the
>>>> old
>>>> file position, and retry the read. This feels like it's possible via
>>>> wrapping.
>>>>
>>>
>>>
>>> As Jeff briefly mentioned, all USCMS sites export their data via XRootd
>>> (not
>>> all of them use HDFS!) and we developed a specialization of XRootd
>>> caching
>>> proxy that can fetch only requested blocks (block size is passed from our
>>> input stream class to XRootd client (via JNI) and on to the proxy server)
>>> and keep them in a local cache. This allows us to do three things:
>>>
>>> 1. the first time we notice a block is missing, a whole block is fetched
>>> from elsewhere and further access requests from the same process get
>>> fulfilled with zero latency;
>>>
>>> 2. later requests from other processes asking for this block are
>>> fulfilled
>>> immediately (well, after the initial 3 retries);
>>>
>>> 3. we have a list of blocks that were fetched and we could (this is what
>>> we
>>> want to look into in the near future) re-inject them into HDFS if the
>>> data
>>> loss turns out to be permanent (bad disk vs. node that was
>>> offline/overloaded for a while).
>>>
>>> Handling exceptions at the block level thus gives us just what we need.
>>> As
>>> input stream is the place where these errors become known it is, I think,
>>> also the easiest place to handle them.
>>>
>>> I'll understand if you find opening-up of the interfaces in the central
>>> repository unacceptable. We can always apply the patch at the OSG level
>>> where rpms for all our deployments get built.
>>>
>>> Thanks & Best regards,
>>> Matevz
>>>
>>>
>>>> On Wed, Aug 7, 2013 at 3:29 PM, Jeff Dost <jdost@ucsd.edu
>>>> <ma...@ucsd.edu>> wrote:
>>>>
>>>>      Thank you for the suggestion, but we don't see how simply wrapping
>>>> a
>>>>      FileSystem object would be sufficient in our use case.  The reason
>>>> why
>>>> is we
>>>>      need to catch and handle read exceptions at the block level.  There
>>>> aren't
>>>>      any public methods available in the high level FileSystem
>>>> abstraction
>>>> layer
>>>>      that would give us the fine grained control we need at block level
>>>> read
>>>>      failures.
>>>>
>>>>      Perhaps if I outline the steps more clearly it will help explain
>>>> what
>>>> we are
>>>>      trying to do.  Without our enhancements, suppose a user opens a
>>>> file
>>>> stream
>>>>      and starts reading the file from Hadoop. After some time, at some
>>>> position
>>>>      into the file, if there happen to be no replicas available for a
>>>> particular
>>>>      block for whatever reason, datanodes have gone down due to disk
>>>> issues, etc.
>>>>      the stream will throw an IOException (BlockMissingException or
>>>> similar) and
>>>>      the read will fail.
>>>>
>>>>      What we are doing is rather than letting the stream fail, we have
>>>> another
>>>>      stream queued up that knows how to fetch the blocks elsewhere
>>>> outside
>>>> of our
>>>>      Hadoop cluster that couldn't be retrieved.  So we need to be able
>>>> to
>>>> catch
>>>>      the exception at this point, and these externally fetched bytes
>>>> then
>>>> get
>>>>      read into the user supplied read buffer.  Now Hadoop can proceed to
>>>> read in
>>>>      the stream the next blocks in the file.
>>>>
>>>>      So as you can see this method of fail over on demand allows an
>>>> input
>>>> stream
>>>>      to keep reading data, without having to start it all over again if
>>>> a
>>>> failure
>>>>      occurs (assuming the remote bytes were successfully fetched).
>>>>
>>>>      As a final note I would like to mention that we will be providing
>>>> our
>>>>      failover module to the Open Science Grid.  Since we hope to provide
>>>> this as
>>>>      a benefit to all OSG users running at participating T2 computing
>>>> clusters,
>>>>      we will be committed to maintaining this software and any changes
>>>> to
>>>> Hadoop
>>>>      needed to make it work.  In other words we will be willing to
>>>> maintain
>>>> any
>>>>      implementation changes that may become necessary as Hadoop
>>>> internals
>>>> change
>>>>      in future releases.
>>>>
>>>>      Thanks,
>>>>      Jeff
>>>>
>>>>
>>>>      On 8/7/13 11:30 AM, Andrew Wang wrote:
>>>>
>>>>          I don't think exposing DFSClient and DistributedFileSystem
>>>> members
>>>> is
>>>>          necessary to achieve what you're trying to do. We've got
>>>> wrapper
>>>>          FileSystems like FilterFileSystem and ViewFileSystem which you
>>>> might be
>>>>          able to use for inspiration, and the HCFS wiki lists some
>>>> third-party
>>>>          FileSystems that might also be helpful too.
>>>>
>>>>
>>>>          On Wed, Aug 7, 2013 at 11:11 AM, Joe Bounour <jbounour@ddn.com
>>>>          <ma...@ddn.com>> wrote:
>>>>
>>>>              Hello Jeff
>>>>
>>>>              Is it something that could go under HCFS project?
>>>>              http://wiki.apache.org/hadoop/HCFS
>>>>              (I might be wrong?)
>>>>
>>>>              Joe
>>>>
>>>>
>>>>              On 8/7/13 10:59 AM, "Jeff Dost" <jdost@ucsd.edu
>>>>              <ma...@ucsd.edu>> wrote:
>>>>
>>>>                  Hello,
>>>>
>>>>                  We work in a software development team at the UCSD CMS
>>>> Tier2
>>>>                  Center.  We
>>>>                  would like to propose a mechanism to allow one to
>>>> subclass
>>>> the
>>>>                  DFSInputStream in a clean way from an external package.
>>>> First
>>>>                  I'd like
>>>>                  to give some motivation on why and then will proceed
>>>> with
>>>> the
>>>>                  details.
>>>>
>>>>                  We have a 3 Petabyte Hadoop cluster we maintain for the
>>>> LHC
>>>>                  experiment
>>>>                  at CERN.  There are other T2 centers worldwide that
>>>> contain
>>>>                  mirrors of
>>>>                  the same data we host.  We are working on an extension
>>>> to
>>>> Hadoop
>>>>                  that,
>>>>                  on reading a file, if it is found that there are no
>>>> available
>>>>                  replicas
>>>>                  of a block, we use an external interface to retrieve
>>>> this
>>>> block
>>>>                  of the
>>>>                  file from another data center.  The external interface
>>>> is
>>>> necessary
>>>>                  because not all T2 centers involved in CMS are running
>>>> a
>>>> Hadoop
>>>>                  cluster
>>>>                  as their storage backend.
>>>>
>>>>                  In order to implement this functionality, we need to
>>>> subclass the
>>>>                  DFSInputStream and override the read method, so we can
>>>> catch
>>>>                  IOExceptions that occur on client reads at the block
>>>> level.
>>>>
>>>>                  The basic steps required:
>>>>                  1. Invent a new URI scheme for the customized
>>>> "FileSystem"
>>>> in
>>>>                  core-site.xml:
>>>>                      <property>
>>>>                        <name>fs.foofs.impl</name>
>>>>                        <value>my.package.FooFileSystem</value>
>>>>
>>>>                        <description>My Extended FileSystem for foofs:
>>>>                  uris.</description>
>>>>                      </property>
>>>>
>>>>                  2. Write new classes included in the external package
>>>> that
>>>>                  subclass the
>>>>                  following:
>>>>                  FooFileSystem subclasses DistributedFileSystem
>>>>                  FooFSClient subclasses DFSClient
>>>>                  FooFSInputStream subclasses DFSInputStream
>>>>
>>>>                  Now any client commands that explicitly use the
>>>> foofs://
>>>> scheme
>>>>                  in paths
>>>>                  to access the hadoop cluster can open files with a
>>>> customized
>>>>                  InputStream that extends functionality of the default
>>>> hadoop client
>>>>                  DFSInputStream.  In order to make this happen for our
>>>> use
>>>> case,
>>>>                  we had
>>>>                  to change some access modifiers in the
>>>> DistributedFileSystem,
>>>>                  DFSClient,
>>>>                  and DFSInputStream classes provided by Hadoop.  In
>>>> addition, we
>>>>                  had to
>>>>                  comment out the check in the namenode code that only
>>>> allows for URI
>>>>                  schemes of the form "hdfs://".
>>>>
>>>>                  Attached is a patch file we apply to hadoop.  Note
>>>> that we
>>>>                  derived this
>>>>                  patch by modding the Cloudera release
>>>> hadoop-2.0.0-cdh4.1.1
>>>>                  which can be
>>>>                  found at:
>>>>
>>>> http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.1.1.tar.gz
>>>>
>>>>                  We would greatly appreciate any advise on whether or
>>>> not
>>>> this
>>>>                  approach
>>>>                  sounds reasonable, and if you would consider accepting
>>>> these
>>>>                  modifications into the official Hadoop code base.
>>>>
>>>>                  Thank you,
>>>>                  Jeff, Alja & Matevz
>>>>                  UCSD Physics
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>


-- 
http://hortonworks.com/download/

Re: Feature request to provide DFSInputStream subclassing mechanism

Posted by Matevz Tadel <mt...@ucsd.edu>.
Thanks Colin, I subscribed to HDFS-2832 so that I can follow the development 
there. I assume this is targeting release 2.1.

Best,
Matevz


On 08/08/13 12:10, Colin McCabe wrote:
> There is work underway to decouple the block layer and the namespace
> layer of HDFS from each other.  Once this is done, block behaviors
> like the one you describe will be easy to implement.  It's a use case
> very similar to the hierarchical storage management (HSM) use case
> that we've discussed before.  Check out HDFS-2832.  Hopefully there
> will be a design document posted soon.
>
> cheers,
> Colin
>
>
> On Thu, Aug 8, 2013 at 11:52 AM, Matevz Tadel <mt...@ucsd.edu> wrote:
>> Hi everybody,
>>
>> I'm jumping in as Jeff is away due to an unexpected annoyance involving
>> Californian wildlife.
>>
>>
>> On 8/7/13 7:47 PM, Andrew Wang wrote:
>>>
>>> Blocks are supposed to be an internal abstraction within HDFS, and aren't
>>> an
>>> inherent part of FileSystem (the user-visible class used to access all
>>> Hadoop
>>> filesystems).
>>
>>
>> Yes, but it's a really useful abstraction :) Do you really believe the
>> blocks could be abandoned in the next couple of years? I mean, it's such a
>> simple and effective solution ...
>>
>>
>>> Is it possible to instead deal with files and offsets? On a read failure,
>>> you
>>> could open a stream to the same file on the backup filesystem, seek to the
>>> old
>>> file position, and retry the read. This feels like it's possible via
>>> wrapping.
>>
>>
>> As Jeff briefly mentioned, all USCMS sites export their data via XRootd (not
>> all of them use HDFS!) and we developed a specialization of XRootd caching
>> proxy that can fetch only requested blocks (block size is passed from our
>> input stream class to XRootd client (via JNI) and on to the proxy server)
>> and keep them in a local cache. This allows us to do three things:
>>
>> 1. the first time we notice a block is missing, a whole block is fetched
>> from elsewhere and further access requests from the same process get
>> fulfilled with zero latency;
>>
>> 2. later requests from other processes asking for this block are fulfilled
>> immediately (well, after the initial 3 retries);
>>
>> 3. we have a list of blocks that were fetched and we could (this is what we
>> want to look into in the near future) re-inject them into HDFS if the data
>> loss turns out to be permanent (bad disk vs. node that was
>> offline/overloaded for a while).
>>
>> Handling exceptions at the block level thus gives us just what we need. As
>> input stream is the place where these errors become known it is, I think,
>> also the easiest place to handle them.
>>
>> I'll understand if you find opening-up of the interfaces in the central
>> repository unacceptable. We can always apply the patch at the OSG level
>> where rpms for all our deployments get built.
>>
>> Thanks & Best regards,
>> Matevz
>>
>>>
>>> On Wed, Aug 7, 2013 at 3:29 PM, Jeff Dost <jdost@ucsd.edu
>>> <ma...@ucsd.edu>> wrote:
>>>
>>>      Thank you for the suggestion, but we don't see how simply wrapping a
>>>      FileSystem object would be sufficient in our use case.  The reason why
>>> is we
>>>      need to catch and handle read exceptions at the block level.  There
>>> aren't
>>>      any public methods available in the high level FileSystem abstraction
>>> layer
>>>      that would give us the fine grained control we need at block level
>>> read
>>>      failures.
>>>
>>>      Perhaps if I outline the steps more clearly it will help explain what
>>> we are
>>>      trying to do.  Without our enhancements, suppose a user opens a file
>>> stream
>>>      and starts reading the file from Hadoop. After some time, at some
>>> position
>>>      into the file, if there happen to be no replicas available for a
>>> particular
>>>      block for whatever reason, datanodes have gone down due to disk
>>> issues, etc.
>>>      the stream will throw an IOException (BlockMissingException or
>>> similar) and
>>>      the read will fail.
>>>
>>>      What we are doing is rather than letting the stream fail, we have
>>> another
>>>      stream queued up that knows how to fetch the blocks elsewhere outside
>>> of our
>>>      Hadoop cluster that couldn't be retrieved.  So we need to be able to
>>> catch
>>>      the exception at this point, and these externally fetched bytes then
>>> get
>>>      read into the user supplied read buffer.  Now Hadoop can proceed to
>>> read in
>>>      the stream the next blocks in the file.
>>>
>>>      So as you can see this method of fail over on demand allows an input
>>> stream
>>>      to keep reading data, without having to start it all over again if a
>>> failure
>>>      occurs (assuming the remote bytes were successfully fetched).
>>>
>>>      As a final note I would like to mention that we will be providing our
>>>      failover module to the Open Science Grid.  Since we hope to provide
>>> this as
>>>      a benefit to all OSG users running at participating T2 computing
>>> clusters,
>>>      we will be committed to maintaining this software and any changes to
>>> Hadoop
>>>      needed to make it work.  In other words we will be willing to maintain
>>> any
>>>      implementation changes that may become necessary as Hadoop internals
>>> change
>>>      in future releases.
>>>
>>>      Thanks,
>>>      Jeff
>>>
>>>
>>>      On 8/7/13 11:30 AM, Andrew Wang wrote:
>>>
>>>          I don't think exposing DFSClient and DistributedFileSystem members
>>> is
>>>          necessary to achieve what you're trying to do. We've got wrapper
>>>          FileSystems like FilterFileSystem and ViewFileSystem which you
>>> might be
>>>          able to use for inspiration, and the HCFS wiki lists some
>>> third-party
>>>          FileSystems that might also be helpful too.
>>>
>>>
>>>          On Wed, Aug 7, 2013 at 11:11 AM, Joe Bounour <jbounour@ddn.com
>>>          <ma...@ddn.com>> wrote:
>>>
>>>              Hello Jeff
>>>
>>>              Is it something that could go under HCFS project?
>>>              http://wiki.apache.org/hadoop/HCFS
>>>              (I might be wrong?)
>>>
>>>              Joe
>>>
>>>
>>>              On 8/7/13 10:59 AM, "Jeff Dost" <jdost@ucsd.edu
>>>              <ma...@ucsd.edu>> wrote:
>>>
>>>                  Hello,
>>>
>>>                  We work in a software development team at the UCSD CMS
>>> Tier2
>>>                  Center.  We
>>>                  would like to propose a mechanism to allow one to subclass
>>> the
>>>                  DFSInputStream in a clean way from an external package.
>>> First
>>>                  I'd like
>>>                  to give some motivation on why and then will proceed with
>>> the
>>>                  details.
>>>
>>>                  We have a 3 Petabyte Hadoop cluster we maintain for the
>>> LHC
>>>                  experiment
>>>                  at CERN.  There are other T2 centers worldwide that
>>> contain
>>>                  mirrors of
>>>                  the same data we host.  We are working on an extension to
>>> Hadoop
>>>                  that,
>>>                  on reading a file, if it is found that there are no
>>> available
>>>                  replicas
>>>                  of a block, we use an external interface to retrieve this
>>> block
>>>                  of the
>>>                  file from another data center.  The external interface is
>>> necessary
>>>                  because not all T2 centers involved in CMS are running a
>>> Hadoop
>>>                  cluster
>>>                  as their storage backend.
>>>
>>>                  In order to implement this functionality, we need to
>>> subclass the
>>>                  DFSInputStream and override the read method, so we can
>>> catch
>>>                  IOExceptions that occur on client reads at the block
>>> level.
>>>
>>>                  The basic steps required:
>>>                  1. Invent a new URI scheme for the customized "FileSystem"
>>> in
>>>                  core-site.xml:
>>>                      <property>
>>>                        <name>fs.foofs.impl</name>
>>>                        <value>my.package.FooFileSystem</value>
>>>
>>>                        <description>My Extended FileSystem for foofs:
>>>                  uris.</description>
>>>                      </property>
>>>
>>>                  2. Write new classes included in the external package that
>>>                  subclass the
>>>                  following:
>>>                  FooFileSystem subclasses DistributedFileSystem
>>>                  FooFSClient subclasses DFSClient
>>>                  FooFSInputStream subclasses DFSInputStream
>>>
>>>                  Now any client commands that explicitly use the foofs://
>>> scheme
>>>                  in paths
>>>                  to access the hadoop cluster can open files with a
>>> customized
>>>                  InputStream that extends functionality of the default
>>> hadoop client
>>>                  DFSInputStream.  In order to make this happen for our use
>>> case,
>>>                  we had
>>>                  to change some access modifiers in the
>>> DistributedFileSystem,
>>>                  DFSClient,
>>>                  and DFSInputStream classes provided by Hadoop.  In
>>> addition, we
>>>                  had to
>>>                  comment out the check in the namenode code that only
>>> allows for URI
>>>                  schemes of the form "hdfs://".
>>>
>>>                  Attached is a patch file we apply to hadoop.  Note that we
>>>                  derived this
>>>                  patch by modding the Cloudera release
>>> hadoop-2.0.0-cdh4.1.1
>>>                  which can be
>>>                  found at:
>>>
>>> http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.1.1.tar.gz
>>>
>>>                  We would greatly appreciate any advise on whether or not
>>> this
>>>                  approach
>>>                  sounds reasonable, and if you would consider accepting
>>> these
>>>                  modifications into the official Hadoop code base.
>>>
>>>                  Thank you,
>>>                  Jeff, Alja & Matevz
>>>                  UCSD Physics
>>>
>>>
>>>
>>>
>>


Re: Feature request to provide DFSInputStream subclassing mechanism

Posted by Colin McCabe <cm...@alumni.cmu.edu>.
There is work underway to decouple the block layer and the namespace
layer of HDFS from each other.  Once this is done, block behaviors
like the one you describe will be easy to implement.  It's a use case
very similar to the hierarchical storage management (HSM) use case
that we've discussed before.  Check out HDFS-2832.  Hopefully there
will be a design document posted soon.

cheers,
Colin


On Thu, Aug 8, 2013 at 11:52 AM, Matevz Tadel <mt...@ucsd.edu> wrote:
> Hi everybody,
>
> I'm jumping in as Jeff is away due to an unexpected annoyance involving
> Californian wildlife.
>
>
> On 8/7/13 7:47 PM, Andrew Wang wrote:
>>
>> Blocks are supposed to be an internal abstraction within HDFS, and aren't
>> an
>> inherent part of FileSystem (the user-visible class used to access all
>> Hadoop
>> filesystems).
>
>
> Yes, but it's a really useful abstraction :) Do you really believe the
> blocks could be abandoned in the next couple of years? I mean, it's such a
> simple and effective solution ...
>
>
>> Is it possible to instead deal with files and offsets? On a read failure,
>> you
>> could open a stream to the same file on the backup filesystem, seek to the
>> old
>> file position, and retry the read. This feels like it's possible via
>> wrapping.
>
>
> As Jeff briefly mentioned, all USCMS sites export their data via XRootd (not
> all of them use HDFS!) and we developed a specialization of XRootd caching
> proxy that can fetch only requested blocks (block size is passed from our
> input stream class to XRootd client (via JNI) and on to the proxy server)
> and keep them in a local cache. This allows us to do three things:
>
> 1. the first time we notice a block is missing, a whole block is fetched
> from elsewhere and further access requests from the same process get
> fulfilled with zero latency;
>
> 2. later requests from other processes asking for this block are fulfilled
> immediately (well, after the initial 3 retries);
>
> 3. we have a list of blocks that were fetched and we could (this is what we
> want to look into in the near future) re-inject them into HDFS if the data
> loss turns out to be permanent (bad disk vs. node that was
> offline/overloaded for a while).
>
> Handling exceptions at the block level thus gives us just what we need. As
> input stream is the place where these errors become known it is, I think,
> also the easiest place to handle them.
>
> I'll understand if you find opening-up of the interfaces in the central
> repository unacceptable. We can always apply the patch at the OSG level
> where rpms for all our deployments get built.
>
> Thanks & Best regards,
> Matevz
>
>>
>> On Wed, Aug 7, 2013 at 3:29 PM, Jeff Dost <jdost@ucsd.edu
>> <ma...@ucsd.edu>> wrote:
>>
>>     Thank you for the suggestion, but we don't see how simply wrapping a
>>     FileSystem object would be sufficient in our use case.  The reason why
>> is we
>>     need to catch and handle read exceptions at the block level.  There
>> aren't
>>     any public methods available in the high level FileSystem abstraction
>> layer
>>     that would give us the fine grained control we need at block level
>> read
>>     failures.
>>
>>     Perhaps if I outline the steps more clearly it will help explain what
>> we are
>>     trying to do.  Without our enhancements, suppose a user opens a file
>> stream
>>     and starts reading the file from Hadoop. After some time, at some
>> position
>>     into the file, if there happen to be no replicas available for a
>> particular
>>     block for whatever reason, datanodes have gone down due to disk
>> issues, etc.
>>     the stream will throw an IOException (BlockMissingException or
>> similar) and
>>     the read will fail.
>>
>>     What we are doing is rather than letting the stream fail, we have
>> another
>>     stream queued up that knows how to fetch the blocks elsewhere outside
>> of our
>>     Hadoop cluster that couldn't be retrieved.  So we need to be able to
>> catch
>>     the exception at this point, and these externally fetched bytes then
>> get
>>     read into the user supplied read buffer.  Now Hadoop can proceed to
>> read in
>>     the stream the next blocks in the file.
>>
>>     So as you can see this method of fail over on demand allows an input
>> stream
>>     to keep reading data, without having to start it all over again if a
>> failure
>>     occurs (assuming the remote bytes were successfully fetched).
>>
>>     As a final note I would like to mention that we will be providing our
>>     failover module to the Open Science Grid.  Since we hope to provide
>> this as
>>     a benefit to all OSG users running at participating T2 computing
>> clusters,
>>     we will be committed to maintaining this software and any changes to
>> Hadoop
>>     needed to make it work.  In other words we will be willing to maintain
>> any
>>     implementation changes that may become necessary as Hadoop internals
>> change
>>     in future releases.
>>
>>     Thanks,
>>     Jeff
>>
>>
>>     On 8/7/13 11:30 AM, Andrew Wang wrote:
>>
>>         I don't think exposing DFSClient and DistributedFileSystem members
>> is
>>         necessary to achieve what you're trying to do. We've got wrapper
>>         FileSystems like FilterFileSystem and ViewFileSystem which you
>> might be
>>         able to use for inspiration, and the HCFS wiki lists some
>> third-party
>>         FileSystems that might also be helpful too.
>>
>>
>>         On Wed, Aug 7, 2013 at 11:11 AM, Joe Bounour <jbounour@ddn.com
>>         <ma...@ddn.com>> wrote:
>>
>>             Hello Jeff
>>
>>             Is it something that could go under HCFS project?
>>             http://wiki.apache.org/hadoop/HCFS
>>             (I might be wrong?)
>>
>>             Joe
>>
>>
>>             On 8/7/13 10:59 AM, "Jeff Dost" <jdost@ucsd.edu
>>             <ma...@ucsd.edu>> wrote:
>>
>>                 Hello,
>>
>>                 We work in a software development team at the UCSD CMS
>> Tier2
>>                 Center.  We
>>                 would like to propose a mechanism to allow one to subclass
>> the
>>                 DFSInputStream in a clean way from an external package.
>> First
>>                 I'd like
>>                 to give some motivation on why and then will proceed with
>> the
>>                 details.
>>
>>                 We have a 3 Petabyte Hadoop cluster we maintain for the
>> LHC
>>                 experiment
>>                 at CERN.  There are other T2 centers worldwide that
>> contain
>>                 mirrors of
>>                 the same data we host.  We are working on an extension to
>> Hadoop
>>                 that,
>>                 on reading a file, if it is found that there are no
>> available
>>                 replicas
>>                 of a block, we use an external interface to retrieve this
>> block
>>                 of the
>>                 file from another data center.  The external interface is
>> necessary
>>                 because not all T2 centers involved in CMS are running a
>> Hadoop
>>                 cluster
>>                 as their storage backend.
>>
>>                 In order to implement this functionality, we need to
>> subclass the
>>                 DFSInputStream and override the read method, so we can
>> catch
>>                 IOExceptions that occur on client reads at the block
>> level.
>>
>>                 The basic steps required:
>>                 1. Invent a new URI scheme for the customized "FileSystem"
>> in
>>                 core-site.xml:
>>                     <property>
>>                       <name>fs.foofs.impl</name>
>>                       <value>my.package.FooFileSystem</value>
>>
>>                       <description>My Extended FileSystem for foofs:
>>                 uris.</description>
>>                     </property>
>>
>>                 2. Write new classes included in the external package that
>>                 subclass the
>>                 following:
>>                 FooFileSystem subclasses DistributedFileSystem
>>                 FooFSClient subclasses DFSClient
>>                 FooFSInputStream subclasses DFSInputStream
>>
>>                 Now any client commands that explicitly use the foofs://
>> scheme
>>                 in paths
>>                 to access the hadoop cluster can open files with a
>> customized
>>                 InputStream that extends functionality of the default
>> hadoop client
>>                 DFSInputStream.  In order to make this happen for our use
>> case,
>>                 we had
>>                 to change some access modifiers in the
>> DistributedFileSystem,
>>                 DFSClient,
>>                 and DFSInputStream classes provided by Hadoop.  In
>> addition, we
>>                 had to
>>                 comment out the check in the namenode code that only
>> allows for URI
>>                 schemes of the form "hdfs://".
>>
>>                 Attached is a patch file we apply to hadoop.  Note that we
>>                 derived this
>>                 patch by modding the Cloudera release
>> hadoop-2.0.0-cdh4.1.1
>>                 which can be
>>                 found at:
>>
>> http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.1.1.tar.gz
>>
>>                 We would greatly appreciate any advise on whether or not
>> this
>>                 approach
>>                 sounds reasonable, and if you would consider accepting
>> these
>>                 modifications into the official Hadoop code base.
>>
>>                 Thank you,
>>                 Jeff, Alja & Matevz
>>                 UCSD Physics
>>
>>
>>
>>
>

Re: Feature request to provide DFSInputStream subclassing mechanism

Posted by Matevz Tadel <mt...@ucsd.edu>.
Hi everybody,

I'm jumping in as Jeff is away due to an unexpected annoyance involving 
Californian wildlife.

On 8/7/13 7:47 PM, Andrew Wang wrote:
> Blocks are supposed to be an internal abstraction within HDFS, and aren't an
> inherent part of FileSystem (the user-visible class used to access all Hadoop
> filesystems).

Yes, but it's a really useful abstraction :) Do you really believe the blocks 
could be abandoned in the next couple of years? I mean, it's such a simple and 
effective solution ...

> Is it possible to instead deal with files and offsets? On a read failure, you
> could open a stream to the same file on the backup filesystem, seek to the old
> file position, and retry the read. This feels like it's possible via wrapping.

As Jeff briefly mentioned, all USCMS sites export their data via XRootd (not all 
of them use HDFS!) and we developed a specialization of the XRootd caching proxy 
that can fetch only the requested blocks (the block size is passed from our input 
stream class to the XRootd client (via JNI) and on to the proxy server) and keep 
them in a local cache. This allows us to do three things:

1. the first time we notice a block is missing, a whole block is fetched from 
elsewhere and further access requests from the same process get fulfilled with 
zero latency;

2. later requests from other processes asking for this block are fulfilled 
immediately (well, after the initial 3 retries);

3. we have a list of blocks that were fetched and we could (this is what we want 
to look into in the near future) re-inject them into HDFS if the data loss turns 
out to be permanent (bad disk vs. node that was offline/overloaded for a while).

Handling exceptions at the block level thus gives us just what we need. As the 
input stream is the place where these errors become known, it is, I think, also 
the easiest place to handle them.
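
To make this a bit more concrete, here is a rough sketch of the kind of override we 
have in mind. It is only an illustration: FooFSInputStream is the placeholder name 
from Jeff's original mail, ExternalBlockFetcher is a made-up helper interface, and 
the whole thing compiles only against a tree where DFSInputStream's constructor and 
members have been opened up as described in the patch.

    import java.io.IOException;
    import org.apache.hadoop.hdfs.BlockMissingException;
    import org.apache.hadoop.hdfs.DFSClient;
    import org.apache.hadoop.hdfs.DFSInputStream;

    public class FooFSInputStream extends DFSInputStream {

        /** Hypothetical helper that fetches a byte range of a file via the XRootd proxy. */
        public interface ExternalBlockFetcher {
            int read(String src, long position, byte[] buf, int off, int len) throws IOException;
        }

        private final ExternalBlockFetcher fetcher;

        // The super(...) call assumes the relaxed access modifiers from our patch; the exact
        // constructor signature differs between Hadoop versions, so take it as illustrative.
        protected FooFSInputStream(DFSClient dfsClient, String src, int buffersize,
                                   boolean verifyChecksum, ExternalBlockFetcher fetcher)
                throws IOException {
            super(dfsClient, src, buffersize, verifyChecksum);
            this.fetcher = fetcher;
        }

        @Override
        public synchronized int read(byte[] buf, int off, int len) throws IOException {
            try {
                return super.read(buf, off, len);
            } catch (BlockMissingException e) {
                // No replica available: fetch the missing bytes from outside the cluster,
                // return them through the same user-supplied buffer, and advance the stream
                // position so the next read continues past the repaired range.
                long pos = getPos();
                int n = fetcher.read(e.getFile(), pos, buf, off, len);
                if (n > 0) {
                    seek(pos + n);
                }
                return n;
            }
        }
    }

A real implementation would also have to cover the positional read path and some 
retry bookkeeping, but this is the general shape of it.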

I'll understand if you find opening up these interfaces in the central 
repository unacceptable. We can always apply the patch at the OSG level, where 
the rpms for all our deployments get built.

Thanks & Best regards,
Matevz

>
> On Wed, Aug 7, 2013 at 3:29 PM, Jeff Dost <jdost@ucsd.edu> wrote:
>
>     Thank you for the suggestion, but we don't see how simply wrapping a
>     FileSystem object would be sufficient in our use case.  The reason why is we
>     need to catch and handle read exceptions at the block level.  There aren't
>     any public methods available in the high level FileSystem abstraction layer
>     that would give us the fine grained control we need at block level read
>     failures.
>
>     Perhaps if I outline the steps more clearly it will help explain what we are
>     trying to do.  Without our enhancements, suppose a user opens a file stream
>     and starts reading the file from Hadoop. After some time, at some position
>     into the file, if there happen to be no replicas available for a particular
>     block for whatever reason, datanodes have gone down due to disk issues, etc.
>     the stream will throw an IOException (BlockMissingException or similar) and
>     the read will fail.
>
>     What we are doing is rather than letting the stream fail, we have another
>     stream queued up that knows how to fetch the blocks elsewhere outside of our
>     Hadoop cluster that couldn't be retrieved.  So we need to be able to catch
>     the exception at this point, and these externally fetched bytes then get
>     read into the user supplied read buffer.  Now Hadoop can proceed to read in
>     the stream the next blocks in the file.
>
>     So as you can see this method of fail over on demand allows an input stream
>     to keep reading data, without having to start it all over again if a failure
>     occurs (assuming the remote bytes were successfully fetched).
>
>     As a final note I would like to mention that we will be providing our
>     failover module to the Open Science Grid.  Since we hope to provide this as
>     a benefit to all OSG users running at participating T2 computing clusters,
>     we will be committed to maintaining this software and any changes to Hadoop
>     needed to make it work.  In other words we will be willing to maintain any
>     implementation changes that may become necessary as Hadoop internals change
>     in future releases.
>
>     Thanks,
>     Jeff
>
>
>     On 8/7/13 11:30 AM, Andrew Wang wrote:
>
>         I don't think exposing DFSClient and DistributedFileSystem members is
>         necessary to achieve what you're trying to do. We've got wrapper
>         FileSystems like FilterFileSystem and ViewFileSystem which you might be
>         able to use for inspiration, and the HCFS wiki lists some third-party
>         FileSystems that might also be helpful too.
>
>
>         On Wed, Aug 7, 2013 at 11:11 AM, Joe Bounour <jbounour@ddn.com> wrote:
>
>             Hello Jeff
>
>             Is it something that could go under HCFS project?
>             http://wiki.apache.org/hadoop/HCFS
>             (I might be wrong?)
>
>             Joe
>
>
>             On 8/7/13 10:59 AM, "Jeff Dost" <jdost@ucsd.edu> wrote:
>
>                 Hello,
>
>                 We work in a software development team at the UCSD CMS Tier2
>                 Center.  We
>                 would like to propose a mechanism to allow one to subclass the
>                 DFSInputStream in a clean way from an external package.  First
>                 I'd like
>                 to give some motivation on why and then will proceed with the
>                 details.
>
>                 We have a 3 Petabyte Hadoop cluster we maintain for the LHC
>                 experiment
>                 at CERN.  There are other T2 centers worldwide that contain
>                 mirrors of
>                 the same data we host.  We are working on an extension to Hadoop
>                 that,
>                 on reading a file, if it is found that there are no available
>                 replicas
>                 of a block, we use an external interface to retrieve this block
>                 of the
>                 file from another data center.  The external interface is necessary
>                 because not all T2 centers involved in CMS are running a Hadoop
>                 cluster
>                 as their storage backend.
>
>                 In order to implement this functionality, we need to subclass the
>                 DFSInputStream and override the read method, so we can catch
>                 IOExceptions that occur on client reads at the block level.
>
>                 The basic steps required:
>                 1. Invent a new URI scheme for the customized "FileSystem" in
>                 core-site.xml:
>                     <property>
>                       <name>fs.foofs.impl</name>
>                       <value>my.package.FooFileSystem</value>
>                       <description>My Extended FileSystem for foofs:
>                 uris.</description>
>                     </property>
>
>                 2. Write new classes included in the external package that
>                 subclass the
>                 following:
>                 FooFileSystem subclasses DistributedFileSystem
>                 FooFSClient subclasses DFSClient
>                 FooFSInputStream subclasses DFSInputStream
>
>                 Now any client commands that explicitly use the foofs:// scheme
>                 in paths
>                 to access the hadoop cluster can open files with a customized
>                 InputStream that extends functionality of the default hadoop client
>                 DFSInputStream.  In order to make this happen for our use case,
>                 we had
>                 to change some access modifiers in the DistributedFileSystem,
>                 DFSClient,
>                 and DFSInputStream classes provided by Hadoop.  In addition, we
>                 had to
>                 comment out the check in the namenode code that only allows for URI
>                 schemes of the form "hdfs://".
>
>                 Attached is a patch file we apply to hadoop.  Note that we
>                 derived this
>                 patch by modding the Cloudera release hadoop-2.0.0-cdh4.1.1
>                 which can be
>                 found at:
>                 http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.1.1.tar.gz
>
>                 We would greatly appreciate any advise on whether or not this
>                 approach
>                 sounds reasonable, and if you would consider accepting these
>                 modifications into the official Hadoop code base.
>
>                 Thank you,
>                 Jeff, Alja & Matevz
>                 UCSD Physics
>
>
>
>


Re: Feature request to provide DFSInputStream subclassing mechanism

Posted by Andrew Wang <an...@cloudera.com>.
Blocks are supposed to be an internal abstraction within HDFS, and aren't
an inherent part of FileSystem (the user-visible class used to access all
Hadoop filesystems).

Is it possible to instead deal with files and offsets? On a read failure,
you could open a stream to the same file on the backup filesystem, seek to
the old file position, and retry the read. This feels like it's possible
via wrapping.
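
Something like this, say -- a minimal sketch only, where FailoverInputStream is an 
illustrative name rather than an existing Hadoop class, and the failover is 
deliberately naive (it retries once against a single backup filesystem):

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    /** Illustrative only: retries a failed read at the same offset on a backup filesystem. */
    public class FailoverInputStream extends FSInputStream {

        private final FileSystem backupFs;
        private final Path path;
        private FSDataInputStream in;   // currently active stream; the primary one at first

        public FailoverInputStream(FSDataInputStream primary, FileSystem backupFs, Path path) {
            this.in = primary;
            this.backupFs = backupFs;
            this.path = path;
        }

        @Override
        public synchronized int read(byte[] buf, int off, int len) throws IOException {
            try {
                return in.read(buf, off, len);
            } catch (IOException e) {
                // Remember where the primary stream failed, open the same file on the
                // backup filesystem, seek to that offset and retry the read from there.
                long pos = in.getPos();
                FSDataInputStream backup = backupFs.open(path);
                backup.seek(pos);
                in = backup;
                return in.read(buf, off, len);
            }
        }

        @Override
        public int read() throws IOException {
            byte[] one = new byte[1];
            int n = read(one, 0, 1);
            return n == -1 ? -1 : (one[0] & 0xff);
        }

        @Override public void seek(long pos) throws IOException { in.seek(pos); }
        @Override public long getPos() throws IOException { return in.getPos(); }
        @Override public boolean seekToNewSource(long target) throws IOException { return in.seekToNewSource(target); }
        @Override public void close() throws IOException { in.close(); }
    }

Retry limits, closing the dead stream and the like are all glossed over here.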


On Wed, Aug 7, 2013 at 3:29 PM, Jeff Dost <jd...@ucsd.edu> wrote:

> Thank you for the suggestion, but we don't see how simply wrapping a
> FileSystem object would be sufficient in our use case.  The reason why is
> we need to catch and handle read exceptions at the block level.  There
> aren't any public methods available in the high level FileSystem
> abstraction layer that would give us the fine grained control we need at
> block level read failures.
>
> Perhaps if I outline the steps more clearly it will help explain what we
> are trying to do.  Without our enhancements, suppose a user opens a file
> stream and starts reading the file from Hadoop. After some time, at some
> position into the file, if there happen to be no replicas available for a
> particular block for whatever reason, datanodes have gone down due to disk
> issues, etc. the stream will throw an IOException (BlockMissingException or
> similar) and the read will fail.
>
> What we are doing is rather than letting the stream fail, we have another
> stream queued up that knows how to fetch the blocks elsewhere outside of
> our Hadoop cluster that couldn't be retrieved.  So we need to be able to
> catch the exception at this point, and these externally fetched bytes then
> get read into the user supplied read buffer.  Now Hadoop can proceed to
> read in the stream the next blocks in the file.
>
> So as you can see this method of fail over on demand allows an input
> stream to keep reading data, without having to start it all over again if a
> failure occurs (assuming the remote bytes were successfully fetched).
>
> As a final note I would like to mention that we will be providing our
> failover module to the Open Science Grid.  Since we hope to provide this as
> a benefit to all OSG users running at participating T2 computing clusters,
> we will be committed to maintaining this software and any changes to Hadoop
> needed to make it work.  In other words we will be willing to maintain any
> implementation changes that may become necessary as Hadoop internals change
> in future releases.
>
> Thanks,
> Jeff
>
>
> On 8/7/13 11:30 AM, Andrew Wang wrote:
>
>> I don't think exposing DFSClient and DistributedFileSystem members is
>> necessary to achieve what you're trying to do. We've got wrapper
>> FileSystems like FilterFileSystem and ViewFileSystem which you might be
>> able to use for inspiration, and the HCFS wiki lists some third-party
>> FileSystems that might also be helpful too.
>>
>>
>> On Wed, Aug 7, 2013 at 11:11 AM, Joe Bounour <jb...@ddn.com> wrote:
>>
>>  Hello Jeff
>>>
>>> Is it something that could go under HCFS project?
>>> http://wiki.apache.org/hadoop/HCFS
>>> (I might be wrong?)
>>>
>>> Joe
>>>
>>>
>>> On 8/7/13 10:59 AM, "Jeff Dost" <jd...@ucsd.edu> wrote:
>>>
>>>  Hello,
>>>>
>>>> We work in a software development team at the UCSD CMS Tier2 Center.  We
>>>> would like to propose a mechanism to allow one to subclass the
>>>> DFSInputStream in a clean way from an external package.  First I'd like
>>>> to give some motivation on why and then will proceed with the details.
>>>>
>>>> We have a 3 Petabyte Hadoop cluster we maintain for the LHC experiment
>>>> at CERN.  There are other T2 centers worldwide that contain mirrors of
>>>> the same data we host.  We are working on an extension to Hadoop that,
>>>> on reading a file, if it is found that there are no available replicas
>>>> of a block, we use an external interface to retrieve this block of the
>>>> file from another data center.  The external interface is necessary
>>>> because not all T2 centers involved in CMS are running a Hadoop cluster
>>>> as their storage backend.
>>>>
>>>> In order to implement this functionality, we need to subclass the
>>>> DFSInputStream and override the read method, so we can catch
>>>> IOExceptions that occur on client reads at the block level.
>>>>
>>>> The basic steps required:
>>>> 1. Invent a new URI scheme for the customized "FileSystem" in
>>>> core-site.xml:
>>>>    <property>
>>>>      <name>fs.foofs.impl</name>
>>>>      <value>my.package.FooFileSystem</value>
>>>>      <description>My Extended FileSystem for foofs: uris.</description>
>>>>    </property>
>>>>
>>>> 2. Write new classes included in the external package that subclass the
>>>> following:
>>>> FooFileSystem subclasses DistributedFileSystem
>>>> FooFSClient subclasses DFSClient
>>>> FooFSInputStream subclasses DFSInputStream
>>>>
>>>> Now any client commands that explicitly use the foofs:// scheme in paths
>>>> to access the hadoop cluster can open files with a customized
>>>> InputStream that extends functionality of the default hadoop client
>>>> DFSInputStream.  In order to make this happen for our use case, we had
>>>> to change some access modifiers in the DistributedFileSystem, DFSClient,
>>>> and DFSInputStream classes provided by Hadoop.  In addition, we had to
>>>> comment out the check in the namenode code that only allows for URI
>>>> schemes of the form "hdfs://".
>>>>
>>>> Attached is a patch file we apply to hadoop.  Note that we derived this
>>>> patch by modding the Cloudera release hadoop-2.0.0-cdh4.1.1 which can be
>>>> found at:
>>>> http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.1.1.tar.gz
>>>>
>>>> We would greatly appreciate any advise on whether or not this approach
>>>> sounds reasonable, and if you would consider accepting these
>>>> modifications into the official Hadoop code base.
>>>>
>>>> Thank you,
>>>> Jeff, Alja & Matevz
>>>> UCSD Physics
>>>>
>>>
>>>
>

Re: Feature request to provide DFSInputStream subclassing mechanism

Posted by Jeff Dost <jd...@ucsd.edu>.
Thank you for the suggestion, but we don't see how simply wrapping a 
FileSystem object would be sufficient in our use case.  The reason is 
that we need to catch and handle read exceptions at the block level.  
There aren't any public methods available in the high-level FileSystem 
abstraction layer that would give us the fine-grained control we need 
over block-level read failures.

Perhaps if I outline the steps more clearly it will help explain what we 
are trying to do.  Without our enhancements, suppose a user opens a file 
stream and starts reading the file from Hadoop.  After some time, at some 
position into the file, if there happen to be no replicas available for 
a particular block for whatever reason (datanodes have gone down due to 
disk issues, etc.), the stream will throw an IOException 
(BlockMissingException or similar) and the read will fail.

What we are doing is, rather than letting the stream fail, we have 
another stream queued up that knows how to fetch, from elsewhere outside 
our Hadoop cluster, the blocks that couldn't be retrieved.  So we need to 
be able to catch the exception at this point, and these externally 
fetched bytes then get read into the user-supplied read buffer.  Hadoop 
can then proceed to read the next blocks in the file from the stream.

So, as you can see, this fail-over-on-demand method allows an input 
stream to keep reading data without having to start all over again 
if a failure occurs (assuming the remote bytes were successfully fetched).

As a final note I would like to mention that we will be providing our 
failover module to the Open Science Grid.  Since we hope to provide this 
as a benefit to all OSG users running at participating T2 computing 
clusters, we will be committed to maintaining this software and any 
changes to Hadoop needed to make it work.  In other words we will be 
willing to maintain any implementation changes that may become necessary 
as Hadoop internals change in future releases.

Thanks,
Jeff

On 8/7/13 11:30 AM, Andrew Wang wrote:
> I don't think exposing DFSClient and DistributedFileSystem members is
> necessary to achieve what you're trying to do. We've got wrapper
> FileSystems like FilterFileSystem and ViewFileSystem which you might be
> able to use for inspiration, and the HCFS wiki lists some third-party
> FileSystems that might also be helpful too.
>
>
> On Wed, Aug 7, 2013 at 11:11 AM, Joe Bounour <jb...@ddn.com> wrote:
>
>> Hello Jeff
>>
>> Is it something that could go under HCFS project?
>> http://wiki.apache.org/hadoop/HCFS
>> (I might be wrong?)
>>
>> Joe
>>
>>
>> On 8/7/13 10:59 AM, "Jeff Dost" <jd...@ucsd.edu> wrote:
>>
>>> Hello,
>>>
>>> We work in a software development team at the UCSD CMS Tier2 Center.  We
>>> would like to propose a mechanism to allow one to subclass the
>>> DFSInputStream in a clean way from an external package.  First I'd like
>>> to give some motivation on why and then will proceed with the details.
>>>
>>> We have a 3 Petabyte Hadoop cluster we maintain for the LHC experiment
>>> at CERN.  There are other T2 centers worldwide that contain mirrors of
>>> the same data we host.  We are working on an extension to Hadoop that,
>>> on reading a file, if it is found that there are no available replicas
>>> of a block, we use an external interface to retrieve this block of the
>>> file from another data center.  The external interface is necessary
>>> because not all T2 centers involved in CMS are running a Hadoop cluster
>>> as their storage backend.
>>>
>>> In order to implement this functionality, we need to subclass the
>>> DFSInputStream and override the read method, so we can catch
>>> IOExceptions that occur on client reads at the block level.
>>>
>>> The basic steps required:
>>> 1. Invent a new URI scheme for the customized "FileSystem" in
>>> core-site.xml:
>>>    <property>
>>>      <name>fs.foofs.impl</name>
>>>      <value>my.package.FooFileSystem</value>
>>>      <description>My Extended FileSystem for foofs: uris.</description>
>>>    </property>
>>>
>>> 2. Write new classes included in the external package that subclass the
>>> following:
>>> FooFileSystem subclasses DistributedFileSystem
>>> FooFSClient subclasses DFSClient
>>> FooFSInputStream subclasses DFSInputStream
>>>
>>> Now any client commands that explicitly use the foofs:// scheme in paths
>>> to access the hadoop cluster can open files with a customized
>>> InputStream that extends functionality of the default hadoop client
>>> DFSInputStream.  In order to make this happen for our use case, we had
>>> to change some access modifiers in the DistributedFileSystem, DFSClient,
>>> and DFSInputStream classes provided by Hadoop.  In addition, we had to
>>> comment out the check in the namenode code that only allows for URI
>>> schemes of the form "hdfs://".
>>>
>>> Attached is a patch file we apply to hadoop.  Note that we derived this
>>> patch by modding the Cloudera release hadoop-2.0.0-cdh4.1.1 which can be
>>> found at:
>>> http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.1.1.tar.gz
>>>
>>> We would greatly appreciate any advise on whether or not this approach
>>> sounds reasonable, and if you would consider accepting these
>>> modifications into the official Hadoop code base.
>>>
>>> Thank you,
>>> Jeff, Alja & Matevz
>>> UCSD Physics
>>


Re: Feature request to provide DFSInputStream subclassing mechanism

Posted by Andrew Wang <an...@cloudera.com>.
I don't think exposing DFSClient and DistributedFileSystem members is
necessary to achieve what you're trying to do. We've got wrapper
FileSystems like FilterFileSystem and ViewFileSystem, which you might be
able to use for inspiration, and the HCFS wiki lists some third-party
FileSystems that might also be helpful.
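
Roughly, the wrapper could look like the sketch below. FailoverFileSystem is just an 
illustrative name, and FailoverInputStream stands for a hypothetical 
Seekable/PositionedReadable stream that retries reads against the backup copy at the 
failed offset -- neither is an existing Hadoop class.

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FilterFileSystem;
    import org.apache.hadoop.fs.Path;

    /** Illustrative only: a FilterFileSystem that hands out streams with failover behaviour. */
    public class FailoverFileSystem extends FilterFileSystem {

        private final FileSystem backupFs;   // wherever the mirrored copy of the data lives

        public FailoverFileSystem(FileSystem primary, FileSystem backupFs) {
            super(primary);                  // 'fs' (the wrapped primary) is set by the parent
            this.backupFs = backupFs;
        }

        @Override
        public FSDataInputStream open(Path f, int bufferSize) throws IOException {
            FSDataInputStream primary = fs.open(f, bufferSize);
            // FailoverInputStream is a hypothetical wrapper that reopens the file on backupFs
            // and retries at the failed offset; FSDataInputStream requires whatever it wraps
            // to implement Seekable and PositionedReadable.
            return new FSDataInputStream(new FailoverInputStream(primary, backupFs, f));
        }
    }

A wrapper like this could then be registered under its own scheme via fs.<scheme>.impl 
in core-site.xml, the same way the original proposal registers foofs://.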


On Wed, Aug 7, 2013 at 11:11 AM, Joe Bounour <jb...@ddn.com> wrote:

> Hello Jeff
>
> Is it something that could go under HCFS project?
> http://wiki.apache.org/hadoop/HCFS
> (I might be wrong?)
>
> Joe
>
>
> On 8/7/13 10:59 AM, "Jeff Dost" <jd...@ucsd.edu> wrote:
>
> >Hello,
> >
> >We work in a software development team at the UCSD CMS Tier2 Center.  We
> >would like to propose a mechanism to allow one to subclass the
> >DFSInputStream in a clean way from an external package.  First I'd like
> >to give some motivation on why and then will proceed with the details.
> >
> >We have a 3 Petabyte Hadoop cluster we maintain for the LHC experiment
> >at CERN.  There are other T2 centers worldwide that contain mirrors of
> >the same data we host.  We are working on an extension to Hadoop that,
> >on reading a file, if it is found that there are no available replicas
> >of a block, we use an external interface to retrieve this block of the
> >file from another data center.  The external interface is necessary
> >because not all T2 centers involved in CMS are running a Hadoop cluster
> >as their storage backend.
> >
> >In order to implement this functionality, we need to subclass the
> >DFSInputStream and override the read method, so we can catch
> >IOExceptions that occur on client reads at the block level.
> >
> >The basic steps required:
> >1. Invent a new URI scheme for the customized "FileSystem" in
> >core-site.xml:
> >   <property>
> >     <name>fs.foofs.impl</name>
> >     <value>my.package.FooFileSystem</value>
> >     <description>My Extended FileSystem for foofs: uris.</description>
> >   </property>
> >
> >2. Write new classes included in the external package that subclass the
> >following:
> >FooFileSystem subclasses DistributedFileSystem
> >FooFSClient subclasses DFSClient
> >FooFSInputStream subclasses DFSInputStream
> >
> >Now any client commands that explicitly use the foofs:// scheme in paths
> >to access the hadoop cluster can open files with a customized
> >InputStream that extends functionality of the default hadoop client
> >DFSInputStream.  In order to make this happen for our use case, we had
> >to change some access modifiers in the DistributedFileSystem, DFSClient,
> >and DFSInputStream classes provided by Hadoop.  In addition, we had to
> >comment out the check in the namenode code that only allows for URI
> >schemes of the form "hdfs://".
> >
> >Attached is a patch file we apply to hadoop.  Note that we derived this
> >patch by modding the Cloudera release hadoop-2.0.0-cdh4.1.1 which can be
> >found at:
> >http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.1.1.tar.gz
> >
> >We would greatly appreciate any advise on whether or not this approach
> >sounds reasonable, and if you would consider accepting these
> >modifications into the official Hadoop code base.
> >
> >Thank you,
> >Jeff, Alja & Matevz
> >UCSD Physics
>
>

Re: Feature request to provide DFSInputStream subclassing mechanism

Posted by Joe Bounour <jb...@ddn.com>.
Hello Jeff

Is it something that could go under the HCFS project?
http://wiki.apache.org/hadoop/HCFS
(I might be wrong?)

Joe


On 8/7/13 10:59 AM, "Jeff Dost" <jd...@ucsd.edu> wrote:

>Hello,
>
>We work in a software development team at the UCSD CMS Tier2 Center.  We
>would like to propose a mechanism to allow one to subclass the
>DFSInputStream in a clean way from an external package.  First I'd like
>to give some motivation on why and then will proceed with the details.
>
>We have a 3 Petabyte Hadoop cluster we maintain for the LHC experiment
>at CERN.  There are other T2 centers worldwide that contain mirrors of
>the same data we host.  We are working on an extension to Hadoop that,
>on reading a file, if it is found that there are no available replicas
>of a block, we use an external interface to retrieve this block of the
>file from another data center.  The external interface is necessary
>because not all T2 centers involved in CMS are running a Hadoop cluster
>as their storage backend.
>
>In order to implement this functionality, we need to subclass the
>DFSInputStream and override the read method, so we can catch
>IOExceptions that occur on client reads at the block level.
>
>The basic steps required:
>1. Invent a new URI scheme for the customized "FileSystem" in
>core-site.xml:
>   <property>
>     <name>fs.foofs.impl</name>
>     <value>my.package.FooFileSystem</value>
>     <description>My Extended FileSystem for foofs: uris.</description>
>   </property>
>
>2. Write new classes included in the external package that subclass the
>following:
>FooFileSystem subclasses DistributedFileSystem
>FooFSClient subclasses DFSClient
>FooFSInputStream subclasses DFSInputStream
>
>Now any client commands that explicitly use the foofs:// scheme in paths
>to access the hadoop cluster can open files with a customized
>InputStream that extends functionality of the default hadoop client
>DFSInputStream.  In order to make this happen for our use case, we had
>to change some access modifiers in the DistributedFileSystem, DFSClient,
>and DFSInputStream classes provided by Hadoop.  In addition, we had to
>comment out the check in the namenode code that only allows for URI
>schemes of the form "hdfs://".
>
>Attached is a patch file we apply to hadoop.  Note that we derived this
>patch by modding the Cloudera release hadoop-2.0.0-cdh4.1.1 which can be
>found at:
>http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.1.1.tar.gz
>
>We would greatly appreciate any advise on whether or not this approach
>sounds reasonable, and if you would consider accepting these
>modifications into the official Hadoop code base.
>
>Thank you,
>Jeff, Alja & Matevz
>UCSD Physics