Posted to hdfs-dev@hadoop.apache.org by Wei-Chiu Chuang <we...@apache.org> on 2023/03/16 20:54:44 UTC

[DISCUSS] Move HDFS specific APIs to FileSystem abstration

Hi,

Stephen and I are working on a project to make HBase run on Ozone.

HBase, born out of the Hadoop project, depends on a number of HDFS-specific
APIs, including recoverLease() and isInSafeMode(). The HBase community [1]
has strongly voiced that they don't want the project to have a direct dependency
on additional FS implementations due to dependency and vulnerability
management concerns.

To make this project successful, we're exploring options to push these
APIs up to the FileSystem abstraction. Eventually, this would make HBase
FS-implementation agnostic, and perhaps enable HBase to support other storage
systems in the future.

We'd use the PathCapabilities API to probe if the underlying FS
implementation supports these APIs, and would then invoke the corresponding
FileSystem APIs. This is straightforward but the FileSystem would become
bloated.
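
A minimal sketch of the probe-then-invoke pattern described above, for
illustration only. ProbedFs, LeaseFs, PlainFs, LeaseProbe and the capability
string "fs.capability.lease.recoverable" are hypothetical stand-ins, not
Hadoop classes or an agreed capability name. In Hadoop itself the probe would
be FileSystem.hasPathCapability(Path, String), and the real calls would throw
IOException.

```java
import java.util.Set;

/** Hypothetical stand-in for a file system that answers capability probes. */
abstract class ProbedFs {
  private final Set<String> capabilities;

  ProbedFs(Set<String> capabilities) {
    this.capabilities = capabilities;
  }

  /** Same shape as a path capability probe: true only if the feature is supported. */
  boolean hasPathCapability(String path, String capability) {
    return capabilities.contains(capability);
  }

  /** Pulled-up API; the base class rejects it, which is why callers probe first. */
  boolean recoverLease(String path) {
    throw new UnsupportedOperationException("recoverLease not supported");
  }
}

/** A store that supports lease recovery and declares it. */
class LeaseFs extends ProbedFs {
  LeaseFs() { super(Set.of(LeaseProbe.LEASE_RECOVERABLE)); }
  @Override boolean recoverLease(String path) { return true; }
}

/** A store with no lease concept; it declares no capabilities. */
class PlainFs extends ProbedFs {
  PlainFs() { super(Set.of()); }
}

class LeaseProbe {
  // Assumed capability name, chosen for this sketch only.
  static final String LEASE_RECOVERABLE = "fs.capability.lease.recoverable";

  /** Returns true if the lease was recovered, false if the store has no such feature. */
  static boolean recoverIfSupported(ProbedFs fs, String path) {
    if (!fs.hasPathCapability(path, LEASE_RECOVERABLE)) {
      return false;
    }
    return fs.recoverLease(path);
  }
}
```

The caller never needs to know the concrete class: an unsupported store is
skipped by the probe instead of failing with an exception.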

Another option is to create a "RecoverableFileSystem" interface, and have
both DistributedFileSystem (HDFS) and RootedOzoneFileSystem (Ozone) implement
it. This way the impact on the Hadoop project and the FileSystem abstraction
is even smaller.
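
A rough sketch of how the interface option could look from the caller's side.
Only the interface name and the two method names come from this mail; the
signatures, LeaseAwareStore, SimpleStore and WalRecovery are assumptions made
up for the example.

```java
/** Hypothetical shape of the proposed interface; the signatures are assumptions. */
interface RecoverableFileSystem {
  boolean recoverLease(String path);
  boolean isInSafeMode();
}

/** Stand-in for a store such as HDFS or Ozone that opts in to the interface. */
class LeaseAwareStore implements RecoverableFileSystem {
  @Override public boolean recoverLease(String path) { return true; }
  @Override public boolean isInSafeMode() { return false; }
}

/** Stand-in for a store that does not implement the interface. */
class SimpleStore { }

class WalRecovery {
  /** Recovers a lease only when the store opts in; otherwise reports nothing to do. */
  static String recover(Object fs, String path) {
    if (fs instanceof RecoverableFileSystem) {
      RecoverableFileSystem rfs = (RecoverableFileSystem) fs;
      if (rfs.isInSafeMode()) {
        return "retry-later";
      }
      return rfs.recoverLease(path) ? "recovered" : "pending";
    }
    return "not-applicable";
  }
}
```

With this shape the FileSystem class itself gains no new methods; the cost is
an instanceof check (or an equivalent lookup) at each call site.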

Thoughts?

[1] https://lists.apache.org/thread/tcrp8vxxs3z12y36mpzx35txhpp7tvxv

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

Posted by Tsz Wo Sze <sz...@yahoo.com.INVALID>.
It makes a lot of sense to use PathCapabilities.
Reflection is a hack but not a solution.

Tsz-Wo

On Tuesday, March 21, 2023, 09:43:15 AM GMT+8, Ayush Saxena <ay...@gmail.com> wrote:
 
 HBase not wanting to add Ozone as a dependency sounds to me like ‘HBase having resistance against the people proposing or against Ozone’

Anyway, doesn’t ViewDistributedFileSystem solve this Ozone problem? I remember Uma chasing that to solve exactly these problems.

Pulling up the core HDFS API honestly looks like a naive approach. There is some work around reflection for DistCp with snapshots to work with Ozone; HBase folks could have used that as well (https://issues.apache.org/jira/browse/HDFS-16911)

Just my thoughts on solving the problem, which I feel can be easily solved by writing a Util class in HBase with some reflection logic…


-Ayush

> On 20-Mar-2023, at 9:54 PM, Wei-Chiu Chuang <we...@apache.org> wrote:
> 
> Thank you. Makes sense to me. Yes, as part of this effort we are going to
> need contract tests.
> 
>> On Fri, Mar 17, 2023 at 3:52 AM Steve Loughran <st...@cloudera.com.invalid>
>> wrote:
>> 
>>  1. I think a new interface would be good as FileContext could do the
>>  same thing
>>  2. using PathCapabilities probes should still be mandatory as for
>>  FileContext it would depend on the back end
>>  3. Whoever does this gets to specify what the API does and write the
>>  contract tests. Saying "just to do what HDFS does" isn't enough as it's
>> not
>>  always clear the HDFS team know how much of that behaviour is intentional
>>  (rename, anyone?).
>> 
>> 
>> For any new API (a better rename, a better delete,...) I would normally
>> insist on making it cloud friendly, with an extensible builder API and an
>> emphasis on asynchronous IO. However this is existing code and does target
>> HDFS and Ozone -pulling the existing APIs up into a new interface seems the
>> right thing to do here.
>> 
>> I have a WiP project to do a shim library to offer new FS APIs to older
>> Hadoop releases by way of reflection, so that we can get new APIs taken up
>> across projects where we cannot choreograph version updates across the
>> entire stack. (hello parquet, spark,...). My goal is to actually make this
>> a Hadoop managed project, with its own release schedule. You could add an
>> equivalent of the new interface in here, which would then use reflection
>> behind-the-scenes to invoke the underlying HDFS methods when the FS client
>> has them.
>> 
>> https://github.com/steveloughran/fs-api-shim
>> 
>> I've just added vector IO API there; the next step is to copy over a lot of
>> the contract tests from hadoop common and apply them through the shim -to
>> hadoop 3.2, 3.3.0-3.3.5. That testing against many backends is actually as
>> tricky as the reflection itself. However without this library it is going
>> to take a long long time for the open source applications to pick up the
>> higher performance/Cloud ready Apis. Yes, those of us who can build the
>> entire stack can do it, but that gradually adds more divergence from the
>> open source libraries, reduces the test coverage overall and only increases
>> maintenance costs over time.
>> 
>> steve
>> 
>>> On Thu, 16 Mar 2023 at 20:56, Wei-Chiu Chuang <we...@apache.org> wrote:
>>> 
>>> Hi,
>>> 
>>> Stephen and I are working on a project to make HBase to run on Ozone.
>>> 
>>> HBase, born out of the Hadoop project, depends on a number of HDFS
>> specific
>>> APIs, including recoverLease() and isInSafeMode(). The HBase community
>> [1]
>>> strongly voiced that they don't want the project to have direct
>> dependency
>>> on additional FS implementations due to dependency and vulnerability
>>> management concerns.
>>> 
>>> To make this project successful, we're exploring options, to push up
>> these
>>> APIs to the FileSystem abstraction. Eventually, it would make HBase FS
>>> implementation agnostic, and perhaps enable HBase to support other
>> storage
>>> systems in the future.
>>> 
>>> We'd use the PathCapabilities API to probe if the underlying FS
>>> implementation supports these APIs, and would then invoke the
>> corresponding
>>> FileSystem APIs. This is straightforward but the FileSystem would become
>>> bloated.
>>> 
>>> Another option is to create a "RecoverableFileSystem" interface, and have
>>> both DistributedFileSystem (HDFS) and RootedOzoneFileSystem (Ozone). This
>>> way the impact to the Hadoop project and the FileSystem abstraction is
>> even
>>> smaller.
>>> 
>>> Thoughts?
>>> 
>>> [1] https://lists.apache.org/thread/tcrp8vxxs3z12y36mpzx35txhpp7tvxv
>>> 
>> 
  

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

Posted by Ayush Saxena <ay...@gmail.com>.
HBase not wanting to add Ozone as a dependency sounds to me like ‘HBase having resistance against the people proposing or against Ozone’

Anyway, doesn’t ViewDistributedFileSystem solve this Ozone problem? I remember Uma chasing that to solve exactly these problems.

Pulling up the core HDFS API honestly looks like a naive approach. There is some work around reflection for DistCp with snapshots to work with Ozone; HBase folks could have used that as well (https://issues.apache.org/jira/browse/HDFS-16911)

Just my thoughts on solving the problem, which I feel can be easily solved by writing a Util class in HBase with some reflection logic…
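
For illustration, a reflection-based Util class of the kind suggested here
might look roughly like the sketch below. FakeDfsClient and ReflectiveLease are
hypothetical names, and the method takes a String for brevity; the real
DistributedFileSystem.recoverLease takes a Path.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/** Stand-in for a client class that happens to expose recoverLease(String). */
class FakeDfsClient {
  public boolean recoverLease(String path) {
    return path.startsWith("/");
  }
}

class ReflectiveLease {
  /**
   * Calls recoverLease(String) if the object has such a public method.
   * Returns null when the method is absent, so callers can tell
   * "unsupported" apart from "not yet recovered".
   */
  static Boolean tryRecoverLease(Object fs, String path) {
    try {
      Method m = fs.getClass().getMethod("recoverLease", String.class);
      return (Boolean) m.invoke(fs, path);
    } catch (NoSuchMethodException e) {
      return null; // this client has no lease recovery
    } catch (IllegalAccessException | InvocationTargetException e) {
      throw new IllegalStateException("recoverLease failed", e);
    }
  }
}
```

The method name is matched as a string at runtime, so a rename or signature
change in the client is only discovered when the call is made.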


-Ayush

> On 20-Mar-2023, at 9:54 PM, Wei-Chiu Chuang <we...@apache.org> wrote:
> 
> Thank you. Makes sense to me. Yes, as part of this effort we are going to
> need contract tests.
> 
>> On Fri, Mar 17, 2023 at 3:52 AM Steve Loughran <st...@cloudera.com.invalid>
>> wrote:
>> 
>>   1. I think a new interface would be good as FileContext could do the
>>   same thing
>>   2. using PathCapabilities probes should still be mandatory as for
>>   FileContext it would depend on the back end
>>   3. Whoever does this gets to specify what the API does and write the
>>   contract tests. Saying "just to do what HDFS does" isn't enough as it's
>> not
>>   always clear the HDFS team know how much of that behaviour is intentional
>>   (rename, anyone?).
>> 
>> 
>> For any new API (a better rename, a better delete,...) I would normally
>> insist on making it cloud friendly, with an extensible builder API and an
>> emphasis on asynchronous IO. However this is existing code and does target
>> HDFS and Ozone -pulling the existing APIs up into a new interface seems the
>> right thing to do here.
>> 
>> I have a WiP project to do a shim library to offer new FS APIs to older
>> Hadoop releases by way of reflection, so that we can get new APIs taken up
>> across projects where we cannot choreograph version updates across the
>> entire stack. (hello parquet, spark,...). My goal is to actually make this
>> a Hadoop managed project, with its own release schedule. You could add an
>> equivalent of the new interface in here, which would then use reflection
>> behind-the-scenes to invoke the underlying HDFS methods when the FS client
>> has them.
>> 
>> https://github.com/steveloughran/fs-api-shim
>> 
>> I've just added vector IO API there; the next step is to copy over a lot of
>> the contract tests from hadoop common and apply them through the shim -to
>> hadoop 3.2, 3.3.0-3.3.5. That testing against many backends is actually as
>> tricky as the reflection itself. However without this library it is going
>> to take a long long time for the open source applications to pick up the
>> higher performance/Cloud ready Apis. Yes, those of us who can build the
>> entire stack can do it, but that gradually adds more divergence from the
>> open source libraries, reduces the test coverage overall and only increases
>> maintenance costs over time.
>> 
>> steve
>> 
>>> On Thu, 16 Mar 2023 at 20:56, Wei-Chiu Chuang <we...@apache.org> wrote:
>>> 
>>> Hi,
>>> 
>>> Stephen and I are working on a project to make HBase to run on Ozone.
>>> 
>>> HBase, born out of the Hadoop project, depends on a number of HDFS
>> specific
>>> APIs, including recoverLease() and isInSafeMode(). The HBase community
>> [1]
>>> strongly voiced that they don't want the project to have direct
>> dependency
>>> on additional FS implementations due to dependency and vulnerability
>>> management concerns.
>>> 
>>> To make this project successful, we're exploring options, to push up
>> these
>>> APIs to the FileSystem abstraction. Eventually, it would make HBase FS
>>> implementation agnostic, and perhaps enable HBase to support other
>> storage
>>> systems in the future.
>>> 
>>> We'd use the PathCapabilities API to probe if the underlying FS
>>> implementation supports these APIs, and would then invoke the
>> corresponding
>>> FileSystem APIs. This is straightforward but the FileSystem would become
>>> bloated.
>>> 
>>> Another option is to create a "RecoverableFileSystem" interface, and have
>>> both DistributedFileSystem (HDFS) and RootedOzoneFileSystem (Ozone). This
>>> way the impact to the Hadoop project and the FileSystem abstraction is
>> even
>>> smaller.
>>> 
>>> Thoughts?
>>> 
>>> [1] https://lists.apache.org/thread/tcrp8vxxs3z12y36mpzx35txhpp7tvxv
>>> 
>> 

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

Posted by Wei-Chiu Chuang <we...@apache.org>.
Thank you. Makes sense to me. Yes, as part of this effort we are going to
need contract tests.

On Fri, Mar 17, 2023 at 3:52 AM Steve Loughran <st...@cloudera.com.invalid>
wrote:

>    1. I think a new interface would be good as FileContext could do the
>    same thing
>    2. using PathCapabilities probes should still be mandatory as for
>    FileContext it would depend on the back end
>    3. Whoever does this gets to specify what the API does and write the
>    contract tests. Saying "just to do what HDFS does" isn't enough as it's
> not
>    always clear the HDFS team know how much of that behaviour is intentional
>    (rename, anyone?).
>
>
> For any new API (a better rename, a better delete,...) I would normally
> insist on making it cloud friendly, with an extensible builder API and an
> emphasis on asynchronous IO. However this is existing code and does target
> HDFS and Ozone -pulling the existing APIs up into a new interface seems the
> right thing to do here.
>
>  I have a WiP project to do a shim library to offer new FS APIs to older
> Hadoop releases by way of reflection, so that we can get new APIs taken up
> across projects where we cannot choreograph version updates across the
> entire stack. (hello parquet, spark,...). My goal is to actually make this
> a Hadoop managed project, with its own release schedule. You could add an
> equivalent of the new interface in here, which would then use reflection
> behind-the-scenes to invoke the underlying HDFS methods when the FS client
> has them.
>
> https://github.com/steveloughran/fs-api-shim
>
> I've just added vector IO API there; the next step is to copy over a lot of
> the contract tests from hadoop common and apply them through the shim -to
> hadoop 3.2, 3.3.0-3.3.5. That testing against many backends is actually as
> tricky as the reflection itself. However without this library it is going
> to take a long long time for the open source applications to pick up the
> higher performance/Cloud ready Apis. Yes, those of us who can build the
> entire stack can do it, but that gradually adds more divergence from the
> open source libraries, reduces the test coverage overall and only increases
> maintenance costs over time.
>
> steve
>
> On Thu, 16 Mar 2023 at 20:56, Wei-Chiu Chuang <we...@apache.org> wrote:
>
> > Hi,
> >
> > Stephen and I are working on a project to make HBase to run on Ozone.
> >
> > HBase, born out of the Hadoop project, depends on a number of HDFS
> specific
> > APIs, including recoverLease() and isInSafeMode(). The HBase community
> [1]
> > strongly voiced that they don't want the project to have direct
> dependency
> > on additional FS implementations due to dependency and vulnerability
> > management concerns.
> >
> > To make this project successful, we're exploring options, to push up
> these
> > APIs to the FileSystem abstraction. Eventually, it would make HBase FS
> > implementation agnostic, and perhaps enable HBase to support other
> storage
> > systems in the future.
> >
> > We'd use the PathCapabilities API to probe if the underlying FS
> > implementation supports these APIs, and would then invoke the
> corresponding
> > FileSystem APIs. This is straightforward but the FileSystem would become
> > bloated.
> >
> > Another option is to create a "RecoverableFileSystem" interface, and have
> > both DistributedFileSystem (HDFS) and RootedOzoneFileSystem (Ozone). This
> > way the impact to the Hadoop project and the FileSystem abstraction is
> even
> > smaller.
> >
> > Thoughts?
> >
> > [1] https://lists.apache.org/thread/tcrp8vxxs3z12y36mpzx35txhpp7tvxv
> >
>

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

Posted by Steve Loughran <st...@cloudera.com.INVALID>.
   1. I think a new interface would be good as FileContext could do the
   same thing
   2. using PathCapabilities probes should still be mandatory as for
   FileContext it would depend on the back end
   3. Whoever does this gets to specify what the API does and write the
   contract tests. Saying "just do what HDFS does" isn't enough, as it's not
   always clear the HDFS team know how much of that behaviour is intentional
   (rename, anyone?).
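
To make point 3 concrete, here is a rough sketch of what a contract test for
such an API could look like: the rules are written once in an abstract class
and every store runs them against its own instance. The LeaseRecovery
interface, the class names and both rules are assumptions for illustration;
they are not the actual HDFS semantics or the hadoop-common contract test
classes.

```java
import java.util.Objects;

/** Hypothetical API under contract; not a Hadoop interface. */
interface LeaseRecovery {
  boolean recoverLease(String path);
}

/** One written-down rule set that every implementation must pass. */
abstract class LeaseRecoveryContract {
  /** Each store supplies its own instance. */
  abstract LeaseRecovery create();

  /** Assumed rule: recovering an already closed file succeeds immediately. */
  void testRecoverClosedFile() {
    check(create().recoverLease("/closed"), "closed file must report recovered");
  }

  /** Assumed rule: a null path is rejected rather than silently ignored. */
  void testNullPathRejected() {
    try {
      create().recoverLease(null);
    } catch (NullPointerException | IllegalArgumentException expected) {
      return;
    }
    throw new AssertionError("null path must be rejected");
  }

  void runAll() {
    testRecoverClosedFile();
    testNullPathRejected();
  }

  static void check(boolean ok, String message) {
    if (!ok) {
      throw new AssertionError(message);
    }
  }
}

/** Binds the contract to one implementation, here a trivial in-memory one. */
class InMemoryLeaseContract extends LeaseRecoveryContract {
  @Override LeaseRecovery create() {
    return path -> {
      Objects.requireNonNull(path, "path");
      return true;
    };
  }
}
```

Calling new InMemoryLeaseContract().runAll() passes; an implementation that
quietly accepts a null path would fail the second rule, which is the kind of
unintentional behaviour the contract is meant to pin down.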


For any new API (a better rename, a better delete,...) I would normally
insist on making it cloud friendly, with an extensible builder API and an
emphasis on asynchronous IO. However this is existing code and does target
HDFS and Ozone -pulling the existing APIs up into a new interface seems the
right thing to do here.

I have a WiP project to do a shim library to offer new FS APIs to older
Hadoop releases by way of reflection, so that we can get new APIs taken up
across projects where we cannot choreograph version updates across the
entire stack. (hello parquet, spark,...). My goal is to actually make this
a Hadoop managed project, with its own release schedule. You could add an
equivalent of the new interface in here, which would then use reflection
behind-the-scenes to invoke the underlying HDFS methods when the FS client
has them.
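
As a sketch of the shim idea (not the code in fs-api-shim, whose actual classes
are not shown in this thread), the reflection can be resolved once when the
shim is constructed and hidden behind an ordinary typed method plus an
availability check. OldClient, NewClient and SafeModeShim are hypothetical
names.

```java
import java.lang.reflect.Method;

/** Stand-ins for an older and a newer client; only the newer one has the method. */
class OldClient { }

class NewClient {
  public boolean isInSafeMode() {
    return false;
  }
}

/** Hypothetical shim: binds the method once, then offers a plain typed API. */
class SafeModeShim {
  private final Object client;
  private final Method isInSafeMode; // null when the client predates the API

  SafeModeShim(Object client) {
    this.client = client;
    Method m;
    try {
      m = client.getClass().getMethod("isInSafeMode");
    } catch (NoSuchMethodException e) {
      m = null;
    }
    this.isInSafeMode = m;
  }

  /** Lets callers ask up front instead of catching exceptions. */
  boolean isAvailable() {
    return isInSafeMode != null;
  }

  boolean isInSafeMode() {
    if (isInSafeMode == null) {
      throw new UnsupportedOperationException("isInSafeMode not available on this client");
    }
    try {
      return (Boolean) isInSafeMode.invoke(client);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Applications compile only against the shim, so the same binary can run with an
older client (where isAvailable() is false) or a newer one.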

https://github.com/steveloughran/fs-api-shim

I've just added vector IO API there; the next step is to copy over a lot of
the contract tests from hadoop common and apply them through the shim -to
hadoop 3.2, 3.3.0-3.3.5. That testing against many backends is actually as
tricky as the reflection itself. However without this library it is going
to take a long long time for the open source applications to pick up the
higher performance/Cloud ready Apis. Yes, those of us who can build the
entire stack can do it, but that gradually adds more divergence from the
open source libraries, reduces the test coverage overall and only increases
maintenance costs over time.

steve

On Thu, 16 Mar 2023 at 20:56, Wei-Chiu Chuang <we...@apache.org> wrote:

> Hi,
>
> Stephen and I are working on a project to make HBase to run on Ozone.
>
> HBase, born out of the Hadoop project, depends on a number of HDFS specific
> APIs, including recoverLease() and isInSafeMode(). The HBase community [1]
> strongly voiced that they don't want the project to have direct dependency
> on additional FS implementations due to dependency and vulnerability
> management concerns.
>
> To make this project successful, we're exploring options, to push up these
> APIs to the FileSystem abstraction. Eventually, it would make HBase FS
> implementation agnostic, and perhaps enable HBase to support other storage
> systems in the future.
>
> We'd use the PathCapabilities API to probe if the underlying FS
> implementation supports these APIs, and would then invoke the corresponding
> FileSystem APIs. This is straightforward but the FileSystem would become
> bloated.
>
> Another option is to create a "RecoverableFileSystem" interface, and have
> both DistributedFileSystem (HDFS) and RootedOzoneFileSystem (Ozone). This
> way the impact to the Hadoop project and the FileSystem abstraction is even
> smaller.
>
> Thoughts?
>
> [1] https://lists.apache.org/thread/tcrp8vxxs3z12y36mpzx35txhpp7tvxv
>

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

Posted by Steve Loughran <st...@cloudera.com.INVALID>.
I do have a WiP library to hide that hoop-jumping behind a normal API, with
a goal of 3.2+ support only. It does actually compile against hadoop 2, but
it isn't tested there

https://github.com/steveloughran/fs-api-shim

1. my goal is to make it a hadoop library like hadoop-third-party, with its
own release cycle etc
2. testing is complex as it needs to have contract tests (lifted from
hadoop-common) applied through all versions of hadoop

so far openFile(), ByteBufferPositionedReadable, PathCapabilities *and*
VectorIO are in, though the latter has no tests yet.

If we can get this to work then (a) HBase can delegate suffering and (b)
libraries like parquet & ORC can use vector IO while still compiling
against older versions

On Tue, 28 Mar 2023 at 19:02, Nick Dimiduk <nd...@apache.org> wrote:

> On Mon, Mar 27, 2023 at 20:29 Wei-Chiu Chuang <we...@apache.org> wrote:
>
> > For complex applications such as
> > HBase it is almost impossible to achieve true FS agnosticity without
> proper
> > contract tests, as now I am starting to realize.
> >
>
> This is absolutely true. HBase jumps through all sorts of painful
> reflective hoops to achieve reliable behavior across Hadoop versions.
> Steve’s proposal of self-describing APIs over opaque implementations would
> be drastically better than our current approach of reflection for
> inspecting and interacting with the internal details of any given client
> implementation.
>
> Thank you very much for your studious pursuit of this goal.
>
> Thanks,
> Nick
>
> On Mon, Mar 27, 2023 at 4:58 AM Steve Loughran <stevel@cloudera.com.invalid
> >
> > wrote:
> >
> > > side issue, as i think about what bulk delete call would also keep
> hbase
> > > happy
> > > https://issues.apache.org/jira/browse/HADOOP-18679
> > >
> > > should we think about new API calls only raising RuntimeExceptions?
> > >
> > > The more work I do on futures the more the way we always raise IOEs
> > > complicates life. java has outgrown checked exceptions
> > >
> > > On Fri, 24 Mar 2023 at 09:44, Steve Loughran <st...@cloudera.com>
> > wrote:
> > >
> > > >
> > > >
> > > > On Thu, 23 Mar 2023 at 10:07, Ayush Saxena <ay...@gmail.com>
> wrote:
> > > >
> > > >>
> > > >> Second idea mentioned in the original mail is also similar to
> > mentioned
> > > in
> > > >> the comment in the above ticket and is still quite acceptable, name
> > can
> > > be
> > > >> negotiated though, Add an interface to pull the relevant methods up
> in
> > > >> that
> > > >> without touching FileSystem class, we can have DFS implement that
> and
> > > >> Ozone
> > > >> FS implement them as well. We should be sorted: No Hacking, No
> > Bothering
> > > >> FileSystem and still things can work
> > > >>
> > > >>
> > > >>
> > > > This is the way we should be thinking about it. an interface which
> > > > filesystems MAY implement, but many do not.
> > > >
> > > > this has happened with some of the recent apis.
> > > >
> > > > presence of the API doesn't guarantee the api is active, only that it
> > may
> > > > be possible to call...callers should use PathCapabilities api to see
> if
> > > it
> > > > is live
> > > >
> > > >
> > > >>
> > >
> >
>

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

Posted by Nick Dimiduk <nd...@apache.org>.
On Mon, Mar 27, 2023 at 20:29 Wei-Chiu Chuang <we...@apache.org> wrote:

> For complex applications such as
> HBase it is almost impossible to achieve true FS agnosticity without proper
> contract tests, as now I am starting to realize.
>

This is absolutely true. HBase jumps through all sorts of painful
reflective hoops to achieve reliable behavior across Hadoop versions.
Steve’s proposal of self-describing APIs over opaque implementations would
be drastically better than our current approach of reflection for
inspecting and interacting with the internal details of any given client
implementation.
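
To make the contrast concrete, here is a minimal sketch. It assumes a hypothetical LeaseRecoverable interface and two stand-in client classes; none of these names are taken from Hadoop or HBase. The reflective probe looks the method up by name and can only fail at run time, while the typed probe is checked by the compiler.

```java
import java.lang.reflect.Method;

final class ReflectionProbeSketch {
    /** Hypothetical typed interface that a client could implement. */
    interface LeaseRecoverable {
        boolean recoverLease(String path);
    }

    /** Hypothetical client that exposes the method without any shared interface. */
    static class UntypedClient {
        public boolean recoverLease(String path) {
            return true;
        }
    }

    /** Hypothetical client that declares the capability through the interface. */
    static class TypedClient implements LeaseRecoverable {
        @Override
        public boolean recoverLease(String path) {
            return true;
        }
    }

    /** Reflective probe: finds the method by name, so mistakes surface only at run time. */
    static boolean recoverViaReflection(Object client, String path) {
        try {
            Method m = client.getClass().getMethod("recoverLease", String.class);
            return (Boolean) m.invoke(client, path);
        } catch (ReflectiveOperationException e) {
            return false; // method missing, inaccessible, or it threw
        }
    }

    /** Typed probe: the compiler verifies the method name and signature. */
    static boolean recoverViaInterface(Object client, String path) {
        return client instanceof LeaseRecoverable
            && ((LeaseRecoverable) client).recoverLease(path);
    }

    public static void main(String[] args) {
        System.out.println(recoverViaReflection(new UntypedClient(), "/wal/1"));
        System.out.println(recoverViaReflection(new Object(), "/wal/1"));
        System.out.println(recoverViaInterface(new TypedClient(), "/wal/1"));
    }
}
```

If the method were renamed in a later release, the reflective version would still compile and would quietly return false, which is the kind of cross-version fragility described above.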

Thank you very much for your studious pursuit of this goal.

Thanks,
Nick

On Mon, Mar 27, 2023 at 4:58 AM Steve Loughran <st...@cloudera.com.invalid>
> wrote:
>
> > side issue, as i think about what bulk delete call would also keep hbase
> > happy
> > https://issues.apache.org/jira/browse/HADOOP-18679
> >
> > should we think about new API calls only raising RuntimeExceptions?
> >
> > The more work I do on futures the more the way we always raise IOEs
> > complicates life. java has outgrown checked exceptions
> >
> > On Fri, 24 Mar 2023 at 09:44, Steve Loughran <st...@cloudera.com>
> wrote:
> >
> > >
> > >
> > > On Thu, 23 Mar 2023 at 10:07, Ayush Saxena <ay...@gmail.com> wrote:
> > >
> > >>
> > >> Second idea mentioned in the original mail is also similar to
> mentioned
> > in
> > >> the comment in the above ticket and is still quite acceptable, name
> can
> > be
> > >> negotiated though, Add an interface to pull the relevant methods up in
> > >> that
> > >> without touching FileSystem class, we can have DFS implement that and
> > >> Ozone
> > >> FS implement them as well. We should be sorted: No Hacking, No
> Bothering
> > >> FileSystem and still things can work
> > >>
> > >>
> > >>
> > > This is the way we should be thinking about it. an interface which
> > > filesystems MAY implement, but many do not.
> > >
> > > this has happened with some of the recent apis.
> > >
> > > presence of the API doesn't guarantee the api is active, only that it
> may
> > > be possible to call...callers should use PathCapabilities api to see if
> > it
> > > is live
> > >
> > >
> > >>
> >
>

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

Posted by Wei-Chiu Chuang <we...@apache.org>.
I think moving up interfaces to FileSystem or some abstract FileSystem
class has a few benefits:

1. Applications can potentially be made FS-agnostic, guarded by a
hasPathCapability() check.
At the very least, the code will compile.

2. We will be able to add a contract test to ensure behavior is expected.
The second one is more critical than (1). For complex applications such as
HBase it is almost impossible to achieve true FS agnosticity without proper
contract tests, as now I am starting to realize.
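
As a rough sketch of what such a contract test could look like: the LeaseRecoverable interface, both stores, and the two contract clauses below are hypothetical illustrations, not the Hadoop contract-test framework. The idea is that one shared check runs unchanged against every implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class LeaseContractSketch {
    /** Hypothetical interface under test; not the Hadoop API. */
    interface LeaseRecoverable {
        boolean recoverLease(String path);
        boolean isFileClosed(String path);
    }

    /** In-memory implementation that honours the contract. */
    static class GoodStore implements LeaseRecoverable {
        private final Set<String> openFiles = new HashSet<>();

        GoodStore(String... open) {
            for (String p : open) {
                openFiles.add(p);
            }
        }

        public boolean recoverLease(String path) {
            openFiles.remove(path); // recovery closes the file
            return true;
        }

        public boolean isFileClosed(String path) {
            return !openFiles.contains(path);
        }
    }

    /** Implementation that claims success but never closes the file. */
    static class BrokenStore implements LeaseRecoverable {
        public boolean recoverLease(String path) {
            return true;
        }

        public boolean isFileClosed(String path) {
            return false;
        }
    }

    /** Runs the same behavioural checks against any implementation. */
    static List<String> checkContract(LeaseRecoverable fs, String path) {
        List<String> violations = new ArrayList<>();
        boolean recovered = fs.recoverLease(path);
        // Clause 1 (assumed): a successful recovery leaves the file closed.
        if (recovered && !fs.isFileClosed(path)) {
            violations.add("recoverLease returned true but the file is still open");
        }
        // Clause 2 (assumed): recovering an already recovered file succeeds again.
        if (recovered && !fs.recoverLease(path)) {
            violations.add("recoverLease is not idempotent");
        }
        return violations;
    }

    public static void main(String[] args) {
        System.out.println(checkContract(new GoodStore("/wal/1"), "/wal/1"));
        System.out.println(checkContract(new BrokenStore(), "/wal/1"));
    }
}
```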

This is where I am coming from. No need to make Hadoop application
development harder than it already is.

On Mon, Mar 27, 2023 at 4:58 AM Steve Loughran <st...@cloudera.com.invalid>
wrote:

> side issue, as i think about what bulk delete call would also keep hbase
> happy
> https://issues.apache.org/jira/browse/HADOOP-18679
>
> should we think about new API calls only raising RuntimeExceptions?
>
> The more work I do on futures the more the way we always raise IOEs
> complicates life. java has outgrown checked exceptions
>
> On Fri, 24 Mar 2023 at 09:44, Steve Loughran <st...@cloudera.com> wrote:
>
> >
> >
> > On Thu, 23 Mar 2023 at 10:07, Ayush Saxena <ay...@gmail.com> wrote:
> >
> >>
> >> Second idea mentioned in the original mail is also similar to mentioned
> in
> >> the comment in the above ticket and is still quite acceptable, name can
> be
> >> negotiated though, Add an interface to pull the relevant methods up in
> >> that
> >> without touching FileSystem class, we can have DFS implement that and
> >> Ozone
> >> FS implement them as well. We should be sorted: No Hacking, No Bothering
> >> FileSystem and still things can work
> >>
> >>
> >>
> > This is the way we should be thinking about it. an interface which
> > filesystems MAY implement, but many do not.
> >
> > this has happened with some of the recent apis.
> >
> > presence of the API doesn't guarantee the api is active, only that it may
> > be possible to call...callers should use PathCapabilities api to see if
> it
> > is live
> >
> >
> >>
>

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

Posted by Steve Loughran <st...@cloudera.com.INVALID>.
side issue, as i think about what bulk delete call would also keep hbase
happy
https://issues.apache.org/jira/browse/HADOOP-18679

should we think about new API calls only raising RuntimeExceptions?

The more work I do on futures the more the way we always raise IOEs
complicates life. java has outgrown checked exceptions
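
A small sketch of the friction, using only the JDK. The lengthOf() method is a hypothetical stand-in for any blocking call that declares IOException. The checked exception has to be wrapped to get through the Supplier lambda, then unwrapped again on the caller's side.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

final class FutureIoSketch {
    /** Hypothetical blocking call in the classic style: declares a checked IOException. */
    static long lengthOf(String path) throws IOException {
        if (path.isEmpty()) {
            throw new IOException("empty path");
        }
        return path.length();
    }

    /** Supplier.get() cannot throw checked exceptions, so the IOException must be wrapped. */
    static CompletableFuture<Long> lengthOfAsync(String path) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return lengthOf(path);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    /** The caller has to peel off two wrappers to get the original IOException back. */
    static long awaitLength(CompletableFuture<Long> future) throws IOException {
        try {
            return future.join();
        } catch (CompletionException e) {
            if (e.getCause() instanceof UncheckedIOException) {
                throw ((UncheckedIOException) e.getCause()).getCause();
            }
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(awaitLength(lengthOfAsync("/a/b")));
    }
}
```

An API that only raised RuntimeExceptions would not need either the wrapping step or the unwrapping step.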

On Fri, 24 Mar 2023 at 09:44, Steve Loughran <st...@cloudera.com> wrote:

>
>
> On Thu, 23 Mar 2023 at 10:07, Ayush Saxena <ay...@gmail.com> wrote:
>
>>
>> Second idea mentioned in the original mail is also similar to mentioned in
>> the comment in the above ticket and is still quite acceptable, name can be
>> negotiated though, Add an interface to pull the relevant methods up in
>> that
>> without touching FileSystem class, we can have DFS implement that and
>> Ozone
>> FS implement them as well. We should be sorted: No Hacking, No Bothering
>> FileSystem and still things can work
>>
>>
>>
> This is the way we should be thinking about it. an interface which
> filesystems MAY implement, but many do not.
>
> this has happened with some of the recent apis.
>
> presence of the API doesn't guarantee the api is active, only that it may
> be possible to call...callers should use PathCapabilities api to see if it
> is live
>
>
>>

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

Posted by Steve Loughran <st...@cloudera.com.INVALID>.
On Thu, 23 Mar 2023 at 10:07, Ayush Saxena <ay...@gmail.com> wrote:

>
> Second idea mentioned in the original mail is also similar to mentioned in
> the comment in the above ticket and is still quite acceptable, name can be
> negotiated though, Add an interface to pull the relevant methods up in that
> without touching FileSystem class, we can have DFS implement that and Ozone
> FS implement them as well. We should be sorted: No Hacking, No Bothering
> FileSystem and still things can work
>
>
>
This is the way we should be thinking about it. an interface which
filesystems MAY implement, but many do not.

this has happened with some of the recent apis.

presence of the API doesn't guarantee the api is active, only that it may
be possible to call...callers should use PathCapabilities api to see if it
is live
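
A minimal sketch of that calling pattern follows. The CapabilityProbe interface is modelled loosely on Hadoop's PathCapabilities.hasPathCapability(), but it takes a plain String path here, and the LeaseRecoverable interface, the capability name, and the three stores are all hypothetical.

```java
import java.io.IOException;

final class OptionalApiSketch {
    /** Hypothetical stand-in for a capability probe such as PathCapabilities. */
    interface CapabilityProbe {
        boolean hasPathCapability(String path, String capability) throws IOException;
    }

    /** Hypothetical optional interface; a filesystem MAY implement it. */
    interface LeaseRecoverable {
        boolean recoverLease(String path) throws IOException;
    }

    /** Hypothetical capability name, invented for this sketch. */
    static final String LEASE_CAPABILITY = "fs.capability.lease.recoverable";

    /** Implements the interface and reports the capability as live. */
    static class RecoverableStore implements CapabilityProbe, LeaseRecoverable {
        public boolean hasPathCapability(String path, String capability) {
            return LEASE_CAPABILITY.equals(capability);
        }

        public boolean recoverLease(String path) {
            return true;
        }
    }

    /** Inherits the interface, but the feature is switched off. */
    static class DisabledStore extends RecoverableStore {
        @Override
        public boolean hasPathCapability(String path, String capability) {
            return false;
        }
    }

    /** Does not implement the optional interface at all. */
    static class PlainStore implements CapabilityProbe {
        public boolean hasPathCapability(String path, String capability) {
            return false;
        }
    }

    /** Calls recoverLease() only when the interface is present AND the capability is live. */
    static boolean tryRecoverLease(Object fs, String path) throws IOException {
        if (fs instanceof LeaseRecoverable
            && fs instanceof CapabilityProbe
            && ((CapabilityProbe) fs).hasPathCapability(path, LEASE_CAPABILITY)) {
            return ((LeaseRecoverable) fs).recoverLease(path);
        }
        return false; // not supported here; the caller picks a fallback
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tryRecoverLease(new RecoverableStore(), "/wal/1"));
        System.out.println(tryRecoverLease(new DisabledStore(), "/wal/1"));
        System.out.println(tryRecoverLease(new PlainStore(), "/wal/1"));
    }
}
```

DisabledStore is the case that matters: the method is present on the class, yet the probe says it is not live, so the caller never invokes it.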


>

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

Posted by Ayush Saxena <ay...@gmail.com>.
They both need it for a similar use case, "to support Ozone", not anything
core that we handle as part of "Apache Hadoop". I suppose both work fine
with HDFS because they added a dependency on HDFS? Now they don't want to
add Ozone, for whatever reasons, and the folks chasing this integration
want to pass those issues and the workload (maintaining/releasing) on to
Hadoop.

Adding *isReady()* does make sense to me, I haven't checked what it is
gonna do or work for all FileSystems, but it sounds fair enough to me. Feel
free to raise a ticket for that if you have done some work already

Adding isInSafeMode() and recoverLease() to FileSystem still doesn't make
sense to me. They are not useful for a bunch of other FS implementations,
and we don't want them lying around as technical debt for everyone. Doing
this sounds to me like a hack to avoid adding Ozone as a dependency to the
client projects that want to use Ozone.
Anyway, not starting this again 😉

Quoting Wei-Chiu himself from the first mail, he agreed that adding those
isn't a good idea:

> This is straightforward but the FileSystem would become
> bloated.
>

It was an agreed and known fact; if that has changed, I can't help it.
HADOOP-18671 <https://issues.apache.org/jira/browse/HADOOP-18671>, which
Wei-Chiu raised to add these to "FileSystem", also has a comment suggesting
they go behind an interface instead. That is still "kind of OK" if done
properly: in addition to what is already mentioned in the comments there, it
must not introduce any "incompatibilities", and it needs not just tests but
"stable" tests. Not like some of the recent changes where folks come and say
"we can't do it without breaking anything" and other people end up trying to
fix the mess. Not getting off topic with that here....

Second idea mentioned in the original mail is also similar to mentioned in
the comment in the above ticket and is still quite acceptable, name can be
negotiated though, Add an interface to pull the relevant methods up in that
without touching FileSystem class, we can have DFS implement that and Ozone
FS implement them as well. We should be sorted: No Hacking, No Bothering
FileSystem and still things can work

-Ayush




On Thu, 23 Mar 2023 at 14:01, Tsz Wo Sze <sz...@yahoo.com> wrote:

> (Clicked "send" too accidentally.  Please ignore my previous email.
> Sorry.)
>
> Hi,
>
>
> We probably should exclude HBase in this discuss.  I guess Wei-Chiu
> mentioning it as an example use case.  There are other projects such as
> Apache Solr requiring similar features.
>
>
> (1) We already has the Syncable (hsync/hflush) interface in Hadoop, it
> makes sense to have a recover() method for recovering hsync'ed/hflushed
> files.  Otherwise, the Syncable feature is incomplete.
>
>
> (2) I also suggest to add a isReady() method.  In FileSystem, there is an
> initialize(..) and its javadoc says:
>
>    * Called after the new FileSystem instance is constructed, and before it
>    * is ready for use.
>
> However, there is no way to check if the FileSystem instance is ready.
>
>
> These are currently two missing features in Hadoop FileSystem.
>
>
> Tsz-Wo
>
>
>
>
> On Wednesday, March 22, 2023, 12:30:49 PM GMT+8, Ayush Saxena <
> ayushtkn@gmail.com> wrote:
>
>
> Well reflections are good or not will drag this somewhere else. I will
> respect what Tsz-Wo said and put this in my rule book for future :)
>
> If I get into Why we don’t have “all” the API in FileSystem itself will
> drag it to another area, What and where to use Abstraction and stuff like
> that, Which none of the people over here would be interested in.
>
> On a conclusive note: Using Reflections at Hbase from us at Hadoop isn’t
> suggested as Tsz-Wo considers it as a hack.
>
> I have strict objections on pulling them up to FileSystem class because
> they are very core to HDFS and the mentioned API are not just the ones,
> ErasureCoding? And Tomorrow we would have similar requests for ABFS only
> API, Huawei OBS only and many more. FileSystem class would become huge and
> the Technical Debt that we would bring in or encourage would be really
> high. Who is gonna chase behind people if these code creates issues
> somewhere else? I don’t want to quote example publicly and prove a point
> or so. So, leaving this here. With my conclusion on this solution.
>
> Technically it is a HBase problem, they should adapt to Ozone, not sure why
> are they creating unnecessary sound. I got into similar situation for Ozone
> in a different downstream project and folks were very encouraging to get
> and add support for Ozone, Guava messed up else it would have been in. Now
> with this approach those guys won’t even agree. “Go handle at Hadoop or
> Ozone, Why us”, this is something neither of the projects want….
>
> There were mention of favoured node and all as well in the Hbase ML, after
> this would be these stuff, IIRC. The proposal for having option for
> favoured node in Distcp was vetoed recently considering it HDFS only (not
> by me), so thats never ending….
>
> We at Hadoop are discussing and trying to negotiate for Hbase and Ozone
> 🤷‍♂️, When in past ViewDFS was also done at HDFS for same use case, I
> think now people don’t consider it as a solution and we will keep on doing
> stuff for Hbase and then other folks will keep on managing, maintaining and
> releasing them forever!!!
>
> Good Luck!!! But 2 possible solutions are down in this thread
>
> -Ayush
>
> On Wed, 22 Mar 2023 at 7:48 AM, Tsz Wo Sze <sz...@yahoo.com> wrote:
>
> >
> > Ayush,
> >
> >
> > Yes, reflections are a part of Java.  Why we have to define the
> > FileSystem APIs but not simply use reflections all the times?
> >
> >
> > Reflection is good for dealing with unknown code such as loading a
> plugin,
> > code analysis, etc.  However, it probably is not a good way to define
> APIs.
> >
> >
> >
> > Tsz-Wo
> >
> >
> > On Tuesday, March 21, 2023, 01:00:20 PM GMT+8, Ayush Saxena <
> > ayushtkn@gmail.com> wrote:
> >
> >
> > I am not sure what classifies as a Hack and what not, I thought
> reflections
> > are part of Java.
> >
> > Whatever solution but pulling in just the HDFS specific stuff to
> FileSystem
> > just for Ozone, because Hbase guys didn’t agree and we have people in
> > Hadoop who we can convince, I am -1 to such an approach and mindset.
> Hbase
> > wants ozone, they should give way for it like they do for HDFS
> >
> > Explore ways in Hbase, explore the Utils and ways by the links that Steve
> > shared, try ViewDFS, When we have some more convincing reasons, we can
> > discuss more over here to pull them to FileSystem as the last option
> >
> > -Ayush
> >
> > On Fri, 17 Mar 2023 at 2:26 AM, Wei-Chiu Chuang <we...@apache.org>
> > wrote:
> >
> > > Hi,
> > >
> > > Stephen and I are working on a project to make HBase to run on Ozone.
> > >
> > > HBase, born out of the Hadoop project, depends on a number of HDFS
> > specific
> > > APIs, including recoverLease() and isInSafeMode(). The HBase community
> > [1]
> > > strongly voiced that they don't want the project to have direct
> > dependency
> > > on additional FS implementations due to dependency and vulnerability
> > > management concerns.
> > >
> > > To make this project successful, we're exploring options, to push up
> > these
> > > APIs to the FileSystem abstraction. Eventually, it would make HBase FS
> > > implementation agnostic, and perhaps enable HBase to support other
> > storage
> > > systems in the future.
> > >
> > > We'd use the PathCapabilities API to probe if the underlying FS
> > > implementation supports these APIs, and would then invoke the
> > corresponding
> > > FileSystem APIs. This is straightforward but the FileSystem would
> become
> > > bloated.
> > >
> > > Another option is to create a "RecoverableFileSystem" interface, and
> have
> > > both DistributedFileSystem (HDFS) and RootedOzoneFileSystem (Ozone).
> This
> > > way the impact to the Hadoop project and the FileSystem abstraction is
> > even
> > > smaller.
> > >
> > > Thoughts?
> > >
> > > [1] https://lists.apache.org/thread/tcrp8vxxs3z12y36mpzx35txhpp7tvxv
> > >
> >
>

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

Posted by Tsz Wo Sze <sz...@yahoo.com.INVALID>.
(Clicked "send" accidentally.  Please ignore my previous email.  Sorry.)

Hi,

We should probably exclude HBase from this discussion.  I guess Wei-Chiu was mentioning it as an example use case.  There are other projects, such as Apache Solr, requiring similar features.

(1) We already have the Syncable (hsync/hflush) interface in Hadoop, so it makes sense to have a recover() method for recovering hsync'ed/hflushed files.  Otherwise, the Syncable feature is incomplete.

(2) I also suggest adding an isReady() method.  In FileSystem, there is an initialize(..) and its javadoc says:

   * Called after the new FileSystem instance is constructed, and before it
   * is ready for use.

However, there is no way to check if the FileSystem instance is ready.

These are currently two missing features in Hadoop FileSystem.
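
A hedged sketch of what an isReady() check might look like. The SketchFileSystem base class, its single-argument initialize(), and the idea that a store could fold a safe-mode flag into isReady() are all assumptions made for illustration. None of this is an existing Hadoop API.

```java
final class ReadinessSketch {
    /** Hypothetical base class; only the initialize/isReady names come from the proposal. */
    abstract static class SketchFileSystem {
        private volatile boolean initialized;

        /** Marks the instance as initialized once the subclass hook has run. */
        final void initialize(String uri) {
            doInitialize(uri);
            initialized = true;
        }

        protected abstract void doInitialize(String uri);

        /** Default readiness: true once initialize() has completed. */
        boolean isReady() {
            return initialized;
        }
    }

    /** Hypothetical store that is not ready until it has also left safe mode. */
    static class SafeModeStore extends SketchFileSystem {
        private volatile boolean safeMode = true;

        @Override
        protected void doInitialize(String uri) {
            // a real client would connect to the service here
        }

        void leaveSafeMode() {
            safeMode = false;
        }

        @Override
        boolean isReady() {
            return super.isReady() && !safeMode;
        }
    }

    public static void main(String[] args) {
        SafeModeStore fs = new SafeModeStore();
        System.out.println(fs.isReady()); // before initialize()
        fs.initialize("sketch://cluster");
        System.out.println(fs.isReady()); // initialized, still in safe mode
        fs.leaveSafeMode();
        System.out.println(fs.isReady()); // initialized and out of safe mode
    }
}
```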

Tsz-Wo



On Wednesday, March 22, 2023, 12:30:49 PM GMT+8, Ayush Saxena <ay...@gmail.com> wrote:

Whether reflections are good or not will drag this somewhere else. I will
respect what Tsz-Wo said and put this in my rule book for future :)

If I get into Why we don’t have “all” the API in FileSystem itself will
drag it to another area, What and where to use Abstraction and stuff like
that, Which none of the people over here would be interested in.

On a conclusive note: Using Reflections at Hbase from us at Hadoop isn’t
suggested as Tsz-Wo considers it as a hack.

I have strict objections on pulling them up to FileSystem class because
they are very core to HDFS and the mentioned API are not just the ones,
ErasureCoding? And Tomorrow we would have similar requests for ABFS only
API, Huawei OBS only and many more. FileSystem class would become huge and
the Technical Debt that we would bring in or encourage would be really
high. Who is gonna chase behind people if these code creates issues
somewhere else? I don’t want to quote example publicly and proove a point
or so. So, leaving this here. With my conclusion on this solution.

Technically it is a HBase problem, they should adapt to Ozone, not sure why
are they creating unnecessary sound. I got into similar situation for Ozone
in a different downstream project and folks were very encouraging to get
and add support for Ozone, Guava messed up else it would have been in. Now
with this approach those guys won’t even agree. “Go handle at Hadoop or
Ozone, Why us”, this is something neither of the projects want….

There were mention of favoured node and all as well in the Hbase ML, after
this would be these stuff, IIRC. The proposal for having option for
favoured node in Distcp was vetoed recently considering it HDFS only (not
by me), so thats never ending….

We at Hadoop are discussing and trying to negotiate for Hbase and Ozone
🤷‍♂️, When in past ViewDFS was also done at HDFS for same use case, I
think now people don’t consider it as a solution and we will keep on doing
stuff for Hbase and then other folks will keep on managing, maintaining and
releasing them forever!!!

Good Luck!!! But 2 possible solutions are down in this thread

-Ayush

On Wed, 22 Mar 2023 at 7:48 AM, Tsz Wo Sze <sz...@yahoo.com> wrote:

>
> Ayush,
>
>
> Yes, reflections are a part of Java.  Why we have to define the
> FileSystem APIs but not simply use reflections all the times?
>
>
> Reflection is good for dealing with unknown code such as loading a plugin,
> code analysis, etc.  However, it probably is not a good way to define APIs.
>
>
>
> Tsz-Wo
>
>
> On Tuesday, March 21, 2023, 01:00:20 PM GMT+8, Ayush Saxena <
> ayushtkn@gmail.com> wrote:
>
>
> I am not sure what classifies as a Hack and what not, I thought reflections
> are part of Java.
>
> Whatever solution but pulling in just the HDFS specific stuff to FileSystem
> just for Ozone, because Hbase guys didn’t agree and we have people in
> Hadoop who we can convince, I am -1 to such an approach and mindset. Hbase
> wants ozone, they should give way for it like they do for HDFS
>
> Explore ways in Hbase, explore the Utils and ways by the links that Steve
> shared, try ViewDFS, When we have some more convincing reasons, we can
> discuss more over here to pull them to FileSystem as the last option
>
> -Ayush
>
> On Fri, 17 Mar 2023 at 2:26 AM, Wei-Chiu Chuang <we...@apache.org>
> wrote:
>
> > Hi,
> >
> > Stephen and I are working on a project to make HBase to run on Ozone.
> >
> > HBase, born out of the Hadoop project, depends on a number of HDFS specific
> > APIs, including recoverLease() and isInSafeMode(). The HBase community [1]
> > strongly voiced that they don't want the project to have direct dependency
> > on additional FS implementations due to dependency and vulnerability
> > management concerns.
> >
> > To make this project successful, we're exploring options, to push up these
> > APIs to the FileSystem abstraction. Eventually, it would make HBase FS
> > implementation agnostic, and perhaps enable HBase to support other storage
> > systems in the future.
> >
> > We'd use the PathCapabilities API to probe if the underlying FS
> > implementation supports these APIs, and would then invoke the corresponding
> > FileSystem APIs. This is straightforward but the FileSystem would become
> > bloated.
> >
> > Another option is to create a "RecoverableFileSystem" interface, and have
> > both DistributedFileSystem (HDFS) and RootedOzoneFileSystem (Ozone). This
> > way the impact to the Hadoop project and the FileSystem abstraction is even
> > smaller.
> >
> > Thoughts?
> >
> > [1] https://lists.apache.org/thread/tcrp8vxxs3z12y36mpzx35txhpp7tvxv
> >
>
    

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

Posted by Tsz Wo Sze <sz...@yahoo.com.INVALID>.
 Hi,

We should probably leave HBase out of this discussion.  I guess Wei-Chiu mentioned it only as an example use case.  Other projects, such as Apache Solr, require similar features.

(1) We already have the Syncable (hsync/hflush) interface in Hadoop, so it makes sense to have a recover() method for recovering hsync'ed/hflushed files.  Otherwise, the Syncable feature is incomplete.

(2) I also suggest adding an isReady() method.  FileSystem has an initialize(..) method, and its javadoc says:
   * Called after the new FileSystem instance is constructed, and before it
   * is ready for use.
However, there is no way to check whether the FileSystem instance is ready.

 On Wednesday, March 22, 2023, 12:30:49 PM GMT+8, Ayush Saxena <ay...@gmail.com> wrote:  
 
 Well reflections are good or not will drag this somewhere else. I will
respect what Tsz-Wo said and put this in my rule book for future :)

If I get into Why we don’t have “all” the API in FileSystem itself will
drag it to another area, What and where to use Abstraction and stuff like
that, Which none of the people over here would be interested in.

On a conclusive note: Using Reflections at Hbase from us at Hadoop isn’t
suggested as Tsz-Wo considers it as a hack.

I have strict objections on pulling them up to FileSystem class because
they are very core to HDFS and the mentioned API are not just the ones,
ErasureCoding? And Tomorrow we would have similar requests for ABFS only
API, Huawei OBS only and many more. FileSystem class would become huge and
the Technical Debt that we would bring in or encourage would be really
high. Who is gonna chase behind people if these code creates issues
somewhere else? I don’t want to quote example publicly and proove a point
or so. So, leaving this here. With my conclusion on this solution.

Technically it is a HBase problem, they should adapt to Ozone, not sure why
are they creating unnecessary sound. I got into similar situation for Ozone
in a different downstream project and folks were very encouraging to get
and add support for Ozone, Guava messed up else it would have been in. Now
with this approach those guys won’t even agree. “Go handle at Hadoop or
Ozone, Why us”, this is something neither of the projects want….

There were mention of favoured node and all as well in the Hbase ML, after
this would be these stuff, IIRC. The proposal for having option for
favoured node in Distcp was vetoed recently considering it HDFS only (not
by me), so thats never ending….

We at Hadoop are discussing and trying to negotiate for Hbase and Ozone
🤷‍♂️, When in past ViewDFS was also done at HDFS for same use case, I
think now people don’t consider it as a solution and we will keep on doing
stuff for Hbase and then other folks will keep on managing, maintaining and
releasing them forever!!!

Good Luck!!! But 2 possible solutions are down in this thread

-Ayush

On Wed, 22 Mar 2023 at 7:48 AM, Tsz Wo Sze <sz...@yahoo.com> wrote:

>
> Ayush,
>
>
> Yes, reflections are a part of Java.  Why we have to define the
> FileSystem APIs but not simply use reflections all the times?
>
>
> Reflection is good for dealing with unknown code such as loading a plugin,
> code analysis, etc.  However, it probably is not a good way to define APIs.
>
>
>
> Tsz-Wo
>
>
> On Tuesday, March 21, 2023, 01:00:20 PM GMT+8, Ayush Saxena <
> ayushtkn@gmail.com> wrote:
>
>
> I am not sure what classifies as a Hack and what not, I thought reflections
> are part of Java.
>
> Whatever solution but pulling in just the HDFS specific stuff to FileSystem
> just for Ozone, because Hbase guys didn’t agree and we have people in
> Hadoop who we can convince, I am -1 to such an approach and mindset. Hbase
> wants ozone, they should give way for it like they do for HDFS
>
> Explore ways in Hbase, explore the Utils and ways by the links that Steve
> shared, try ViewDFS, When we have some more convincing reasons, we can
> discuss more over here to pull them to FileSystem as the last option
>
> -Ayush
>
> On Fri, 17 Mar 2023 at 2:26 AM, Wei-Chiu Chuang <we...@apache.org>
> wrote:
>
> > Hi,
> >
> > Stephen and I are working on a project to make HBase to run on Ozone.
> >
> > HBase, born out of the Hadoop project, depends on a number of HDFS specific
> > APIs, including recoverLease() and isInSafeMode(). The HBase community [1]
> > strongly voiced that they don't want the project to have direct dependency
> > on additional FS implementations due to dependency and vulnerability
> > management concerns.
> >
> > To make this project successful, we're exploring options, to push up these
> > APIs to the FileSystem abstraction. Eventually, it would make HBase FS
> > implementation agnostic, and perhaps enable HBase to support other storage
> > systems in the future.
> >
> > We'd use the PathCapabilities API to probe if the underlying FS
> > implementation supports these APIs, and would then invoke the corresponding
> > FileSystem APIs. This is straightforward but the FileSystem would become
> > bloated.
> >
> > Another option is to create a "RecoverableFileSystem" interface, and have
> > both DistributedFileSystem (HDFS) and RootedOzoneFileSystem (Ozone). This
> > way the impact to the Hadoop project and the FileSystem abstraction is even
> > smaller.
> >
> > Thoughts?
> >
> > [1] https://lists.apache.org/thread/tcrp8vxxs3z12y36mpzx35txhpp7tvxv
> >
>
  

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

Posted by Ayush Saxena <ay...@gmail.com>.
Whether reflection is good or not would drag this discussion somewhere else.
I will respect what Tsz-Wo said and put it in my rule book for the future :)

If I get into why we don't have "all" the APIs in FileSystem itself, it will
drag us into another area: what abstraction is for, where to use it, and so
on, which nobody here would be interested in.

On a conclusive note: we at Hadoop are not suggesting that HBase use
reflection, since Tsz-Wo considers it a hack.

I strictly object to pulling these APIs up to the FileSystem class, because
they are very core to HDFS and they are not the only such APIs. What about
ErasureCoding? Tomorrow we would get similar requests for ABFS-only APIs,
Huawei OBS-only APIs and many more. The FileSystem class would become huge,
and the technical debt that we would bring in or encourage would be really
high. Who is going to chase people if this code creates issues somewhere
else? I don't want to quote examples publicly to prove a point, so I am
leaving it here, with my conclusion on this solution.

Technically it is an HBase problem; they should adapt to Ozone, and I am not
sure why they are making unnecessary noise. I was in a similar situation for
Ozone in a different downstream project, and folks there were very
encouraging about adding support for Ozone; Guava messed things up, or else
it would have been in. Now, with this approach, those folks won't even agree.
"Go handle it at Hadoop or Ozone, why us?" is something neither project
wants.

There was mention of favoured nodes as well on the HBase ML; after this,
those would be next, IIRC. The proposal to add a favoured-node option to
DistCp was vetoed recently because it is HDFS-only (not by me), so this is
never-ending.

We at Hadoop are discussing and trying to negotiate for HBase and Ozone
🤷‍♂️, when in the past ViewDFS was also done in HDFS for the same use case.
I think people no longer consider it a solution, and we will keep doing
things for HBase while other folks keep managing, maintaining and releasing
them forever!!!

Good luck!!! But two possible solutions are already given further down in
this thread.

-Ayush

On Wed, 22 Mar 2023 at 7:48 AM, Tsz Wo Sze <sz...@yahoo.com> wrote:

>
> Ayush,
>
>
> Yes, reflections are a part of Java.  Why we have to define the
> FileSystem APIs but not simply use reflections all the times?
>
>
> Reflection is good for dealing with unknown code such as loading a plugin,
> code analysis, etc.   However, it probably is not a good way to define APIs.
>
>
>
> Tsz-Wo
>
>
> On Tuesday, March 21, 2023, 01:00:20 PM GMT+8, Ayush Saxena <
> ayushtkn@gmail.com> wrote:
>
>
> I am not sure what classifies as a Hack and what not, I thought reflections
> are part of Java.
>
> Whatever solution but pulling in just the HDFS specific stuff to FileSystem
> just for Ozone, because Hbase guys didn’t agree and we have people in
> Hadoop who we can convince, I am -1 to such an approach and mindset. Hbase
> wants ozone, they should give way for it like they do for HDFS
>
> Explore ways in Hbase, explore the Utils and ways by the links that Steve
> shared, try ViewDFS, When we have some more convincing reasons, we can
> discuss more over here to pull them to FileSystem as the last option
>
> -Ayush
>
> On Fri, 17 Mar 2023 at 2:26 AM, Wei-Chiu Chuang <we...@apache.org>
> wrote:
>
> > Hi,
> >
> > Stephen and I are working on a project to make HBase to run on Ozone.
> >
> > HBase, born out of the Hadoop project, depends on a number of HDFS specific
> > APIs, including recoverLease() and isInSafeMode(). The HBase community [1]
> > strongly voiced that they don't want the project to have direct dependency
> > on additional FS implementations due to dependency and vulnerability
> > management concerns.
> >
> > To make this project successful, we're exploring options, to push up these
> > APIs to the FileSystem abstraction. Eventually, it would make HBase FS
> > implementation agnostic, and perhaps enable HBase to support other storage
> > systems in the future.
> >
> > We'd use the PathCapabilities API to probe if the underlying FS
> > implementation supports these APIs, and would then invoke the corresponding
> > FileSystem APIs. This is straightforward but the FileSystem would become
> > bloated.
> >
> > Another option is to create a "RecoverableFileSystem" interface, and have
> > both DistributedFileSystem (HDFS) and RootedOzoneFileSystem (Ozone). This
> > way the impact to the Hadoop project and the FileSystem abstraction is even
> > smaller.
> >
> > Thoughts?
> >
> > [1] https://lists.apache.org/thread/tcrp8vxxs3z12y36mpzx35txhpp7tvxv
> >
>

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

Posted by Tsz Wo Sze <sz...@yahoo.com.INVALID>.
 
Ayush,


Yes, reflection is a part of Java.  But then why do we have to define the FileSystem APIs at all, rather than simply using reflection all the time?

Reflection is good for dealing with unknown code, such as loading a plugin, code analysis, etc.  However, it is probably not a good way to define APIs.


Tsz-Wo

    On Tuesday, March 21, 2023, 01:00:20 PM GMT+8, Ayush Saxena <ay...@gmail.com> wrote:  
 
 I am not sure what classifies as a Hack and what not, I thought reflections
are part of Java.

Whatever solution but pulling in just the HDFS specific stuff to FileSystem
just for Ozone, because Hbase guys didn’t agree and we have people in
Hadoop who we can convince, I am -1 to such an approach and mindset. Hbase
wants ozone, they should give way for it like they do for HDFS

Explore ways in Hbase, explore the Utils and ways by the links that Steve
shared, try ViewDFS, When we have some more convincing reasons, we can
discuss more over here to pull them to FileSystem as the last option

-Ayush

On Fri, 17 Mar 2023 at 2:26 AM, Wei-Chiu Chuang <we...@apache.org> wrote:

> Hi,
>
> Stephen and I are working on a project to make HBase to run on Ozone.
>
> HBase, born out of the Hadoop project, depends on a number of HDFS specific
> APIs, including recoverLease() and isInSafeMode(). The HBase community [1]
> strongly voiced that they don't want the project to have direct dependency
> on additional FS implementations due to dependency and vulnerability
> management concerns.
>
> To make this project successful, we're exploring options, to push up these
> APIs to the FileSystem abstraction. Eventually, it would make HBase FS
> implementation agnostic, and perhaps enable HBase to support other storage
> systems in the future.
>
> We'd use the PathCapabilities API to probe if the underlying FS
> implementation supports these APIs, and would then invoke the corresponding
> FileSystem APIs. This is straightforward but the FileSystem would become
> bloated.
>
> Another option is to create a "RecoverableFileSystem" interface, and have
> both DistributedFileSystem (HDFS) and RootedOzoneFileSystem (Ozone). This
> way the impact to the Hadoop project and the FileSystem abstraction is even
> smaller.
>
> Thoughts?
>
> [1] https://lists.apache.org/thread/tcrp8vxxs3z12y36mpzx35txhpp7tvxv
>
  

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

Posted by Ayush Saxena <ay...@gmail.com>.
I am not sure what classifies as a hack and what does not; I thought
reflection is part of Java.

Whatever the solution, pulling just the HDFS-specific stuff into FileSystem
just for Ozone, because the HBase folks didn't agree and we have people in
Hadoop whom we can convince: I am -1 on such an approach and mindset. If
HBase wants Ozone, they should make way for it like they do for HDFS.

Explore ways in HBase, explore the utils and approaches in the links that
Steve shared, try ViewDFS. When we have some more convincing reasons, we can
discuss further here pulling them up to FileSystem as the last option.

-Ayush

On Fri, 17 Mar 2023 at 2:26 AM, Wei-Chiu Chuang <we...@apache.org> wrote:

> Hi,
>
> Stephen and I are working on a project to make HBase to run on Ozone.
>
> HBase, born out of the Hadoop project, depends on a number of HDFS specific
> APIs, including recoverLease() and isInSafeMode(). The HBase community [1]
> strongly voiced that they don't want the project to have direct dependency
> on additional FS implementations due to dependency and vulnerability
> management concerns.
>
> To make this project successful, we're exploring options, to push up these
> APIs to the FileSystem abstraction. Eventually, it would make HBase FS
> implementation agnostic, and perhaps enable HBase to support other storage
> systems in the future.
>
> We'd use the PathCapabilities API to probe if the underlying FS
> implementation supports these APIs, and would then invoke the corresponding
> FileSystem APIs. This is straightforward but the FileSystem would become
> bloated.
>
> Another option is to create a "RecoverableFileSystem" interface, and have
> both DistributedFileSystem (HDFS) and RootedOzoneFileSystem (Ozone). This
> way the impact to the Hadoop project and the FileSystem abstraction is even
> smaller.
>
> Thoughts?
>
> [1] https://lists.apache.org/thread/tcrp8vxxs3z12y36mpzx35txhpp7tvxv
>

Re: [DISCUSS] Move HDFS specific APIs to FileSystem abstration

Posted by Steve Loughran <st...@cloudera.com.INVALID>.
   1. I think a new interface would be good, as FileContext could do the
   same thing.
   2. Using PathCapabilities probes should still be mandatory, as for
   FileContext it would depend on the back end.
   3. Whoever does this gets to specify what the API does and write the
   contract tests. Saying "just do what HDFS does" isn't enough, as it's not
   always clear that the HDFS team knows how much of that behaviour is
   intentional (rename, anyone?).
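
Points 1 and 2 can be sketched together: a separate interface carries the operation, and a capability probe decides at run time whether the back end really supports it. This is only an illustration. Hadoop's PathCapabilities does expose hasPathCapability(Path, String), but everything below uses stand-in types: CapabilityProbe, LeaseRecoverable, the capability string and the three toy file systems are all hypothetical names, and paths are plain strings.

```java
/** Stand-in for a capability probe; paths are plain strings in this sketch. */
interface CapabilityProbe {
  boolean hasPathCapability(String path, String capability);
}

/** Hypothetical interface for the pulled-up operation; not a Hadoop type. */
interface LeaseRecoverable {
  boolean recoverLease(String path);
}

/** Hypothetical capability name, invented for this sketch. */
final class Capabilities {
  static final String LEASE_RECOVERY = "sketch.capability.lease.recovery";

  private Capabilities() {
  }
}

/** Back end that really supports lease recovery. */
class LeaseFs implements CapabilityProbe, LeaseRecoverable {
  @Override
  public boolean hasPathCapability(String path, String capability) {
    return Capabilities.LEASE_RECOVERY.equals(capability);
  }

  @Override
  public boolean recoverLease(String path) {
    return true;
  }
}

/** Back end with no lease concept at all. */
class PlainFs implements CapabilityProbe {
  @Override
  public boolean hasPathCapability(String path, String capability) {
    return false;
  }
}

/** Generic wrapper: always implements the interface, support depends on the back end. */
class WrapperFs implements CapabilityProbe, LeaseRecoverable {
  private final CapabilityProbe inner;

  WrapperFs(CapabilityProbe inner) {
    this.inner = inner;
  }

  @Override
  public boolean hasPathCapability(String path, String capability) {
    return inner.hasPathCapability(path, capability);
  }

  @Override
  public boolean recoverLease(String path) {
    if (inner instanceof LeaseRecoverable) {
      return ((LeaseRecoverable) inner).recoverLease(path);
    }
    throw new UnsupportedOperationException("back end cannot recover leases");
  }
}

class ProbeDemo {
  /** Checks the interface AND the probe before calling the operation. */
  static String recoverIfSupported(CapabilityProbe fs, String path) {
    boolean supported = fs instanceof LeaseRecoverable
        && fs.hasPathCapability(path, Capabilities.LEASE_RECOVERY);
    if (!supported) {
      return "unsupported";
    }
    return ((LeaseRecoverable) fs).recoverLease(path) ? "recovered" : "pending";
  }

  public static void main(String[] args) {
    System.out.println(recoverIfSupported(new LeaseFs(), "/wal/1"));
    System.out.println(recoverIfSupported(new WrapperFs(new PlainFs()), "/wal/1"));
    System.out.println(recoverIfSupported(new WrapperFs(new LeaseFs()), "/wal/1"));
  }
}
```

The wrapper case is why an instanceof check alone is not enough: WrapperFs always implements the interface, so only the probe reveals whether the wrapped back end can honour the call.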


For any new API (a better rename, a better delete,...) I would normally
insist on making it cloud friendly, with an extensible builder API and an
emphasis on asynchronous IO. However this is existing code and does target
HDFS and Ozone -pulling the existing APIs up into a new interface seems the
right thing to do here.

I have a WiP project to build a shim library that offers new FS APIs to older
Hadoop releases by way of reflection, so that new APIs can be taken up
across projects where we cannot choreograph version updates across the
entire stack (hello Parquet, Spark, ...). My goal is to make this a
Hadoop-managed project with its own release schedule. You could add an
equivalent of the new interface there, which would then use reflection
behind the scenes to invoke the underlying HDFS methods when the FS client
has them.
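
The reflection-behind-the-scenes idea can be sketched in a few lines. This is not code from the fs-api-shim project: LeaseShim, OldClient and NewClient are hypothetical classes invented for the illustration, and the method is looked up with a String path instead of a Hadoop Path so that the example has no dependencies.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Optional;

/** Stand-in for a client from an older release that lacks the method. */
class OldClient {
}

/** Stand-in for a client that has the method. */
class NewClient {
  public boolean recoverLease(String path) {
    return path.startsWith("/");
  }
}

/** Hypothetical shim: binds to recoverLease(String) once, if it exists. */
class LeaseShim {
  private final Object client;
  private final Method recoverLease; // null when the client lacks the method

  LeaseShim(Object client) {
    this.client = client;
    Method m;
    try {
      m = client.getClass().getMethod("recoverLease", String.class);
    } catch (NoSuchMethodException e) {
      m = null; // older client: the API is simply unavailable
    }
    this.recoverLease = m;
  }

  boolean isSupported() {
    return recoverLease != null;
  }

  /** Empty when unsupported, otherwise the result of the underlying call. */
  Optional<Boolean> recoverLease(String path) {
    if (recoverLease == null) {
      return Optional.empty();
    }
    try {
      return Optional.of((Boolean) recoverLease.invoke(client, path));
    } catch (IllegalAccessException | InvocationTargetException e) {
      throw new IllegalStateException("recoverLease failed", e);
    }
  }
}

class ShimDemo {
  public static void main(String[] args) {
    LeaseShim oldShim = new LeaseShim(new OldClient());
    LeaseShim newShim = new LeaseShim(new NewClient());
    System.out.println("old client supported: " + oldShim.isSupported());
    System.out.println("new client supported: " + newShim.isSupported());
    System.out.println("new client result: " + newShim.recoverLease("/wal/1"));
  }
}
```

Resolving the Method once in the constructor keeps the per-call cost low and gives callers a cheap isSupported() check, which plays the same role as a capability probe.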

https://github.com/steveloughran/fs-api-shim

I've just added the vector IO API there; the next step is to copy over a lot
of the contract tests from hadoop-common and apply them through the shim,
against Hadoop 3.2 and 3.3.0-3.3.5. That testing against many back ends is
actually as tricky as the reflection itself. However, without this library it
is going to take a long, long time for open source applications to pick up
the higher-performance, cloud-ready APIs. Yes, those of us who can build the
entire stack can do it, but that gradually adds more divergence from the
open source libraries, reduces the test coverage overall and only increases
maintenance costs over time.

steve

On Thu, 16 Mar 2023 at 20:56, Wei-Chiu Chuang <we...@apache.org> wrote:

> Hi,
>
> Stephen and I are working on a project to make HBase to run on Ozone.
>
> HBase, born out of the Hadoop project, depends on a number of HDFS specific
> APIs, including recoverLease() and isInSafeMode(). The HBase community [1]
> strongly voiced that they don't want the project to have direct dependency
> on additional FS implementations due to dependency and vulnerability
> management concerns.
>
> To make this project successful, we're exploring options, to push up these
> APIs to the FileSystem abstraction. Eventually, it would make HBase FS
> implementation agnostic, and perhaps enable HBase to support other storage
> systems in the future.
>
> We'd use the PathCapabilities API to probe if the underlying FS
> implementation supports these APIs, and would then invoke the corresponding
> FileSystem APIs. This is straightforward but the FileSystem would become
> bloated.
>
> Another option is to create a "RecoverableFileSystem" interface, and have
> both DistributedFileSystem (HDFS) and RootedOzoneFileSystem (Ozone). This
> way the impact to the Hadoop project and the FileSystem abstraction is even
> smaller.
>
> Thoughts?
>
> [1] https://lists.apache.org/thread/tcrp8vxxs3z12y36mpzx35txhpp7tvxv
>
