Posted to hdfs-dev@hadoop.apache.org by Steve Loughran <st...@cloudera.com.INVALID> on 2020/02/28 13:47:40 UTC

HDFS-13616 : batch listing of multiple directories

https://issues.apache.org/jira/browse/HDFS-13616

I don't want to be territorial here, but as I keep reminding this list
whenever it happens: I do not want any changes to go into the core
FileSystem class without

* raising a HADOOP JIRA
* involving those of us who work on object stores. We have different
problems (latencies, failure modes) and want to move to
async/completable APIs, ideally with builder APIs for future flexibility
and per-FS options.
* specifying semantics formally enough that implementors and users know
what they get.
* a specification in the filesystem.md
* contract tests which match the spec and which the object stores, as
well as HDFS, can implement

The change has ~no javadocs and doesn't even state
* whether it's recursive or not
* whether it includes directories or not

batchedListStatusIterator is exactly the kind of feature this should apply
to; it is where we get a chance to fix the limitations of the previous
calls (blocking sync, no right to cancel listings), ...

I'd like to be able to
* provide a hint on batch sizes,
* get an async response, so the fact that the LIST can take time is more
visible,
* and cancel that query if it is taking too long.

I'd also like to be able to close an iterator; that is something we
can/should retrofit, or require all implementations to add.


// Sketch only: batchList() and its option names are a proposal,
// not code in the tree.
CompletableFuture<RemoteIterator<PartialListing<FileStatus>>> listing =
    fs.batchList(path)
        .recursive(true)
        .opt("fs.option.batchlist.size", 100)
        .build();

RemoteIterator<PartialListing<FileStatus>> it = listing.get();

FileStatus largeFile = null;

try {
  while (it.hasNext() && largeFile == null) {
    // each element is one batch; assume PartialListing.get() returns
    // that batch's List<FileStatus>
    for (FileStatus st : it.next().get()) {
      if (st.getLen() > 1_000_000) {
        largeFile = st;
        break;
      }
    }
  }
} finally {
  // closing early lets the store recycle the HTTP connection
  if (it instanceof Closeable) {
    IOUtils.closeQuietly((Closeable) it);
  }
}

if (largeFile != null) {
  processLargeFile(largeFile);
}

See: something for slower IO, controllable batch sizes, and a way to cancel
the scan, so we can recycle the HTTP connection even when breaking out
early.
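
Cancellation is what an async handle would give us almost for free. A
minimal sketch, assuming the hypothetical batchList() builder above
(InterruptedException/ExecutionException handling elided):

CompletableFuture<RemoteIterator<PartialListing<FileStatus>>> listing =
    fs.batchList(path).build();
try {
  // bound how long we wait for the listing to become available
  RemoteIterator<PartialListing<FileStatus>> it =
      listing.get(30, TimeUnit.SECONDS);
  // ... scan as above ...
} catch (TimeoutException e) {
  // too slow: abandon the query so the connection can be released
  listing.cancel(true);
}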

This is a recurrent problem, and I am getting as bored of sending these
emails out as people probably are of receiving them.

Please, please, at least talk to me. Yes, I'm going to add more homework, but
the goal is to make it something well documented, well testable, and
straightforward for other implementations to support, without us having to
reverse engineer HDFS's behaviour and treat that as normative.

What do I do here?
1. Do I overreact and revert the change until my needs are met? Because I
know that if I volunteer to do this work myself, it's going to get
neglected.
2. Is someone going to put their hand up to help with this?

At the very least, I'm going to tag the APIs as unstable and likely to
change, so that anyone who uses them in hadoop-3.3.0 isn't going to
be upset when they move to a builder API. And they will have to, for the
object stores.

sorry

steve

Re: HDFS-13616 : batch listing of multiple directories

Posted by Chao Sun <su...@apache.org>.
Hi Steve,

Thanks for your valuable feedback, and apologies for overlooking the needs
of object stores and others! I'll look into the PR.

> Then, after 3.3.0 is out, someone gets to do the FileSystem
> implementation, with specification, tests etc. Not me - you are the HDFS
> team - of course you can do this.

+1. I can help to improve documentation and testing for this once 3.3.0 is
out.

Chao


Re: HDFS-13616 : batch listing of multiple directories

Posted by Steve Loughran <st...@cloudera.com.INVALID>.
On Sat, 29 Feb 2020 at 00:23, Wei-Chiu Chuang <we...@cloudera.com.invalid>
wrote:

> Steve,
>
> You made a great point, and I'm sorry this API was implemented without
> consideration of other FS implementations.
> Thank you for your direct feedback.
>
> async -- yes
> builder -- yes
> cancellable -- totally agree
>
> There are good use cases for this API though -- Impala and Presto both
> require lots of filesystem metadata operations, and this API would make
> them much more efficient.
>

Well, I absolutely do not want an API that we will have to maintain for
decades yet which lacks any rigorous specification beyond "look at the HDFS
implementation", and which has seemingly not taken the needs of cloud
storage into account.

It is the long-term obligation to maintain this API which I am most worried
about. Please: don't add new operations there unless you think they are
ready for broad use and, in the absence of any builder API, are "the
perfect operation".

Proposed:

* the new API is pulled into a new interface marked unstable; FileSystem
subclasses get to implement it if they support it.
* new classes (PartialListing) are also tagged unstable. Side issue: please
always mark new stuff as Evolving.
* that interface extends PathCapabilities, so you can't implement it
without declaring whether paths support the feature.
* and we will define a new path capability.

Applications can cast to the new interface and use PathCapabilities to
verify the feature is actually available under a given path, even through
filter filesystems.
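
To make that concrete, a minimal sketch of the probe-then-cast pattern.
The interface name, method signature and capability string here are all
assumptions for illustration, not the committed API:

@InterfaceStability.Unstable
public interface BatchListingOperations extends PathCapabilities {

  // one iterator over the batched listing of all the given paths
  RemoteIterator<PartialListing<FileStatus>> batchedListStatusIterator(
      List<Path> paths) throws IOException;
}

// application side: probe the path capability before casting, so
// filter filesystems which don't relay the API are caught early
if (fs.hasPathCapability(path, "fs.capability.batch.listing")
    && fs instanceof BatchListingOperations) {
  RemoteIterator<PartialListing<FileStatus>> it =
      ((BatchListingOperations) fs).batchedListStatusIterator(paths);
  // ...
}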


Then, after 3.3.0 is out, someone gets to do the FileSystem implementation,
with specification, tests etc. Not me - you are the HDFS team - of course you
can do this. But it does need to be done taking into account the fact that
alternate stores and filesystems will want to implement this, and all of us
will be fielding support calls related to it for a long time.


> On top of that, I would also like to have a batched delete API. HBase could
> benefit a lot from that.
>
>
Another interesting bit of work, especially since Gabor and I have just
been dealing with S3 delete throttling issues.

I will gladly give advice there. At the very least, it must implement
Progressable, so that when deletes are very slow, processes/threads can
still send heartbeats back. It also sets expectations as to how long some
of these things can take.
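
Purely illustrative, with made-up method names - only Progressable itself
is the real interface:

// delete in pages, calling progress() after each one so long-running
// callers (HBase, MR tasks) can keep their heartbeats alive
void bulkDelete(List<Path> paths, int pageSize, Progressable progress)
    throws IOException {
  for (List<Path> page : Lists.partition(paths, pageSize)) {  // Guava
    deletePage(page);      // hypothetical store-side bulk delete
    progress.progress();   // heartbeat after every page
  }
}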

-Steve



Re: HDFS-13616 : batch listing of multiple directories

Posted by Wei-Chiu Chuang <we...@cloudera.com.INVALID>.
Steve,

You made a great point, and I'm sorry this API was implemented without
consideration of other FS implementations.
Thank you for your direct feedback.

async -- yes
builder -- yes
cancellable -- totally agree

There are good use cases for this API though -- Impala and Presto both
require lots of filesystem metadata operations, and this API would make
them much more efficient.
On top of that, I would also like to have a batched delete API. HBase could
benefit a lot from that.


