Posted to common-issues@hadoop.apache.org by "Chris Nauroth (JIRA)" <ji...@apache.org> on 2016/03/11 00:48:40 UTC

[jira] [Comment Edited] (HADOOP-12910) Add new FileSystem API to support asynchronous method calls

    [ https://issues.apache.org/jira/browse/HADOOP-12910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15189621#comment-15189621 ] 

Chris Nauroth edited comment on HADOOP-12910 at 3/10/16 11:48 PM:
------------------------------------------------------------------

I am sensing massive scope creep in this discussion.

bq. Actually, one more thing to define in HDFS-9924 and include any specification is: linearizability/serializability guarantees

I'm going to repeat some of my comments from HDFS-9924.  A big motivation for this effort is that we often see applications that need to execute a large set of renames, where the application knows that there are no dependencies between the rename operations and no ordering requirements.  Although linearizability would certainly be nice to have, use cases like this don't need it.

Implementing a linearizability guarantee would significantly complicate this effort.  ZooKeeper has an async API with ordering guarantees, and it takes very delicate coordination between client-side and server-side state to make that happen.  Instead, I suggest that we focus on what we really need (async execution of independent operations) and tell clients that they are responsible for coordinating dependencies between calls.  I have also commented on HDFS-9924 that we could later provide a programming model of futures + promises as a more elegant way to help callers structure code with multiple dependent async calls.  Even that much is not an immediate need though.
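
To make the "independent operations" case concrete, here is a minimal sketch of a caller submitting a batch of unrelated renames and only then waiting for the results.  Everything in it is an assumption for illustration: {{AsyncRenameSketch}} is a toy in-memory stand-in, not a Hadoop class, and its {{rename}} only imitates the {{Future<Boolean>}} shape proposed in this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for an async file system; not part of Hadoop.
class AsyncRenameSketch {
  // Toy in-memory namespace: path -> contents.
  private final Map<String, String> files = new ConcurrentHashMap<>();
  private final ExecutorService pool = Executors.newFixedThreadPool(4);

  AsyncRenameSketch(Map<String, String> initial) {
    files.putAll(initial);
  }

  // Returns immediately; the rename itself runs on a pool thread.
  Future<Boolean> rename(String src, String dst) {
    return pool.submit(() -> {
      String data = files.remove(src);
      if (data == null) {
        return false; // source does not exist in the toy namespace
      }
      files.put(dst, data);
      return true;
    });
  }

  boolean exists(String path) {
    return files.containsKey(path);
  }

  void shutdown() {
    pool.shutdown();
  }

  // Submits every rename first, then waits for all of them.
  // No ordering between the renames is assumed or needed.
  static int renameAll(AsyncRenameSketch fs, Map<String, String> moves) {
    List<Future<Boolean>> pending = new ArrayList<>();
    for (Map.Entry<String, String> move : moves.entrySet()) {
      pending.add(fs.rename(move.getKey(), move.getValue()));
    }
    int succeeded = 0;
    for (Future<Boolean> f : pending) {
      try {
        if (f.get()) {
          succeeded++;
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting", e);
      } catch (ExecutionException e) {
        throw new IllegalStateException("rename failed", e.getCause());
      }
    }
    return succeeded;
  }
}
```

The caller never relies on one rename completing before another, which is why this pattern does not need a linearizability guarantee.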

This does not preclude providing a linearizability guarantee at some point in the future.  I'm just saying that we have an opportunity to provide something valuable sooner even without linearizability.

bq. I'm going to be ruthless and say "I'd like to see a specification of this alongside the existing one". Because that one has succeeded in being a reference point for everyone; we need to continue that for a key binding. It should be straightforward here.

Assuming the above project plan is acceptable (no linearizability right now), this reduces to a simple statement like "individual async operations adhere to the same contract as the corresponding sync operations, and there are no guarantees on ordering across multiple async operations."
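
With no ordering guarantee across calls, a caller with two dependent operations has to sequence them itself.  The sketch below shows one way that could look with the JDK's {{CompletableFuture}}, roughly the futures + promises style mentioned above.  It is only an illustration under assumptions: {{DependentCallsSketch}} and {{renameTwice}} are hypothetical names, the namespace is a toy in-memory map, and the API proposed in this issue returns a plain {{Future}}, not a {{CompletableFuture}}.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in; not part of Hadoop.
class DependentCallsSketch {
  // Toy in-memory namespace: path -> contents.
  private final Map<String, String> files = new ConcurrentHashMap<>();

  DependentCallsSketch(Map<String, String> initial) {
    files.putAll(initial);
  }

  // Assumed async rename; runs on the common fork-join pool.
  CompletableFuture<Boolean> rename(String src, String dst) {
    return CompletableFuture.supplyAsync(() -> {
      String data = files.remove(src);
      if (data == null) {
        return false; // source does not exist in the toy namespace
      }
      files.put(dst, data);
      return true;
    });
  }

  boolean exists(String path) {
    return files.containsKey(path);
  }

  // The second rename depends on the first, so the caller chains them:
  // b -> c is only submitted after a -> b has completed successfully.
  CompletableFuture<Boolean> renameTwice(String a, String b, String c) {
    return rename(a, b).thenCompose(ok ->
        ok ? rename(b, c) : CompletableFuture.<Boolean>completedFuture(false));
  }
}
```

Each individual rename still behaves like its synchronous counterpart; the ordering comes entirely from the caller's chaining.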

bq. Is it the future that raises an IOE, or the operation? I can see both needing to

Certainly Hadoop-specific exceptions like {{AccessControlException}} and {{QuotaExceededException}} must be delivered asynchronously, for example wrapped in an {{ExecutionException}}.  You won't know at the time of submitting the call whether you're going to hit one of these.  My opinion is that if the API is truly async, then we cannot perform I/O on the calling thread, and therefore cannot throw an {{IOException}} at call time.  I believe Nicholas wants to put {{throws IOException}} in the method signatures anyway, so that we can make backwards-compatible changes if we find a need later.  I think that's acceptable.
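
As a rough illustration of that split, the sketch below declares {{throws IOException}} on the async method but only ever fails through the returned {{Future}}, where {{get()}} wraps the real exception in an {{ExecutionException}}.  The names are assumptions: {{AsyncFailureSketch}} is a toy class, and {{PermissionDenied}} is a local stand-in for a Hadoop-specific {{IOException}} subclass, not the real {{AccessControlException}}.

```java
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in; not part of Hadoop.
class AsyncFailureSketch {
  // Local stand-in for a Hadoop-specific IOException subclass.
  static class PermissionDenied extends IOException {
    PermissionDenied(String message) {
      super(message);
    }
  }

  private final ExecutorService pool = Executors.newSingleThreadExecutor();

  // Declares IOException for future compatibility, but never throws it
  // here: no I/O happens on the calling thread.
  Future<Boolean> rename(String src, String dst) throws IOException {
    return pool.submit(() -> {
      // Toy permission check, evaluated on the worker thread.
      if (src.startsWith("/protected/")) {
        throw new PermissionDenied("cannot rename " + src);
      }
      return true;
    });
  }

  void shutdown() {
    pool.shutdown();
  }

  // Describes how a single call ended, to show where each failure surfaces.
  static String outcome(AsyncFailureSketch fs, String src, String dst) {
    try {
      Future<Boolean> f = fs.rename(src, dst); // submission itself succeeds
      return "renamed=" + f.get();
    } catch (ExecutionException e) {
      // The exception thrown on the worker thread is the cause.
      return "failed asynchronously: " + e.getCause().getClass().getSimpleName();
    } catch (IOException e) {
      return "failed at call time";
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return "interrupted";
    }
  }
}
```

The caller has to handle both paths because the signature declares {{IOException}}, even though in this sketch only the {{ExecutionException}} path is ever taken.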




> Add new FileSystem API to support asynchronous method calls
> -----------------------------------------------------------
>
>                 Key: HADOOP-12910
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12910
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Xiaobing Zhou
>
> Add a new API, namely FutureFileSystem (or AsynchronousFileSystem, if it is a better name).  All the APIs in FutureFileSystem are the same as FileSystem except that the return type is wrapped by Future, e.g.
> {code}
>   //FileSystem
>   public boolean rename(Path src, Path dst) throws IOException;
>   //FutureFileSystem
>   public Future<Boolean> rename(Path src, Path dst) throws IOException;
> {code}
> Note that FutureFileSystem does not extend FileSystem.


