
Posted to common-issues@hadoop.apache.org by "Thomas Demoor (JIRA)" <ji...@apache.org> on 2016/12/06 21:52:58 UTC

[jira] [Commented] (HADOOP-13695) S3A to use a thread pool for async path operations

    [ https://issues.apache.org/jira/browse/HADOOP-13695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15726823#comment-15726823 ] 

Thomas Demoor commented on HADOOP-13695:
----------------------------------------

The Netflix solution is "dirty but efficient", if you can live with the downsides.

The other solution is, I think, the best way to eliminate all race conditions, but it would require always using multipart uploads and exposing the multipart complete/abort calls to the committer. For a regular PUT, once sufficient bytes have been transferred, the server commits the "transaction" and returns 200 OK; you cannot cancel it. By only using multipart, we could let the committer keep track of all MultipartUploadIds, and then complete the upload with the ID of the "winner" and abort the "losing" speculative copies.
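A minimal Java sketch of the multipart-only idea, assuming each task attempt leaves its output as an uncompleted multipart upload and hands the upload ID to the committer instead of finishing it. MultipartStore, register() and commit() are hypothetical names for illustration, not S3A or AWS SDK APIs; a real committer would issue the store's complete/abort multipart calls (with part ETags, retries and failure handling) where the stub only logs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: the committer, not the task, decides which upload becomes visible. */
public class SpeculativeMultipartCommitter {

    /** Hypothetical minimal view of an object store's multipart complete/abort calls. */
    interface MultipartStore {
        void complete(String uploadId);
        void abort(String uploadId);
    }

    private final MultipartStore store;
    // Pending upload ID per task attempt; nothing is visible until complete() is called.
    private final Map<String, String> pendingByAttempt = new LinkedHashMap<>();

    SpeculativeMultipartCommitter(MultipartStore store) {
        this.store = store;
    }

    /** Record an attempt's upload ID instead of letting the attempt complete the upload. */
    synchronized void register(String attemptId, String uploadId) {
        pendingByAttempt.put(attemptId, uploadId);
    }

    /** Complete the winning attempt's upload, then abort every losing speculative copy. */
    synchronized void commit(String winningAttemptId) {
        String winner = pendingByAttempt.remove(winningAttemptId);
        if (winner == null) {
            throw new IllegalArgumentException("unknown attempt: " + winningAttemptId);
        }
        store.complete(winner);
        for (String loser : pendingByAttempt.values()) {
            store.abort(loser);
        }
        pendingByAttempt.clear();
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        MultipartStore fake = new MultipartStore() {
            public void complete(String id) { log.add("complete " + id); }
            public void abort(String id) { log.add("abort " + id); }
        };
        SpeculativeMultipartCommitter committer = new SpeculativeMultipartCommitter(fake);
        committer.register("attempt_0", "upload-A");
        committer.register("attempt_1", "upload-B");  // speculative copy that finished first
        committer.commit("attempt_1");
        System.out.println(log);  // prints [complete upload-B, abort upload-A]
    }
}
```

Completing the winner before aborting the losers means that if an abort fails, the committed output is already in place and only an orphaned upload is left behind for cleanup.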



> S3A to use a thread pool for async path operations
> --------------------------------------------------
>
>                 Key: HADOOP-13695
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13695
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>
> S3A path operations are often slow due to directory scanning, mock directory create/delete, etc. Many of these can be done asynchronously:
> * because deletion is eventually consistent, deleting parent dirs after an operation has returned doesn't alter the behaviour, except in the special case of operation failure.
> * scanning for paths/parents of a file in the create operation only needs to complete before the close() operation instantiates the object, no need to block create().
> * parallelized COPY calls would permit asynchronous rename.
> We could either use the thread pool used for block writes, or somehow isolate low-cost path ops (GET, DELETE) from the more expensive calls (COPY, PUT) so that a thread doing basic IO doesn't block for the duration of a long op. Maybe also use {{Semaphore.tryAcquire()}} and only start async work if there actually is an idle thread, doing it synchronously if not. Maybe it depends on the operation. Path query/cleanup before/after a write could simply be scheduled as more futures in the block write.
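The second bullet, deferring the create-time path scan until close(), could look roughly like the Java sketch below. It assumes the scan is a walk over ancestor paths checking that none of them is a file, with that check injected as a Predicate; DeferredCreateCheck, PendingOutput and create() are hypothetical names, not the S3A implementation. create() starts the scan on a pool thread and returns immediately, while close() joins the future before the stand-in "PUT" and surfaces any scan failure as an IOException.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Predicate;

/** Hypothetical sketch: create() starts the ancestor scan, close() waits for it. */
public class DeferredCreateCheck {

    /** Hypothetical output stream; writes are buffered while the scan runs. */
    static class PendingOutput implements AutoCloseable {
        private final CompletableFuture<Void> ancestorCheck;
        private final StringBuilder buffer = new StringBuilder();
        private boolean committed;

        PendingOutput(CompletableFuture<Void> ancestorCheck) {
            this.ancestorCheck = ancestorCheck;
        }

        void write(String data) {
            buffer.append(data);  // no need to wait for the scan here
        }

        boolean isCommitted() {
            return committed;
        }

        @Override
        public void close() throws IOException {
            try {
                ancestorCheck.join();  // the scan only has to finish before the object exists
            } catch (CompletionException e) {
                throw new IOException("ancestor check failed", e.getCause());
            }
            committed = true;  // stand-in for the PUT of the buffered data
        }
    }

    /** Start the ancestor scan asynchronously and return without blocking on it. */
    static PendingOutput create(String path, Predicate<String> isFile, ExecutorService pool) {
        CompletableFuture<Void> check = CompletableFuture.runAsync(() -> {
            // Walk "/a/b/c" -> "/a/b" -> "/a", failing if any ancestor is a file.
            for (int i = path.lastIndexOf('/'); i > 0; i = path.lastIndexOf('/', i - 1)) {
                String ancestor = path.substring(0, i);
                if (isFile.test(ancestor)) {
                    throw new IllegalStateException("ancestor is a file: " + ancestor);
                }
            }
        }, pool);
        return new PendingOutput(check);
    }

    public static void main(String[] args) throws IOException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try (PendingOutput out = create("/a/b/c", p -> false, pool)) {
            out.write("hello");
        } finally {
            pool.shutdown();
        }
    }
}
```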
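The Semaphore.tryAcquire() idea can be sketched with the JDK's java.util.concurrent classes: a semaphore holds one permit per pool thread, an operation goes to the pool only if a permit is free, and otherwise it runs in the calling thread. OpportunisticAsync is a hypothetical helper, not S3A code, and a permit is only a proxy for an idle thread: it is released at the end of the task body, slightly before the worker actually returns to the pool.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical helper: go async only when a pool thread looks idle, else run inline. */
public class OpportunisticAsync implements AutoCloseable {
    private final ExecutorService pool;
    private final Semaphore idleThreads;  // one permit per pool thread

    OpportunisticAsync(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.idleThreads = new Semaphore(threads);
    }

    <T> CompletableFuture<T> submit(Supplier<T> op) {
        if (idleThreads.tryAcquire()) {  // non-blocking: false if every thread is busy
            try {
                return CompletableFuture.supplyAsync(() -> {
                    try {
                        return op.get();
                    } finally {
                        idleThreads.release();  // this thread is about to become free again
                    }
                }, pool);
            } catch (RuntimeException e) {
                idleThreads.release();  // submission rejected (e.g. after close()); task never ran
                throw e;
            }
        }
        // No idle thread: do the work synchronously in the caller.
        try {
            return CompletableFuture.completedFuture(op.get());
        } catch (RuntimeException e) {
            return CompletableFuture.failedFuture(e);
        }
    }

    @Override
    public void close() {
        pool.shutdown();
    }

    public static void main(String[] args) {
        try (OpportunisticAsync async = new OpportunisticAsync(2)) {
            String who = async.submit(() -> Thread.currentThread().getName()).join();
            System.out.println("ran on " + who);
        }
    }
}
```

Running inline when the pool is saturated keeps a caller doing basic IO from queueing behind long COPY or PUT calls, at the cost of the caller blocking on its own operation instead.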



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)