Posted to common-issues@hadoop.apache.org by "Genmao Yu (JIRA)" <ji...@apache.org> on 2017/11/14 11:10:00 UTC

[jira] [Comment Edited] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

    [ https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16250906#comment-16250906 ] 

Genmao Yu edited comment on HADOOP-14999 at 11/14/17 11:09 AM:
---------------------------------------------------------------

Pending refactoring: use {{SemaphoredDelegatingExecutor}} instead of {{TaskEngine}}.

As discussed in HADOOP-15027, I think {{org.apache.hadoop.fs.s3a.SemaphoredDelegatingExecutor}} is a good common class, and we could move it to hadoop-common. [~stevel@apache.org] Do you mind if I open a JIRA to do this work?
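To illustrate what this kind of executor does, the following Java sketch wraps a shared {{ExecutorService}} and uses a {{java.util.concurrent.Semaphore}} to limit how many tasks one caller can have queued or running at once. When that limit is reached, a fast producer blocks instead of piling up work. The class names {{BoundedDelegatingExecutor}} and {{BoundedExecutorDemo}} are hypothetical. This is a sketch of the idea only. It is not the actual Hadoop class and does not use its constructor signature.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, NOT Hadoop's SemaphoredDelegatingExecutor: wraps a shared
// ExecutorService and caps how many tasks this wrapper may have outstanding.
// submit() blocks the caller when no permit is free (backpressure).
final class BoundedDelegatingExecutor {
    private final ExecutorService delegate;
    private final Semaphore permits;

    BoundedDelegatingExecutor(ExecutorService delegate, int maxOutstanding) {
        this.delegate = delegate;
        this.permits = new Semaphore(maxOutstanding, true); // fair: waiters served FIFO
    }

    <T> Future<T> submit(Callable<T> task) throws InterruptedException {
        permits.acquire(); // block until one of our slots frees up
        try {
            return delegate.submit(() -> {
                try {
                    return task.call();
                } finally {
                    permits.release(); // slot freed when the task ends, success or failure
                }
            });
        } catch (RuntimeException e) {
            permits.release(); // e.g. RejectedExecutionException: the task never ran
            throw e;
        }
    }
}

// Demo: push several tasks through a small-permit wrapper over a larger pool.
final class BoundedExecutorDemo {
    // Returns {peak number of concurrently running tasks, sum of task results}.
    static int[] run(int maxOutstanding, int taskCount) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        BoundedDelegatingExecutor bounded = new BoundedDelegatingExecutor(pool, maxOutstanding);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<Integer>> results = new ArrayList<>();
        try {
            for (int i = 0; i < taskCount; i++) {
                final int id = i;
                results.add(bounded.submit(() -> {
                    peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                    Thread.sleep(20); // simulate a slow block upload
                    running.decrementAndGet();
                    return id * id;
                }));
            }
            int sum = 0;
            for (Future<Integer> f : results) {
                sum += f.get();
            }
            return new int[] {peak.get(), sum};
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] r = run(2, 6);
        System.out.println("peak=" + r[0] + " sum=" + r[1]);
    }
}
```

In this demo, six tasks go through a two-permit wrapper over an eight-thread pool. The observed peak concurrency never exceeds 2, even though the underlying pool has more threads. This is the property that makes such an executor useful when many streams share a single pool.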


was (Author: unclegen):
Pending refactoring: move the {{TaskEngine}} from the output stream to the OSS filesystem.

As discussed in HADOOP-15027, I think {{org.apache.hadoop.fs.s3a.SemaphoredDelegatingExecutor}} is a good common class, and we could move it to hadoop-common.
Then I will refactor the {{TaskEngine}} to use {{SemaphoredDelegatingExecutor}}.

[~stevel@apache.org] Do you mind if I open a JIRA to do this work?

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-14999
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14999
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/oss
>    Affects Versions: 3.0.0-beta1
>            Reporter: Genmao Yu
>            Assignee: Genmao Yu
>         Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch
>
>
> This mechanism is designed to upload files in parallel and asynchronously:
> - Improve the performance of uploading files to the OSS server. First, the mechanism splits the output into multiple small blocks and uploads them in parallel. Second, producing the output and uploading the blocks proceed asynchronously.
> - Avoid buffering very large output on local disk. As an extreme example, a task that outputs 100GB or more would otherwise have to write all 100GB to local disk before uploading it. This is sometimes inefficient, and it is limited by the available disk space.
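The following Java sketch illustrates the two goals in the description. It assumes a hypothetical {{PartUploader}} interface in place of the real OSS multipart API and does not show any Aliyun SDK calls. Writes fill fixed-size in-memory blocks. Each full block is handed to an executor as an asynchronous part upload, so the writer does not wait for it to finish. A semaphore limits the number of in-flight uploads, which keeps memory bounded without spooling the whole output to local disk. {{AsyncMultipartOutputStream}} and {{InMemoryUploader}} are illustrative names and are not classes from the attached patches.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

// Hypothetical stand-in for a multipart upload API; NOT the Aliyun OSS SDK.
interface PartUploader {
    void uploadPart(int partNumber, byte[] data) throws IOException;
    void complete(int partCount) throws IOException;
}

// Buffers writes into fixed-size blocks, uploads full blocks asynchronously, and caps in-flight uploads.
final class AsyncMultipartOutputStream extends OutputStream {
    private final PartUploader uploader;
    private final ExecutorService executor;
    private final Semaphore inFlight;
    private final int blockSize;
    private final List<CompletableFuture<Void>> parts = new ArrayList<>();
    private ByteArrayOutputStream current = new ByteArrayOutputStream();
    private boolean closed;

    AsyncMultipartOutputStream(PartUploader uploader, ExecutorService executor,
                               int blockSize, int maxInFlight) {
        this.uploader = uploader;
        this.executor = executor;
        this.blockSize = blockSize;
        this.inFlight = new Semaphore(maxInFlight);
    }

    @Override
    public void write(int b) throws IOException {
        if (closed) throw new IOException("stream closed");
        current.write(b);
        if (current.size() == blockSize) uploadCurrentBlock(); // block full: ship it
    }

    private void uploadCurrentBlock() throws IOException {
        try {
            inFlight.acquire(); // backpressure: wait while maxInFlight parts are uploading
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted waiting for an upload slot", e);
        }
        byte[] data = current.toByteArray();
        int partNumber = parts.size() + 1; // 1-based part numbers in this sketch
        current = new ByteArrayOutputStream();
        parts.add(CompletableFuture.runAsync(() -> {
            try {
                uploader.uploadPart(partNumber, data);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } finally {
                inFlight.release(); // free the slot whether the upload succeeded or not
            }
        }, executor));
    }

    @Override
    public void close() throws IOException {
        if (closed) return;
        if (current.size() > 0) uploadCurrentBlock(); // last, possibly short, block
        closed = true;
        try {
            CompletableFuture.allOf(parts.toArray(new CompletableFuture<?>[0])).join();
        } catch (CompletionException e) {
            throw new IOException("a part upload failed", e);
        }
        uploader.complete(parts.size()); // only after every part has finished
    }
}

// Demo-only uploader that records parts in memory.
final class InMemoryUploader implements PartUploader {
    final Map<Integer, byte[]> parts = new ConcurrentHashMap<>();
    int completedCount = -1;
    public void uploadPart(int partNumber, byte[] data) { parts.put(partNumber, data); }
    public void complete(int partCount) { completedCount = partCount; }

    static InMemoryUploader demo(byte[] payload, int blockSize) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        InMemoryUploader up = new InMemoryUploader();
        try (OutputStream out = new AsyncMultipartOutputStream(up, pool, blockSize, 2)) {
            out.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            pool.shutdown();
        }
        return up;
    }
}
```

With a 4-byte block size, writing the 10-byte input "0123456789" produces three parts: "0123", "4567", and "89". {{complete(3)}} is called only after all three parts have finished. The sketch implements only {{write(int)}}, so bulk writes fall back to {{OutputStream}}'s per-byte default. A production version would likely also override {{write(byte[], int, int)}}.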



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
