You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by "Hadoop QA (JIRA)" <ji...@apache.org> on 2019/03/01 21:24:00 UTC

[jira] [Commented] (HADOOP-16132) Support multipart download in S3AFileSystem

    [ https://issues.apache.org/jira/browse/HADOOP-16132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782089#comment-16782089 ] 

Hadoop QA commented on HADOOP-16132:
------------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  2m 56s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m  3s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 27s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 17s{color} | {color:orange} hadoop-tools/hadoop-aws: The patch generated 6 new + 5 unchanged - 0 fixed = 11 total (was 5) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 31s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 55s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 39s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 58m  3s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HADOOP-16132 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12960824/HADOOP-16132.004.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux cf3deec70992 4.4.0-139-generic #165~14.04.1-Ubuntu SMP Wed Oct 31 10:55:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / cab8529 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/16008/artifact/out/diff-checkstyle-hadoop-tools_hadoop-aws.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HADOOP-Build/16008/artifact/out/whitespace-eol.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/16008/testReport/ |
| Max. process+thread count | 341 (vs. ulimit of 10000) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/16008/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Support multipart download in S3AFileSystem
> -------------------------------------------
>
>                 Key: HADOOP-16132
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16132
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Justin Uang
>            Priority: Major
>         Attachments: HADOOP-16132.001.patch, HADOOP-16132.002.patch, HADOOP-16132.003.patch, HADOOP-16132.004.patch, seek-logs-parquet.txt
>
>
> I noticed that I get 150MB/s when I use the AWS CLI
> {code:java}
> aws s3 cp s3://<bucket>/<key> - > /dev/null{code}
> vs 50MB/s when I use the S3AFileSystem
> {code:java}
> hadoop fs -cat s3://<bucket>/<key> > /dev/null{code}
> Looking into the AWS CLI code, it looks like the [download|https://github.com/boto/s3transfer/blob/ca0b708ea8a6a1213c6e21ca5a856e184f824334/s3transfer/download.py] logic is quite clever. It downloads the next couple parts in parallel using range requests, and then buffers them in memory in order to reorder them and expose a single contiguous stream. I translated the logic to Java and modified the S3AFileSystem to do similar things, and am able to achieve 150MB/s download speeds as well. It is mostly done but I have some things to clean up first. The PR is here: https://github.com/palantir/hadoop/pull/47/files
> It would be great to get some other eyes on it to see what we need to do to get it merged.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org