Posted to mapreduce-issues@hadoop.apache.org by "Hadoop QA (Jira)" <ji...@apache.org> on 2020/01/29 19:06:00 UTC

[jira] [Commented] (MAPREDUCE-6758) TestDFSIO should parallelize its creation of control files on setup

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026127#comment-17026127 ] 

Hadoop QA commented on MAPREDUCE-6758:
--------------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 26s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue}  0m  5s{color} | {color:blue} The patch file was not named according to hadoop's naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute for instructions. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 48s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 23s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 23s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 19s{color} | {color:orange} hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: The patch generated 1 new + 47 unchanged - 0 fixed = 48 total (was 47) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 37s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}124m 49s{color} | {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 42s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}174m 50s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | MAPREDUCE-6758 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12992142/MAPREDUCE-6758.001.diff |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 7f89280446dd 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7f3e1e0 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7729/artifact/out/diff-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt |
| unit | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7729/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7729/testReport/ |
| asflicense | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7729/artifact/out/patch-asflicense-problems.txt |
| Max. process+thread count | 1311 (vs. ulimit of 5500) |
| modules | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7729/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> TestDFSIO should parallelize its creation of control files on setup
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6758
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6758
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: test
>            Reporter: Dennis Huo
>            Priority: Major
>         Attachments: MAPREDUCE-6758.001.diff
>
>
> TestDFSIO currently uses a sequential for-loop to create {{nrFiles}} control files in {{controlDir}}, a subdirectory of the overall {{test.build.data}} directory. That directory may live on a non-HDFS FileSystem implementation:
> {code:java}
> private void createControlFile(FileSystem fs,
>                                 long nrBytes, // in bytes
>                                 int nrFiles
>                               ) throws IOException {
>   LOG.info("creating control file: "+nrBytes+" bytes, "+nrFiles+" files");
>   Path controlDir = getControlDir(config);
>   fs.delete(controlDir, true);
>   for(int i=0; i < nrFiles; i++) {
>     String name = getFileName(i);
>     Path controlFile = new Path(controlDir, "in_file_" + name);
>     SequenceFile.Writer writer = null;
>     try {
>       writer = SequenceFile.createWriter(fs, config, controlFile,
>                                          Text.class, LongWritable.class,
>                                          CompressionType.NONE);
>       writer.append(new Text(name), new LongWritable(nrBytes));
>     } catch(Exception e) {
>       throw new IOException(e.getLocalizedMessage());
>     } finally {
>       if (writer != null)
>         writer.close();
>       writer = null;
>     }
>   }
>   LOG.info("created control files for: "+nrFiles+" files");
> }
> {code}
> When testing against an object-store-based filesystem with higher round-trip latency than HDFS (such as S3 or GCS), job setup that takes only seconds on HDFS can take minutes or even tens of minutes if the test uses thousands of control files. In the same vein as other JIRAs in [https://issues.apache.org/jira/browse/HADOOP-11694], the control-file creation should be parallelized/multithreaded so that large TestDFSIO jobs launch efficiently against FileSystem implementations that have high round-trip latency but still support high overall throughput/QPS.
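
For illustration only, the parallelization proposed above could look roughly like the sketch below. It is not the attached patch. To stay runnable without a cluster it uses java.nio.file as a stand-in for Hadoop's FileSystem and SequenceFile.Writer. The class name ParallelControlFiles, the nrThreads parameter, the "test_io_" name prefix, and the tab-separated record format are all assumptions made for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: java.nio.file stands in for Hadoop's FileSystem and
// SequenceFile.Writer, so this is not the TestDFSIO implementation itself.
class ParallelControlFiles {

  // Creates nrFiles control files under controlDir, submitting one task per
  // file to a fixed-size thread pool instead of writing them one by one.
  static List<Path> createControlFiles(Path controlDir, int nrFiles,
                                       long nrBytes, int nrThreads)
      throws IOException, InterruptedException {
    Files.createDirectories(controlDir);
    ExecutorService pool = Executors.newFixedThreadPool(nrThreads);
    try {
      List<Future<Path>> pending = new ArrayList<>();
      for (int i = 0; i < nrFiles; i++) {
        final String name = "test_io_" + i;  // assumed naming scheme
        pending.add(pool.submit(() -> {
          Path controlFile = controlDir.resolve("in_file_" + name);
          // Assumed record layout: file name, a tab, then the byte count.
          String record = name + "\t" + nrBytes + "\n";
          return Files.write(controlFile,
                             record.getBytes(StandardCharsets.UTF_8));
        }));
      }
      // Wait for every task; surface the first failure as an IOException
      // that keeps the original cause attached.
      List<Path> created = new ArrayList<>();
      for (Future<Path> f : pending) {
        try {
          created.add(f.get());
        } catch (ExecutionException e) {
          throw new IOException("control file creation failed", e.getCause());
        }
      }
      return created;
    } finally {
      pool.shutdownNow();
    }
  }
}
```

With per-file latency dominated by the round trip, total setup time in this sketch scales roughly with nrFiles divided by nrThreads rather than with nrFiles. Unlike the quoted method, the sketch does not delete an existing control directory first; a real change would need to keep that step and the SequenceFile format.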



--
This message was sent by Atlassian Jira
(v8.3.4#803005)