You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tez.apache.org by "shameersss1 (via GitHub)" <gi...@apache.org> on 2023/01/31 05:30:16 UTC

[GitHub] [tez] shameersss1 opened a new pull request, #263: TEZ-4397: Open Tez Input splits asynchronously

shameersss1 opened a new pull request, #263:
URL: https://github.com/apache/tez/pull/263

   Tez input splits can be opened asynchronously. This will reduce the amount of time spent for s3 to prepare the connection and opening the object.
   
   The changes are taken from the original PR: https://github.com/apache/tez/pull/195 with some additional changes
   1. Disable the feature by default


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tez] abstractdog commented on a diff in pull request #263: TEZ-4397: Open Tez Input splits asynchronously

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog commented on code in PR #263:
URL: https://github.com/apache/tez/pull/263#discussion_r1102587503


##########
tez-mapreduce/src/main/java/org/apache/tez/mapreduce/grouper/TezSplitGrouper.java:
##########
@@ -102,6 +103,20 @@ public abstract class TezSplitGrouper {
   public static final String TEZ_GROUPING_NODE_LOCAL_ONLY = "tez.grouping.node.local.only";
   public static final boolean TEZ_GROUPING_NODE_LOCAL_ONLY_DEFAULT = false;
 
+  /**
+   * Number of threads used to initialize the grouped splits, to asynchronously open the readers.
+   */
+  public static final String TEZ_GROUPING_SPLIT_INIT_THREADS = "tez.grouping.split.init-threads";
+  public static final int TEZ_GROUPING_SPLIT_INIT_THREADS_DEFAULT = 4;
+
+  /**
+   * Number of record readers to asynchronously and proactively init.
+   * Inorder for upstream apps to use this feature, The objects created in the

Review Comment:
   thanks, this makes sense, only minor typos:
   "In order", "the objects"
   
   LGTM, let me finish fixing precommit in [TEZ-4471](https://issues.apache.org/jira/browse/TEZ-4471) and then I'll let you know to restart precommit on this PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tez] tez-yetus commented on pull request #263: TEZ-4397: Open Tez Input splits asynchronously

Posted by "tez-yetus (via GitHub)" <gi...@apache.org>.
tez-yetus commented on PR #263:
URL: https://github.com/apache/tez/pull/263#issuecomment-1427798528

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 38s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  15m 29s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 32s |  master passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu222.04  |
   | +1 :green_heart: |  compile  |   0m 31s |  master passed with JDK Private Build-1.8.0_352-8u352-ga-1~22.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m  1s |  master passed  |
   | +1 :green_heart: |  javadoc  |   0m 39s |  master passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu222.04  |
   | +1 :green_heart: |  javadoc  |   0m 28s |  master passed with JDK Private Build-1.8.0_352-8u352-ga-1~22.04-b08  |
   | +0 :ok: |  spotbugs  |   1m 11s |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   1m  8s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 18s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 18s |  the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu222.04  |
   | +1 :green_heart: |  javac  |   0m 18s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 17s |  the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~22.04-b08  |
   | +1 :green_heart: |  javac  |   0m 17s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   0m 12s |  tez-mapreduce: The patch generated 0 new + 32 unchanged - 1 fixed = 32 total (was 33)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  javadoc  |   0m 15s |  the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu222.04  |
   | +1 :green_heart: |  javadoc  |   0m 14s |  the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~22.04-b08  |
   | +1 :green_heart: |  findbugs  |   0m 41s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 27s |  tez-mapreduce in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 13s |  The patch does not generate ASF License warnings.  |
   |  |   |  25m 21s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-263/6/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/tez/pull/263 |
   | Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
   | uname | Linux 2257340f0786 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/tez.sh |
   | git revision | master / e236f51ef |
   | Default Java | Private Build-1.8.0_352-8u352-ga-1~22.04-b08 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu222.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~22.04-b08 |
   |  Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-263/6/testReport/ |
   | Max. process+thread count | 238 (vs. ulimit of 5500) |
   | modules | C: tez-mapreduce U: tez-mapreduce |
   | Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-263/6/console |
   | versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tez] abstractdog commented on a diff in pull request #263: TEZ-4397: Open Tez Input splits asynchronously

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog commented on code in PR #263:
URL: https://github.com/apache/tez/pull/263#discussion_r1101123136


##########
tez-mapreduce/src/main/java/org/apache/tez/mapreduce/grouper/TezSplitGrouper.java:
##########
@@ -102,6 +102,17 @@ public abstract class TezSplitGrouper {
   public static final String TEZ_GROUPING_NODE_LOCAL_ONLY = "tez.grouping.node.local.only";
   public static final boolean TEZ_GROUPING_NODE_LOCAL_ONLY_DEFAULT = false;
 
+  /**
+   * Number of threads used to initialize the grouped splits, to asynchronously open the readers.
+   */
+  public static final String TEZ_GROUPING_SPLIT_INIT_THREADS = "tez.grouping.split.init-threads";
+  public static final int TEZ_GROUPING_SPLIT_INIT_THREADS_DEFAULT = 4;
+
+  /**
+   * Number of record readers to asynchronously and proactively init.
+   */
+  public static final String TEZ_GROUPING_SPLIT_INIT_NUM_RECORDREADERS = "tez.grouping.split.init.num-recordreaders";

Review Comment:
   is this the option that can lead to issues in hive currently if >1?
   if so let's use @Unstable annotation at least and describe what's the problem and what we expect from upstream apps in order to be able to leverage this feature



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tez] tez-yetus commented on pull request #263: TEZ-4397: Open Tez Input splits asynchronously

Posted by "tez-yetus (via GitHub)" <gi...@apache.org>.
tez-yetus commented on PR #263:
URL: https://github.com/apache/tez/pull/263#issuecomment-1427703749

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 38s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  15m 40s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 31s |  master passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu222.04  |
   | +1 :green_heart: |  compile  |   0m 30s |  master passed with JDK Private Build-1.8.0_352-8u352-ga-1~22.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m  2s |  master passed  |
   | +1 :green_heart: |  javadoc  |   0m 39s |  master passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu222.04  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  master passed with JDK Private Build-1.8.0_352-8u352-ga-1~22.04-b08  |
   | +0 :ok: |  spotbugs  |   1m 13s |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   1m 12s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 17s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 18s |  the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu222.04  |
   | +1 :green_heart: |  javac  |   0m 18s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 17s |  the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~22.04-b08  |
   | +1 :green_heart: |  javac  |   0m 17s |  the patch passed  |
   | -0 :warning: |  checkstyle  |   0m 10s |  tez-mapreduce: The patch generated 1 new + 32 unchanged - 1 fixed = 33 total (was 33)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  javadoc  |   0m 15s |  the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu222.04  |
   | +1 :green_heart: |  javadoc  |   0m 14s |  the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~22.04-b08  |
   | +1 :green_heart: |  findbugs  |   0m 41s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 26s |  tez-mapreduce in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 13s |  The patch does not generate ASF License warnings.  |
   |  |   |  25m 30s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-263/4/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/tez/pull/263 |
   | Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
   | uname | Linux efd4de4d7c6d 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/tez.sh |
   | git revision | master / e236f51ef |
   | Default Java | Private Build-1.8.0_352-8u352-ga-1~22.04-b08 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu222.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~22.04-b08 |
   | checkstyle | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-263/4/artifact/out/diff-checkstyle-tez-mapreduce.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-263/4/testReport/ |
   | Max. process+thread count | 236 (vs. ulimit of 5500) |
   | modules | C: tez-mapreduce U: tez-mapreduce |
   | Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-263/4/console |
   | versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tez] tez-yetus commented on pull request #263: TEZ-4397: Open Tez Input splits asynchronously

Posted by "tez-yetus (via GitHub)" <gi...@apache.org>.
tez-yetus commented on PR #263:
URL: https://github.com/apache/tez/pull/263#issuecomment-1423927591

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m  0s |  Docker mode activated.  |
   | -1 :x: |  docker  |   0m  3s |  Docker failed to build yetus/tez:c386865e7.  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | GITHUB PR | https://github.com/apache/tez/pull/263 |
   | Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-263/3/console |
   | versions | git=2.17.1 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tez] abstractdog commented on a diff in pull request #263: TEZ-4397: Open Tez Input splits asynchronously

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog commented on code in PR #263:
URL: https://github.com/apache/tez/pull/263#discussion_r1104337334


##########
tez-mapreduce/src/main/java/org/apache/hadoop/mapred/split/TezGroupedSplitsInputFormat.java:
##########
@@ -129,14 +138,69 @@ public class TezGroupedSplitsRecordReader implements RecordReader<K, V> {
     int idx = 0;
     long progress;
     RecordReader<K, V> curReader;
-    
+    private final AtomicInteger initIndex;
+    private final int numReaders;
+    private ExecutorService initReaderExecService;
+    private BlockingDeque<Future<RecordReader<K, V>>> initedReaders;
+    private AtomicBoolean failureOccurred = new AtomicBoolean(false);
+
     public TezGroupedSplitsRecordReader(TezGroupedSplit split, JobConf job,
         Reporter reporter) throws IOException {
       this.groupedSplit = split;
       this.job = job;
       this.reporter = reporter;
+      this.initIndex = new AtomicInteger(0);
+      int numThreads = conf.getInt(TezSplitGrouper.TEZ_GROUPING_SPLIT_INIT_THREADS,
+          TezSplitGrouper.TEZ_GROUPING_SPLIT_INIT_THREADS_DEFAULT);
+      this.numReaders = Math.min(groupedSplit.wrappedSplits.size(),
+          conf.getInt(TezSplitGrouper.TEZ_GROUPING_SPLIT_INIT_NUM_RECORDREADERS,
+              TezSplitGrouper.TEZ_GROUPING_SPLIT_INIT_NUM_RECORDREADERS_DEFAULT));
+      // skip multi-threaded split opening when number of readers is less than 1

Review Comment:
   if you skip ONLY when numReaders is less than, then you actually do:
   ```
   if (numReaders >= 1){
   }
   ...skip...
   ```
   
   but the intention is:
   ```
   if (numReaders > 1){
   }
   ...skip...
   ```
   so skip if numReaders is less than or equal to 1



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tez] tez-yetus commented on pull request #263: TEZ-4397: Open Tez Input splits asynchronously

Posted by "tez-yetus (via GitHub)" <gi...@apache.org>.
tez-yetus commented on PR #263:
URL: https://github.com/apache/tez/pull/263#issuecomment-1409839186

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |  36m 17s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  15m 23s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 29s |  master passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  compile  |   0m 28s |  master passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 57s |  master passed  |
   | +1 :green_heart: |  javadoc  |   0m 34s |  master passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  javadoc  |   0m 24s |  master passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +0 :ok: |  spotbugs  |   1m  7s |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   1m  5s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 19s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 19s |  the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  javac  |   0m 19s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 18s |  the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 18s |  the patch passed  |
   | -0 :warning: |  checkstyle  |   0m 11s |  tez-mapreduce: The patch generated 8 new + 32 unchanged - 1 fixed = 40 total (was 33)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  javadoc  |   0m 16s |  the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  javadoc  |   0m 14s |  the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  findbugs  |   0m 46s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 21s |  tez-mapreduce in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 13s |  The patch does not generate ASF License warnings.  |
   |  |   |  60m 24s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-263/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/tez/pull/263 |
   | Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
   | uname | Linux 2f240c408b7c 4.15.0-197-generic #208-Ubuntu SMP Tue Nov 1 17:23:37 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/tez.sh |
   | git revision | master / 39e5a8e71 |
   | Default Java | Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
   | checkstyle | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-263/1/artifact/out/diff-checkstyle-tez-mapreduce.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-263/1/testReport/ |
   | Max. process+thread count | 218 (vs. ulimit of 5500) |
   | modules | C: tez-mapreduce U: tez-mapreduce |
   | Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-263/1/console |
   | versions | git=2.25.1 maven=3.6.3 findbugs=3.0.1 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tez] abstractdog commented on a diff in pull request #263: TEZ-4397: Open Tez Input splits asynchronously

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog commented on code in PR #263:
URL: https://github.com/apache/tez/pull/263#discussion_r1104337334


##########
tez-mapreduce/src/main/java/org/apache/hadoop/mapred/split/TezGroupedSplitsInputFormat.java:
##########
@@ -129,14 +138,69 @@ public class TezGroupedSplitsRecordReader implements RecordReader<K, V> {
     int idx = 0;
     long progress;
     RecordReader<K, V> curReader;
-    
+    private final AtomicInteger initIndex;
+    private final int numReaders;
+    private ExecutorService initReaderExecService;
+    private BlockingDeque<Future<RecordReader<K, V>>> initedReaders;
+    private AtomicBoolean failureOccurred = new AtomicBoolean(false);
+
     public TezGroupedSplitsRecordReader(TezGroupedSplit split, JobConf job,
         Reporter reporter) throws IOException {
       this.groupedSplit = split;
       this.job = job;
       this.reporter = reporter;
+      this.initIndex = new AtomicInteger(0);
+      int numThreads = conf.getInt(TezSplitGrouper.TEZ_GROUPING_SPLIT_INIT_THREADS,
+          TezSplitGrouper.TEZ_GROUPING_SPLIT_INIT_THREADS_DEFAULT);
+      this.numReaders = Math.min(groupedSplit.wrappedSplits.size(),
+          conf.getInt(TezSplitGrouper.TEZ_GROUPING_SPLIT_INIT_NUM_RECORDREADERS,
+              TezSplitGrouper.TEZ_GROUPING_SPLIT_INIT_NUM_RECORDREADERS_DEFAULT));
+      // skip multi-threaded split opening when number of readers is less than 1

Review Comment:
   if you skip ONLY when numReaders is less than, then you do:
   ```
   if (numReaders >= 1){
   }
   ...skip...
   ```
   
   but the intention is:
   ```
   if (numReaders > 1){
   }
   ...skip...
   ```
   so skip if numReaders is less than or equal to 1



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tez] shameersss1 commented on a diff in pull request #263: TEZ-4397: Open Tez Input splits asynchronously

Posted by "shameersss1 (via GitHub)" <gi...@apache.org>.
shameersss1 commented on code in PR #263:
URL: https://github.com/apache/tez/pull/263#discussion_r1101220308


##########
tez-mapreduce/src/main/java/org/apache/tez/mapreduce/grouper/TezSplitGrouper.java:
##########
@@ -102,6 +102,17 @@ public abstract class TezSplitGrouper {
   public static final String TEZ_GROUPING_NODE_LOCAL_ONLY = "tez.grouping.node.local.only";
   public static final boolean TEZ_GROUPING_NODE_LOCAL_ONLY_DEFAULT = false;
 
+  /**
+   * Number of threads used to initialize the grouped splits, to asynchronously open the readers.
+   */
+  public static final String TEZ_GROUPING_SPLIT_INIT_THREADS = "tez.grouping.split.init-threads";
+  public static final int TEZ_GROUPING_SPLIT_INIT_THREADS_DEFAULT = 4;
+
+  /**
+   * Number of record readers to asynchronously and proactively init.
+   */
+  public static final String TEZ_GROUPING_SPLIT_INIT_NUM_RECORDREADERS = "tez.grouping.split.init.num-recordreaders";

Review Comment:
   Ack.
   Yes, > 1 will lead to issues in Hive Side.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tez] shameersss1 commented on a diff in pull request #263: TEZ-4397: Open Tez Input splits asynchronously

Posted by "shameersss1 (via GitHub)" <gi...@apache.org>.
shameersss1 commented on code in PR #263:
URL: https://github.com/apache/tez/pull/263#discussion_r1104239110


##########
tez-mapreduce/src/main/java/org/apache/tez/mapreduce/grouper/TezSplitGrouper.java:
##########
@@ -102,6 +103,20 @@ public abstract class TezSplitGrouper {
   public static final String TEZ_GROUPING_NODE_LOCAL_ONLY = "tez.grouping.node.local.only";
   public static final boolean TEZ_GROUPING_NODE_LOCAL_ONLY_DEFAULT = false;
 
+  /**
+   * Number of threads used to initialize the grouped splits, to asynchronously open the readers.
+   */
+  public static final String TEZ_GROUPING_SPLIT_INIT_THREADS = "tez.grouping.split.init-threads";
+  public static final int TEZ_GROUPING_SPLIT_INIT_THREADS_DEFAULT = 4;
+
+  /**
+   * Number of record readers to asynchronously and proactively init.
+   * Inorder for upstream apps to use this feature, The objects created in the

Review Comment:
   ack



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tez] abstractdog commented on a diff in pull request #263: TEZ-4397: Open Tez Input splits asynchronously

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog commented on code in PR #263:
URL: https://github.com/apache/tez/pull/263#discussion_r1104266767


##########
tez-mapreduce/src/main/java/org/apache/hadoop/mapred/split/TezGroupedSplitsInputFormat.java:
##########
@@ -129,14 +138,69 @@ public class TezGroupedSplitsRecordReader implements RecordReader<K, V> {
     int idx = 0;
     long progress;
     RecordReader<K, V> curReader;
-    
+    private final AtomicInteger initIndex;
+    private final int numReaders;
+    private ExecutorService initReaderExecService;
+    private BlockingDeque<Future<RecordReader<K, V>>> initedReaders;
+    private AtomicBoolean failureOccurred = new AtomicBoolean(false);
+
     public TezGroupedSplitsRecordReader(TezGroupedSplit split, JobConf job,
         Reporter reporter) throws IOException {
       this.groupedSplit = split;
       this.job = job;
       this.reporter = reporter;
+      this.initIndex = new AtomicInteger(0);
+      int numThreads = conf.getInt(TezSplitGrouper.TEZ_GROUPING_SPLIT_INIT_THREADS,
+          TezSplitGrouper.TEZ_GROUPING_SPLIT_INIT_THREADS_DEFAULT);
+      this.numReaders = Math.min(groupedSplit.wrappedSplits.size(),
+          conf.getInt(TezSplitGrouper.TEZ_GROUPING_SPLIT_INIT_NUM_RECORDREADERS,
+              TezSplitGrouper.TEZ_GROUPING_SPLIT_INIT_NUM_RECORDREADERS_DEFAULT));
+      // skip multi-threaded split opening when number of readers is less than 1

Review Comment:
   sorry, I just found this: "skip ... less than 1" is not 100% true, numReaders=1 will also lead to skipping, this is important here, so only init the executor service if numReaders are greater than 1



##########
tez-mapreduce/src/main/java/org/apache/hadoop/mapred/split/TezGroupedSplitsInputFormat.java:
##########
@@ -183,23 +247,45 @@ protected boolean initNextRecordReader() throws IOException {
 
       // if all chunks have been processed, nothing more to do.
       if (idx == groupedSplit.wrappedSplits.size()) {
+        if (initReaderExecService != null) {
+          LOG.info("Shutting down the init record reader threadpool");
+          initReaderExecService.shutdownNow();
+        }
         return false;
       }
 
       if (LOG.isDebugEnabled()) {
-        LOG.debug("Init record reader for index " + idx + " of " + 
+        LOG.debug("Init record reader for index " + idx + " of " +
                   groupedSplit.wrappedSplits.size());
       }
 
       // get a record reader for the idx-th chunk
       try {
-        curReader = wrappedInputFormat.getRecordReader(
-            groupedSplit.wrappedSplits.get(idx), job, reporter);
+        // get the cur reader directly when async split opening is disabled
+        if (initReaderExecService == null) {
+          curReader = wrappedInputFormat.getRecordReader(groupedSplit.wrappedSplits.get(idx), job, reporter);
+        } else {
+          preInitReaders();
+          curReader = initedReaders.take().get();
+        }
       } catch (Exception e) {
-        throw new RuntimeException (e);
+        failureOccurred.set(true);
+        if (e instanceof InterruptedException) {
+          Thread.currentThread().interrupt();
+        }
+        if (initedReaders != null) {
+          cancelsFutures();
+        }
+        throw new RuntimeException(e);
       }
       idx++;
-      return true;
+      return curReader != null;
+    }
+
+    private void cancelsFutures() {

Review Comment:
   nit: we don't use third person in method names that reflects an action, so instead "cancelFutures"



##########
tez-mapreduce/src/main/java/org/apache/tez/mapreduce/grouper/TezSplitGrouper.java:
##########
@@ -102,6 +103,20 @@ public abstract class TezSplitGrouper {
   public static final String TEZ_GROUPING_NODE_LOCAL_ONLY = "tez.grouping.node.local.only";
   public static final boolean TEZ_GROUPING_NODE_LOCAL_ONLY_DEFAULT = false;
 
+  /**
+   * Number of threads used to initialize the grouped splits, to asynchronously open the readers.
+   */
+  public static final String TEZ_GROUPING_SPLIT_INIT_THREADS = "tez.grouping.split.init-threads";

Review Comment:
   this option should be in line with the other, both of them are about to configure the number of something, so:
   ```
   tez.grouping.split.init.threads
   tez.grouping.split.init.recordreaders
   ```
   or:
   ```
   tez.grouping.split.init.num-threads
   tez.grouping.split.init.num-recordreaders
   ```
   
   I prefer the first, we don't really use "num" in TezConfiguration either (only at 1 place), it's redundant I believe



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tez] abstractdog commented on a diff in pull request #263: TEZ-4397: Open Tez Input splits asynchronously

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog commented on code in PR #263:
URL: https://github.com/apache/tez/pull/263#discussion_r1102587503


##########
tez-mapreduce/src/main/java/org/apache/tez/mapreduce/grouper/TezSplitGrouper.java:
##########
@@ -102,6 +103,20 @@ public abstract class TezSplitGrouper {
   public static final String TEZ_GROUPING_NODE_LOCAL_ONLY = "tez.grouping.node.local.only";
   public static final boolean TEZ_GROUPING_NODE_LOCAL_ONLY_DEFAULT = false;
 
+  /**
+   * Number of threads used to initialize the grouped splits, to asynchronously open the readers.
+   */
+  public static final String TEZ_GROUPING_SPLIT_INIT_THREADS = "tez.grouping.split.init-threads";
+  public static final int TEZ_GROUPING_SPLIT_INIT_THREADS_DEFAULT = 4;
+
+  /**
+   * Number of record readers to asynchronously and proactively init.
+   * Inorder for upstream apps to use this feature, The objects created in the

Review Comment:
   thanks, this makes sense, only minor typos, nit:
   "In order", "the objects"
   
   LGTM, let me finish fixing precommit in [TEZ-4471](https://issues.apache.org/jira/browse/TEZ-4471) and then I'll let you know to restart precommit on this PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tez] shameersss1 commented on pull request #263: TEZ-4397: Open Tez Input splits asynchronously

Posted by "shameersss1 (via GitHub)" <gi...@apache.org>.
shameersss1 commented on PR #263:
URL: https://github.com/apache/tez/pull/263#issuecomment-1409783144

   @abstractdog Could you please review the same?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tez] shameersss1 commented on a diff in pull request #263: TEZ-4397: Open Tez Input splits asynchronously

Posted by "shameersss1 (via GitHub)" <gi...@apache.org>.
shameersss1 commented on code in PR #263:
URL: https://github.com/apache/tez/pull/263#discussion_r1104318380


##########
tez-mapreduce/src/main/java/org/apache/hadoop/mapred/split/TezGroupedSplitsInputFormat.java:
##########
@@ -129,14 +138,69 @@ public class TezGroupedSplitsRecordReader implements RecordReader<K, V> {
     int idx = 0;
     long progress;
     RecordReader<K, V> curReader;
-    
+    private final AtomicInteger initIndex;
+    private final int numReaders;
+    private ExecutorService initReaderExecService;
+    private BlockingDeque<Future<RecordReader<K, V>>> initedReaders;
+    private AtomicBoolean failureOccurred = new AtomicBoolean(false);
+
     public TezGroupedSplitsRecordReader(TezGroupedSplit split, JobConf job,
         Reporter reporter) throws IOException {
       this.groupedSplit = split;
       this.job = job;
       this.reporter = reporter;
+      this.initIndex = new AtomicInteger(0);
+      int numThreads = conf.getInt(TezSplitGrouper.TEZ_GROUPING_SPLIT_INIT_THREADS,
+          TezSplitGrouper.TEZ_GROUPING_SPLIT_INIT_THREADS_DEFAULT);
+      this.numReaders = Math.min(groupedSplit.wrappedSplits.size(),
+          conf.getInt(TezSplitGrouper.TEZ_GROUPING_SPLIT_INIT_NUM_RECORDREADERS,
+              TezSplitGrouper.TEZ_GROUPING_SPLIT_INIT_NUM_RECORDREADERS_DEFAULT));
+      // skip multi-threaded split opening when number of readers is less than 1

Review Comment:
   "when number of readers is less than 1" conveys the same message right? Am i missing anything?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tez] tez-yetus commented on pull request #263: TEZ-4397: Open Tez Input splits asynchronously

Posted by "tez-yetus (via GitHub)" <gi...@apache.org>.
tez-yetus commented on PR #263:
URL: https://github.com/apache/tez/pull/263#issuecomment-1427764957

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |  25m 11s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  15m 24s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 28s |  master passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu222.04  |
   | +1 :green_heart: |  compile  |   0m 26s |  master passed with JDK Private Build-1.8.0_352-8u352-ga-1~22.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 52s |  master passed  |
   | +1 :green_heart: |  javadoc  |   0m 35s |  master passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu222.04  |
   | +1 :green_heart: |  javadoc  |   0m 24s |  master passed with JDK Private Build-1.8.0_352-8u352-ga-1~22.04-b08  |
   | +0 :ok: |  spotbugs  |   1m  5s |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   1m  4s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 18s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 20s |  the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu222.04  |
   | +1 :green_heart: |  javac  |   0m 20s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 17s |  the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~22.04-b08  |
   | +1 :green_heart: |  javac  |   0m 17s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   0m 10s |  tez-mapreduce: The patch generated 0 new + 32 unchanged - 1 fixed = 32 total (was 33)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  javadoc  |   0m 15s |  the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu222.04  |
   | +1 :green_heart: |  javadoc  |   0m 14s |  the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~22.04-b08  |
   | +1 :green_heart: |  findbugs  |   0m 45s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 22s |  tez-mapreduce in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 12s |  The patch does not generate ASF License warnings.  |
   |  |   |  49m 10s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-263/5/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/tez/pull/263 |
   | Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
   | uname | Linux 7fc5084ebfff 4.15.0-197-generic #208-Ubuntu SMP Tue Nov 1 17:23:37 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/tez.sh |
   | git revision | master / e236f51ef |
   | Default Java | Private Build-1.8.0_352-8u352-ga-1~22.04-b08 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu222.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~22.04-b08 |
   |  Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-263/5/testReport/ |
   | Max. process+thread count | 220 (vs. ulimit of 5500) |
   | modules | C: tez-mapreduce U: tez-mapreduce |
   | Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-263/5/console |
   | versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tez] abstractdog merged pull request #263: TEZ-4397: Open Tez Input splits asynchronously

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog merged PR #263:
URL: https://github.com/apache/tez/pull/263


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tez] shameersss1 commented on a diff in pull request #263: TEZ-4397: Open Tez Input splits asynchronously

Posted by "shameersss1 (via GitHub)" <gi...@apache.org>.
shameersss1 commented on code in PR #263:
URL: https://github.com/apache/tez/pull/263#discussion_r1104318922


##########
tez-mapreduce/src/main/java/org/apache/hadoop/mapred/split/TezGroupedSplitsInputFormat.java:
##########
@@ -183,23 +247,45 @@ protected boolean initNextRecordReader() throws IOException {
 
       // if all chunks have been processed, nothing more to do.
       if (idx == groupedSplit.wrappedSplits.size()) {
+        if (initReaderExecService != null) {
+          LOG.info("Shutting down the init record reader threadpool");
+          initReaderExecService.shutdownNow();
+        }
         return false;
       }
 
       if (LOG.isDebugEnabled()) {
-        LOG.debug("Init record reader for index " + idx + " of " + 
+        LOG.debug("Init record reader for index " + idx + " of " +
                   groupedSplit.wrappedSplits.size());
       }
 
       // get a record reader for the idx-th chunk
       try {
-        curReader = wrappedInputFormat.getRecordReader(
-            groupedSplit.wrappedSplits.get(idx), job, reporter);
+        // get the cur reader directly when async split opening is disabled
+        if (initReaderExecService == null) {
+          curReader = wrappedInputFormat.getRecordReader(groupedSplit.wrappedSplits.get(idx), job, reporter);
+        } else {
+          preInitReaders();
+          curReader = initedReaders.take().get();
+        }
       } catch (Exception e) {
-        throw new RuntimeException (e);
+        failureOccurred.set(true);
+        if (e instanceof InterruptedException) {
+          Thread.currentThread().interrupt();
+        }
+        if (initedReaders != null) {
+          cancelsFutures();
+        }
+        throw new RuntimeException(e);
       }
       idx++;
-      return true;
+      return curReader != null;
+    }
+
+    private void cancelsFutures() {

Review Comment:
   Ack!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tez] shameersss1 commented on a diff in pull request #263: TEZ-4397: Open Tez Input splits asynchronously

Posted by "shameersss1 (via GitHub)" <gi...@apache.org>.
shameersss1 commented on code in PR #263:
URL: https://github.com/apache/tez/pull/263#discussion_r1104319455


##########
tez-mapreduce/src/main/java/org/apache/tez/mapreduce/grouper/TezSplitGrouper.java:
##########
@@ -102,6 +103,20 @@ public abstract class TezSplitGrouper {
   public static final String TEZ_GROUPING_NODE_LOCAL_ONLY = "tez.grouping.node.local.only";
   public static final boolean TEZ_GROUPING_NODE_LOCAL_ONLY_DEFAULT = false;
 
+  /**
+   * Number of threads used to initialize the grouped splits, to asynchronously open the readers.
+   */
+  public static final String TEZ_GROUPING_SPLIT_INIT_THREADS = "tez.grouping.split.init-threads";

Review Comment:
   Ack!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tez] shameersss1 commented on a diff in pull request #263: TEZ-4397: Open Tez Input splits asynchronously

Posted by "shameersss1 (via GitHub)" <gi...@apache.org>.
shameersss1 commented on code in PR #263:
URL: https://github.com/apache/tez/pull/263#discussion_r1104357781


##########
tez-mapreduce/src/main/java/org/apache/hadoop/mapred/split/TezGroupedSplitsInputFormat.java:
##########
@@ -129,14 +138,69 @@ public class TezGroupedSplitsRecordReader implements RecordReader<K, V> {
     int idx = 0;
     long progress;
     RecordReader<K, V> curReader;
-    
+    private final AtomicInteger initIndex;
+    private final int numReaders;
+    private ExecutorService initReaderExecService;
+    private BlockingDeque<Future<RecordReader<K, V>>> initedReaders;
+    private AtomicBoolean failureOccurred = new AtomicBoolean(false);
+
     public TezGroupedSplitsRecordReader(TezGroupedSplit split, JobConf job,
         Reporter reporter) throws IOException {
       this.groupedSplit = split;
       this.job = job;
       this.reporter = reporter;
+      this.initIndex = new AtomicInteger(0);
+      int numThreads = conf.getInt(TezSplitGrouper.TEZ_GROUPING_SPLIT_INIT_THREADS,
+          TezSplitGrouper.TEZ_GROUPING_SPLIT_INIT_THREADS_DEFAULT);
+      this.numReaders = Math.min(groupedSplit.wrappedSplits.size(),
+          conf.getInt(TezSplitGrouper.TEZ_GROUPING_SPLIT_INIT_NUM_RECORDREADERS,
+              TezSplitGrouper.TEZ_GROUPING_SPLIT_INIT_NUM_RECORDREADERS_DEFAULT));
+      // skip multi-threaded split opening when number of readers is less than 1

Review Comment:
   ack. Reworded to "  // init the async split opening executor service if numReaders are greater than 1"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tez] tez-yetus commented on pull request #263: TEZ-4397: Open Tez Input splits asynchronously

Posted by "tez-yetus (via GitHub)" <gi...@apache.org>.
tez-yetus commented on PR #263:
URL: https://github.com/apache/tez/pull/263#issuecomment-1409868260

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 59s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  15m 27s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 29s |  master passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  compile  |   0m 27s |  master passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 54s |  master passed  |
   | +1 :green_heart: |  javadoc  |   0m 33s |  master passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  javadoc  |   0m 23s |  master passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +0 :ok: |  spotbugs  |   1m  6s |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   1m  5s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 18s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 20s |  the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  javac  |   0m 20s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 17s |  the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 17s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   0m 11s |  tez-mapreduce: The patch generated 0 new + 32 unchanged - 1 fixed = 32 total (was 33)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  javadoc  |   0m 16s |  the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  javadoc  |   0m 14s |  the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  findbugs  |   0m 44s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 24s |  tez-mapreduce in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 12s |  The patch does not generate ASF License warnings.  |
   |  |   |  25m  5s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-263/2/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/tez/pull/263 |
   | Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
   | uname | Linux 795d103543df 4.15.0-197-generic #208-Ubuntu SMP Tue Nov 1 17:23:37 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/tez.sh |
   | git revision | master / 39e5a8e71 |
   | Default Java | Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
   |  Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-263/2/testReport/ |
   | Max. process+thread count | 219 (vs. ulimit of 5500) |
   | modules | C: tez-mapreduce U: tez-mapreduce |
   | Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-263/2/console |
   | versions | git=2.25.1 maven=3.6.3 findbugs=3.0.1 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org