You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tez.apache.org by "abstractdog (via GitHub)" <gi...@apache.org> on 2024/03/30 08:37:35 UTC

[PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

abstractdog opened a new pull request, #341:
URL: https://github.com/apache/tez/pull/341

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "tez-yetus (via GitHub)" <gi...@apache.org>.
tez-yetus commented on PR #341:
URL: https://github.com/apache/tez/pull/341#issuecomment-2030729876

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m  9s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +0 :ok: |  prototool  |   0m  0s |  prototool was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  The patch appears to include 5 new or modified test files.  |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   6m  2s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   9m 16s |  master passed  |
   | +1 :green_heart: |  compile  |   1m 14s |  master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  compile  |   1m 17s |  master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  checkstyle  |   1m 28s |  master passed  |
   | +1 :green_heart: |  javadoc  |   1m 23s |  master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 11s |  master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +0 :ok: |  spotbugs  |   0m 50s |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   3m  1s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m  7s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   0m 53s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 50s |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  cc  |   0m 50s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 50s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 49s |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  cc  |   0m 49s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 49s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   0m  8s |  The patch passed checkstyle in tez-api  |
   | +1 :green_heart: |  checkstyle  |   0m  5s |  The patch passed checkstyle in tez-runtime-internals  |
   | +1 :green_heart: |  checkstyle  |   0m  8s |  tez-mapreduce: The patch generated 0 new + 43 unchanged - 2 fixed = 43 total (was 45)  |
   | +1 :green_heart: |  checkstyle  |   0m 10s |  The patch passed checkstyle in tez-dag  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  javadoc  |   0m 36s |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 40s |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  findbugs  |   2m 13s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 57s |  tez-api in the patch passed.  |
   | +1 :green_heart: |  unit  |   0m 29s |  tez-runtime-internals in the patch passed.  |
   | +1 :green_heart: |  unit  |   1m  0s |  tez-mapreduce in the patch passed.  |
   | +1 :green_heart: |  unit  |   4m 13s |  tez-dag in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 33s |  The patch does not generate ASF License warnings.  |
   |  |   |  41m 17s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/6/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/tez/pull/341 |
   | JIRA Issue | TEZ-4548 |
   | Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile cc prototool |
   | uname | Linux a58b56a303df 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/tez.sh |
   | git revision | master / 34bb628e3 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
   |  Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/6/testReport/ |
   | Max. process+thread count | 694 (vs. ulimit of 5500) |
   | modules | C: tez-api tez-runtime-internals tez-mapreduce tez-dag U: . |
   | Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/6/console |
   | versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog commented on PR #341:
URL: https://github.com/apache/tez/pull/341#issuecomment-2032027449

   > 🎊 **+1 overall**
   > 
   > Vote	Subsystem	Runtime	Comment
   > +0 🆗	reexec	0m 9s	Docker mode activated.
   > _ Prechecks _	
   > +1 💚	dupname	0m 0s	No case conflicting files found.
   > +0 🆗	prototool	0m 0s	prototool was not available.
   > +1 💚	@author	0m 0s	The patch does not contain any @author tags.
   > +1 💚	test4tests	0m 0s	The patch appears to include 5 new or modified test files.
   > _ master Compile Tests _	
   > +0 🆗	mvndep	5m 55s	Maven dependency ordering for branch
   > +1 💚	mvninstall	8m 2s	master passed
   > +1 💚	compile	1m 13s	master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1
   > +1 💚	compile	1m 24s	master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06
   > +1 💚	checkstyle	1m 25s	master passed
   > +1 💚	javadoc	1m 19s	master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1
   > +1 💚	javadoc	1m 10s	master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06
   > +0 🆗	spotbugs	0m 42s	Used deprecated FindBugs config; considering switching to SpotBugs.
   > +1 💚	findbugs	2m 59s	master passed
   > _ Patch Compile Tests _	
   > +0 🆗	mvndep	0m 7s	Maven dependency ordering for patch
   > +1 💚	mvninstall	0m 53s	the patch passed
   > +1 💚	compile	0m 52s	the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1
   > +1 💚	cc	0m 52s	the patch passed
   > +1 💚	javac	0m 52s	the patch passed
   > +1 💚	compile	0m 53s	the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06
   > +1 💚	cc	0m 53s	the patch passed
   > +1 💚	javac	0m 53s	the patch passed
   > -0 ⚠️	checkstyle	0m 8s	tez-mapreduce: The patch generated 4 new + 43 unchanged - 2 fixed = 47 total (was 45)
   > +1 💚	whitespace	0m 0s	The patch has no whitespace issues.
   > +1 💚	javadoc	0m 37s	the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1
   > +1 💚	javadoc	0m 40s	the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06
   > +1 💚	findbugs	2m 18s	the patch passed
   > _ Other Tests _	
   > +1 💚	unit	1m 57s	tez-api in the patch passed.
   > +1 💚	unit	0m 30s	tez-runtime-internals in the patch passed.
   > +1 💚	unit	0m 59s	tez-mapreduce in the patch passed.
   > +1 💚	unit	4m 46s	tez-dag in the patch passed.
   > +1 💚	asflicense	0m 33s	The patch does not generate ASF License warnings.
   > 40m 36s	
   > Subsystem	Report/Notes
   > Docker	ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/8/artifact/out/Dockerfile
   > GITHUB PR	#341
   > JIRA Issue	[TEZ-4548](https://issues.apache.org/jira/browse/TEZ-4548)
   > Optional Tests	dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile cc prototool
   > uname	Linux c4a861b17fef 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
   > Build tool	maven
   > Personality	personality/tez.sh
   > git revision	master / [34bb628](https://github.com/apache/tez/commit/34bb628e3a25edf5e36e3ca1bf73f06752a43118)
   > Default Java	Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06
   > Multi-JDK versions	/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06
   > checkstyle	https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/8/artifact/out/diff-checkstyle-tez-mapreduce.txt
   > Test Results	https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/8/testReport/
   > Max. process+thread count	573 (vs. ulimit of 5500)
   > modules	C: tez-api tez-runtime-internals tez-mapreduce tez-dag U: .
   > Console output	https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/8/console
   > versions	git=2.34.1 maven=3.6.3 findbugs=3.0.1
   > Powered by	Apache Yetus 0.12.0 https://yetus.apache.org
   > This message was automatically generated.
   
   LOL, right, I haven't checked it in detail, I thought it was the another checkstyle warning which is usually a noise :) fixed


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog commented on PR #341:
URL: https://github.com/apache/tez/pull/341#issuecomment-2031972279

   > Thanx @abstractdog for the update. There are some checkstyle warnings can you take care of them, rest things LGTM
   
   thanks!
   I addressed what I could have (renaming some fields), I cannot really address [HiddenField], I consider it as a noise now
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "tez-yetus (via GitHub)" <gi...@apache.org>.
tez-yetus commented on PR #341:
URL: https://github.com/apache/tez/pull/341#issuecomment-2027990447

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |  12m 14s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +0 :ok: |  prototool  |   0m  0s |  prototool was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  The patch appears to include 4 new or modified test files.  |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   5m 48s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   7m 36s |  master passed  |
   | +1 :green_heart: |  compile  |   1m 16s |  master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  compile  |   1m 23s |  master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  checkstyle  |   1m 32s |  master passed  |
   | +1 :green_heart: |  javadoc  |   1m 24s |  master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 11s |  master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +0 :ok: |  spotbugs  |   0m 48s |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   3m  2s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m  7s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   0m 58s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 52s |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  cc  |   0m 52s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 52s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 48s |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  cc  |   0m 48s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 48s |  the patch passed  |
   | -0 :warning: |  checkstyle  |   0m  8s |  tez-api: The patch generated 2 new + 2 unchanged - 0 fixed = 4 total (was 2)  |
   | -0 :warning: |  checkstyle  |   0m  8s |  tez-mapreduce: The patch generated 3 new + 43 unchanged - 1 fixed = 46 total (was 44)  |
   | -0 :warning: |  checkstyle  |   0m 13s |  tez-dag: The patch generated 1 new + 6 unchanged - 0 fixed = 7 total (was 6)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 40s |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | -1 :x: |  findbugs  |   0m 29s |  tez-runtime-internals generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 56s |  tez-api in the patch passed.  |
   | +1 :green_heart: |  unit  |   0m 28s |  tez-runtime-internals in the patch passed.  |
   | +1 :green_heart: |  unit  |   0m 59s |  tez-mapreduce in the patch passed.  |
   | +1 :green_heart: |  unit  |   4m 13s |  tez-dag in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 34s |  The patch does not generate ASF License warnings.  |
   |  |   |  51m 53s |   |
   
   
   | Reason | Tests |
   |-------:|:------|
   | FindBugs | module:tez-runtime-internals |
   |  |  Found reliance on default encoding in org.apache.tez.common.ProtoConverters.convertRootInputDataInformationEventToProto(InputDataInformationEvent):in org.apache.tez.common.ProtoConverters.convertRootInputDataInformationEventToProto(InputDataInformationEvent): String.getBytes()  At ProtoConverters.java:[line 141] |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/tez/pull/341 |
   | JIRA Issue | TEZ-4548 |
   | Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile cc prototool |
   | uname | Linux 6e16603385e4 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/tez.sh |
   | git revision | master / 34bb628e3 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
   | checkstyle | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/1/artifact/out/diff-checkstyle-tez-api.txt |
   | checkstyle | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/1/artifact/out/diff-checkstyle-tez-mapreduce.txt |
   | checkstyle | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/1/artifact/out/diff-checkstyle-tez-dag.txt |
   | findbugs | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/1/artifact/out/new-findbugs-tez-runtime-internals.html |
   |  Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/1/testReport/ |
   | Max. process+thread count | 613 (vs. ulimit of 5500) |
   | modules | C: tez-api tez-runtime-internals tez-mapreduce tez-dag U: . |
   | Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/1/console |
   | versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "aturoczy (via GitHub)" <gi...@apache.org>.
aturoczy commented on code in PR #341:
URL: https://github.com/apache/tez/pull/341#discussion_r1546906996


##########
tez-api/src/main/java/org/apache/tez/runtime/api/events/InputDataInformationEvent.java:
##########
@@ -103,6 +113,7 @@ public Object getDeserializedUserPayload() {
   public String toString() {
     return "InputDataInformationEvent [sourceIndex=" + sourceIndex + ", targetIndex="
         + targetIndex + ", serializedUserPayloadExists=" + (userPayload != null)
-        + ", deserializedUserPayloadExists=" + (userPayloadObject != null) + "]";
-  } 
+        + ", deserializedUserPayloadExists=" + (userPayloadObject != null)
+        + ", serializedPath=" + serializedPath + "]";

Review Comment:
   If serializedPath is null then it would look a bit odd. I like to handle this type of scenario. But this is just styling :) 



##########
tez-api/src/main/java/org/apache/tez/runtime/api/events/InputDataInformationEvent.java:
##########
@@ -79,6 +79,12 @@ public static InputDataInformationEvent createWithObjectPayload(int srcIndex,
     return new InputDataInformationEvent(srcIndex, userPayloadDeserialized, null);
   }
 
+  public static InputDataInformationEvent createWithSerializedPath(int srcIndex, String serializedPath) {
+    InputDataInformationEvent event = new InputDataInformationEvent(srcIndex, null);
+    event.serializedPath = serializedPath;

Review Comment:
   Won't just be easier to add a 3rd constructor for the InputDataInformationEvent? To set a property is fine, just not fail safe.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "tez-yetus (via GitHub)" <gi...@apache.org>.
tez-yetus commented on PR #341:
URL: https://github.com/apache/tez/pull/341#issuecomment-2031214469

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 11s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +0 :ok: |  prototool  |   0m  0s |  prototool was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  The patch appears to include 5 new or modified test files.  |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   5m 49s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   8m 40s |  master passed  |
   | +1 :green_heart: |  compile  |   1m 18s |  master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  compile  |   1m 22s |  master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  checkstyle  |   1m 20s |  master passed  |
   | +1 :green_heart: |  javadoc  |   1m 14s |  master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  javadoc  |   1m  6s |  master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +0 :ok: |  spotbugs  |   0m 44s |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   2m 45s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m  7s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   0m 51s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 51s |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  cc  |   0m 51s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 51s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 47s |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  cc  |   0m 47s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 47s |  the patch passed  |
   | -0 :warning: |  checkstyle  |   0m  6s |  tez-api: The patch generated 1 new + 17 unchanged - 0 fixed = 18 total (was 17)  |
   | +1 :green_heart: |  checkstyle  |   0m  7s |  The patch passed checkstyle in tez-runtime-internals  |
   | +1 :green_heart: |  checkstyle  |   0m  7s |  tez-mapreduce: The patch generated 0 new + 43 unchanged - 2 fixed = 43 total (was 45)  |
   | +1 :green_heart: |  checkstyle  |   0m 12s |  The patch passed checkstyle in tez-dag  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | -1 :x: |  findbugs  |   0m 38s |  tez-api generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 52s |  tez-api in the patch passed.  |
   | +1 :green_heart: |  unit  |   0m 26s |  tez-runtime-internals in the patch passed.  |
   | +1 :green_heart: |  unit  |   0m 56s |  tez-mapreduce in the patch passed.  |
   | +1 :green_heart: |  unit  |   3m 57s |  tez-dag in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 30s |  The patch does not generate ASF License warnings.  |
   |  |   |  39m 18s |   |
   
   
   | Reason | Tests |
   |-------:|:------|
   | FindBugs | module:tez-api |
   |  |  Redundant nullcheck of StringBuilder.toString(), which is known to be non-null in org.apache.tez.runtime.api.events.InputDataInformationEvent.toString()  Redundant null check at InputDataInformationEvent.java:is known to be non-null in org.apache.tez.runtime.api.events.InputDataInformationEvent.toString()  Redundant null check at InputDataInformationEvent.java:[line 114] |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/7/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/tez/pull/341 |
   | JIRA Issue | TEZ-4548 |
   | Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile cc prototool |
   | uname | Linux d30fa7dc7fac 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/tez.sh |
   | git revision | master / 34bb628e3 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
   | checkstyle | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/7/artifact/out/diff-checkstyle-tez-api.txt |
   | findbugs | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/7/artifact/out/new-findbugs-tez-api.html |
   |  Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/7/testReport/ |
   | Max. process+thread count | 278 (vs. ulimit of 5500) |
   | modules | C: tez-api tez-runtime-internals tez-mapreduce tez-dag U: . |
   | Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/7/console |
   | versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog commented on code in PR #341:
URL: https://github.com/apache/tez/pull/341#discussion_r1547385546


##########
tez-runtime-internals/src/main/java/org/apache/tez/common/ProtoConverters.java:
##########
@@ -135,15 +138,22 @@ public static VertexManagerEvent convertVertexManagerEventFromProto(
     if (event.getUserPayload() != null) {
       builder.setUserPayload(ByteString.copyFrom(event.getUserPayload()));
     }
+    if (event.getSerializedPath() != null) {
+      builder.setSerializedPath(ByteString.copyFrom(event.getSerializedPath().getBytes(Charsets.UTF_8)));
+    }
     return builder.build();
   }
 
-  public static InputDataInformationEvent
-      convertRootInputDataInformationEventFromProto(
+  public static InputDataInformationEvent convertRootInputDataInformationEventFromProto(
       EventProtos.RootInputDataInformationEventProto proto) {
-    InputDataInformationEvent diEvent = InputDataInformationEvent.createWithSerializedPayload(
-        proto.getSourceIndex(),
-        proto.hasUserPayload() ? proto.getUserPayload().asReadOnlyByteBuffer() : null);
+    ByteBuffer payload = proto.hasUserPayload() ? proto.getUserPayload().asReadOnlyByteBuffer() : null;
+    InputDataInformationEvent diEvent = null;
+    if (!proto.getSerializedPath().isEmpty()) {

Review Comment:
   ack, checked and proto.hasSerializedPath() works, will fix
   maybe this isEmpty craziness was the outcome of an early incorrect implementation of convertRootInputDataInformationEventToProto



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "tez-yetus (via GitHub)" <gi...@apache.org>.
tez-yetus commented on PR #341:
URL: https://github.com/apache/tez/pull/341#issuecomment-2028026524

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m  8s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +0 :ok: |  prototool  |   0m  1s |  prototool was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  The patch appears to include 4 new or modified test files.  |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   5m 12s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   7m 19s |  master passed  |
   | +1 :green_heart: |  compile  |   1m 16s |  master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  compile  |   1m  9s |  master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  checkstyle  |   1m 23s |  master passed  |
   | +1 :green_heart: |  javadoc  |   1m 12s |  master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  javadoc  |   1m  3s |  master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +0 :ok: |  spotbugs  |   0m 48s |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   2m 58s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m  8s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m  6s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 58s |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  cc  |   0m 58s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 58s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 52s |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  cc  |   0m 52s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 52s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   0m  6s |  The patch passed checkstyle in tez-api  |
   | +1 :green_heart: |  checkstyle  |   0m  6s |  The patch passed checkstyle in tez-runtime-internals  |
   | +1 :green_heart: |  checkstyle  |   0m  7s |  tez-mapreduce: The patch generated 0 new + 42 unchanged - 2 fixed = 42 total (was 44)  |
   | +1 :green_heart: |  checkstyle  |   0m 12s |  The patch passed checkstyle in tez-dag  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  findbugs  |   2m 40s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m  2s |  tez-api in the patch passed.  |
   | +1 :green_heart: |  unit  |   0m 31s |  tez-runtime-internals in the patch passed.  |
   | +1 :green_heart: |  unit  |   1m  3s |  tez-mapreduce in the patch passed.  |
   | +1 :green_heart: |  unit  |   4m 21s |  tez-dag in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 36s |  The patch does not generate ASF License warnings.  |
   |  |   |  39m  7s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/3/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/tez/pull/341 |
   | JIRA Issue | TEZ-4548 |
   | Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile cc prototool |
   | uname | Linux 266432ac50f2 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/tez.sh |
   | git revision | master / 34bb628e3 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
   |  Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/3/testReport/ |
   | Max. process+thread count | 733 (vs. ulimit of 5500) |
   | modules | C: tez-api tez-runtime-internals tez-mapreduce tez-dag U: . |
   | Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/3/console |
   | versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "ayushtkn (via GitHub)" <gi...@apache.org>.
ayushtkn commented on code in PR #341:
URL: https://github.com/apache/tez/pull/341#discussion_r1547372195


##########
tez-runtime-internals/src/main/java/org/apache/tez/common/ProtoConverters.java:
##########
@@ -135,15 +138,22 @@ public static VertexManagerEvent convertVertexManagerEventFromProto(
     if (event.getUserPayload() != null) {
       builder.setUserPayload(ByteString.copyFrom(event.getUserPayload()));
     }
+    if (event.getSerializedPath() != null) {
+      builder.setSerializedPath(ByteString.copyFrom(event.getSerializedPath().getBytes(Charsets.UTF_8)));
+    }
     return builder.build();
   }
 
-  public static InputDataInformationEvent
-      convertRootInputDataInformationEventFromProto(
+  public static InputDataInformationEvent convertRootInputDataInformationEventFromProto(
       EventProtos.RootInputDataInformationEventProto proto) {
-    InputDataInformationEvent diEvent = InputDataInformationEvent.createWithSerializedPayload(
-        proto.getSourceIndex(),
-        proto.hasUserPayload() ? proto.getUserPayload().asReadOnlyByteBuffer() : null);
+    ByteBuffer payload = proto.hasUserPayload() ? proto.getUserPayload().asReadOnlyByteBuffer() : null;
+    InputDataInformationEvent diEvent = null;
+    if (!proto.getSerializedPath().isEmpty()) {

Review Comment:
   should be ok if it can never reach & there is no public method which can reach here, else maybe we can have Preconditions or Assert to see we don't land up having two source of truth accidentally even in future, but I am ok



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "tez-yetus (via GitHub)" <gi...@apache.org>.
tez-yetus commented on PR #341:
URL: https://github.com/apache/tez/pull/341#issuecomment-2032062863

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 10s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +0 :ok: |  prototool  |   0m  0s |  prototool was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  The patch appears to include 5 new or modified test files.  |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   5m 54s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   7m 40s |  master passed  |
   | +1 :green_heart: |  compile  |   1m 20s |  master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  compile  |   1m 14s |  master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  checkstyle  |   1m 14s |  master passed  |
   | +1 :green_heart: |  javadoc  |   1m 13s |  master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  javadoc  |   1m  0s |  master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +0 :ok: |  spotbugs  |   0m 43s |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   2m 39s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m  8s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   0m 57s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 59s |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  cc  |   0m 59s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 59s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 55s |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  cc  |   0m 56s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 55s |  the patch passed  |
   | -0 :warning: |  checkstyle  |   0m  9s |  tez-mapreduce: The patch generated 2 new + 43 unchanged - 2 fixed = 45 total (was 45)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 38s |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  findbugs  |   2m 31s |  the patch passed  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  |   1m 57s |  tez-api in the patch failed.  |
   | +1 :green_heart: |  unit  |   0m 32s |  tez-runtime-internals in the patch passed.  |
   | +1 :green_heart: |  unit  |   1m  4s |  tez-mapreduce in the patch passed.  |
   | +1 :green_heart: |  unit  |   4m 16s |  tez-dag in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 33s |  The patch does not generate ASF License warnings.  |
   |  |   |  39m 24s |   |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | tez.client.TestTezClient |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/9/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/tez/pull/341 |
   | JIRA Issue | TEZ-4548 |
   | Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile cc prototool |
   | uname | Linux c1088b3d9952 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/tez.sh |
   | git revision | master / 34bb628e3 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
   | checkstyle | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/9/artifact/out/diff-checkstyle-tez-mapreduce.txt |
   | unit | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/9/artifact/out/patch-unit-tez-api.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/9/testReport/ |
   | Max. process+thread count | 691 (vs. ulimit of 5500) |
   | modules | C: tez-api tez-runtime-internals tez-mapreduce tez-dag U: . |
   | Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/9/console |
   | versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog commented on PR #341:
URL: https://github.com/apache/tez/pull/341#issuecomment-2032248345

   tests passed green
   considering @ayushtkn 's LGTM an approval, will merge this soon in case of no further objections


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog commented on code in PR #341:
URL: https://github.com/apache/tez/pull/341#discussion_r1546922261


##########
tez-api/src/main/java/org/apache/tez/runtime/api/events/InputDataInformationEvent.java:
##########
@@ -79,6 +79,12 @@ public static InputDataInformationEvent createWithObjectPayload(int srcIndex,
     return new InputDataInformationEvent(srcIndex, userPayloadDeserialized, null);
   }
 
+  public static InputDataInformationEvent createWithSerializedPath(int srcIndex, String serializedPath) {
+    InputDataInformationEvent event = new InputDataInformationEvent(srcIndex, null);
+    event.serializedPath = serializedPath;

Review Comment:
   this method makes it impossible for users to call constructors in an invalid way (like: adding payload and serialized path at the same time, which should not happen), also with having a method name that implies this behavior
   what constructor are you suggesting that has the same advantages?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog commented on code in PR #341:
URL: https://github.com/apache/tez/pull/341#discussion_r1546922261


##########
tez-api/src/main/java/org/apache/tez/runtime/api/events/InputDataInformationEvent.java:
##########
@@ -79,6 +79,12 @@ public static InputDataInformationEvent createWithObjectPayload(int srcIndex,
     return new InputDataInformationEvent(srcIndex, userPayloadDeserialized, null);
   }
 
+  public static InputDataInformationEvent createWithSerializedPath(int srcIndex, String serializedPath) {
+    InputDataInformationEvent event = new InputDataInformationEvent(srcIndex, null);
+    event.serializedPath = serializedPath;

Review Comment:
   this method makes it impossible for users to call constructors in an invalid way (like: adding payload and serialized path at the same time, which should not happen), also this has a method name that implies the expected behavior
   what constructor are you suggesting that has the same advantages as createWithSerializedPath?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog commented on code in PR #341:
URL: https://github.com/apache/tez/pull/341#discussion_r1547193756


##########
tez-api/src/main/java/org/apache/tez/runtime/api/events/InputDataInformationEvent.java:
##########
@@ -103,6 +113,7 @@ public Object getDeserializedUserPayload() {
   public String toString() {
     return "InputDataInformationEvent [sourceIndex=" + sourceIndex + ", targetIndex="
         + targetIndex + ", serializedUserPayloadExists=" + (userPayload != null)
-        + ", deserializedUserPayloadExists=" + (userPayloadObject != null) + "]";
-  } 
+        + ", deserializedUserPayloadExists=" + (userPayloadObject != null)
+        + ", serializedPath=" + serializedPath + "]";

Review Comment:
   ack, I fixed this in https://github.com/apache/tez/pull/341/commits/40694b77b1ff7d49e25ed7ed8abda4f0d6195367



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog commented on code in PR #341:
URL: https://github.com/apache/tez/pull/341#discussion_r1547349710


##########
tez-runtime-internals/src/main/java/org/apache/tez/common/ProtoConverters.java:
##########
@@ -135,15 +138,22 @@ public static VertexManagerEvent convertVertexManagerEventFromProto(
     if (event.getUserPayload() != null) {
       builder.setUserPayload(ByteString.copyFrom(event.getUserPayload()));
     }
+    if (event.getSerializedPath() != null) {
+      builder.setSerializedPath(ByteString.copyFrom(event.getSerializedPath().getBytes(Charsets.UTF_8)));
+    }
     return builder.build();
   }
 
-  public static InputDataInformationEvent
-      convertRootInputDataInformationEventFromProto(
+  public static InputDataInformationEvent convertRootInputDataInformationEventFromProto(
       EventProtos.RootInputDataInformationEventProto proto) {
-    InputDataInformationEvent diEvent = InputDataInformationEvent.createWithSerializedPayload(
-        proto.getSourceIndex(),
-        proto.hasUserPayload() ? proto.getUserPayload().asReadOnlyByteBuffer() : null);
+    ByteBuffer payload = proto.hasUserPayload() ? proto.getUserPayload().asReadOnlyByteBuffer() : null;
+    InputDataInformationEvent diEvent = null;
+    if (!proto.getSerializedPath().isEmpty()) {

Review Comment:
   idea was: if there is serializedPath, we just use it and don't care about payload, we should have one or another, not both
   I think the static InputDataInformationEvent creator methods take care of not having both
   is it fine?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog commented on code in PR #341:
URL: https://github.com/apache/tez/pull/341#discussion_r1547357268


##########
tez-mapreduce/src/test/java/org/apache/tez/mapreduce/hadoop/TestMRInputHelpers.java:
##########
@@ -56,9 +62,12 @@ public class TestMRInputHelpers {
   private static Path oldSplitsDir;
   private static Path newSplitsDir;
 
-  private static String TEST_ROOT_DIR = "target"
+  private static final String TEST_ROOT_DIR = "target"
       + Path.SEPARATOR + TestMRHelpers.class.getName() + "-tmpDir";
 
+  private static final Path LOCAL_TEST_ROOT_DIR = new Path("target"
+      + Path.SEPARATOR + TestMRHelpers.class.getName() + "-localtmpDir");
+

Review Comment:
   ack, will do
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "tez-yetus (via GitHub)" <gi...@apache.org>.
tez-yetus commented on PR #341:
URL: https://github.com/apache/tez/pull/341#issuecomment-2028026525

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m  9s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +0 :ok: |  prototool  |   0m  0s |  prototool was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  The patch appears to include 4 new or modified test files.  |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   5m 47s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   7m 25s |  master passed  |
   | +1 :green_heart: |  compile  |   1m 16s |  master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  compile  |   1m 10s |  master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  checkstyle  |   1m 22s |  master passed  |
   | +1 :green_heart: |  javadoc  |   1m 12s |  master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  javadoc  |   1m  2s |  master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +0 :ok: |  spotbugs  |   0m 46s |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   2m 59s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m  7s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m  7s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 58s |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  cc  |   0m 58s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 58s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 53s |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  cc  |   0m 53s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 53s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   0m  7s |  The patch passed checkstyle in tez-api  |
   | +1 :green_heart: |  checkstyle  |   0m  6s |  The patch passed checkstyle in tez-runtime-internals  |
   | +1 :green_heart: |  checkstyle  |   0m  7s |  tez-mapreduce: The patch generated 0 new + 42 unchanged - 2 fixed = 42 total (was 44)  |
   | +1 :green_heart: |  checkstyle  |   0m 13s |  The patch passed checkstyle in tez-dag  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 43s |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  findbugs  |   2m 38s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 59s |  tez-api in the patch passed.  |
   | +1 :green_heart: |  unit  |   0m 30s |  tez-runtime-internals in the patch passed.  |
   | +1 :green_heart: |  unit  |   1m  4s |  tez-mapreduce in the patch passed.  |
   | +1 :green_heart: |  unit  |   4m 21s |  tez-dag in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 36s |  The patch does not generate ASF License warnings.  |
   |  |   |  39m 46s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/2/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/tez/pull/341 |
   | JIRA Issue | TEZ-4548 |
   | Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile cc prototool |
   | uname | Linux 8eba468c023a 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/tez.sh |
   | git revision | master / 34bb628e3 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
   |  Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/2/testReport/ |
   | Max. process+thread count | 322 (vs. ulimit of 5500) |
   | modules | C: tez-api tez-runtime-internals tez-mapreduce tez-dag U: . |
   | Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/2/console |
   | versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "tez-yetus (via GitHub)" <gi...@apache.org>.
tez-yetus commented on PR #341:
URL: https://github.com/apache/tez/pull/341#issuecomment-2028888531

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m  9s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +0 :ok: |  prototool  |   0m  0s |  prototool was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  The patch appears to include 4 new or modified test files.  |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   5m 58s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   7m 42s |  master passed  |
   | +1 :green_heart: |  compile  |   1m 17s |  master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  compile  |   1m 18s |  master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  checkstyle  |   1m 34s |  master passed  |
   | +1 :green_heart: |  javadoc  |   1m 19s |  master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 12s |  master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +0 :ok: |  spotbugs  |   0m 44s |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   2m 52s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m  7s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   0m 56s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 54s |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  cc  |   0m 54s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 54s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 47s |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  cc  |   0m 47s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 47s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   0m  7s |  The patch passed checkstyle in tez-api  |
   | +1 :green_heart: |  checkstyle  |   0m  6s |  The patch passed checkstyle in tez-runtime-internals  |
   | +1 :green_heart: |  checkstyle  |   0m  8s |  tez-mapreduce: The patch generated 0 new + 42 unchanged - 2 fixed = 42 total (was 44)  |
   | +1 :green_heart: |  checkstyle  |   0m 11s |  The patch passed checkstyle in tez-dag  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  javadoc  |   0m 35s |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  findbugs  |   2m 14s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 58s |  tez-api in the patch passed.  |
   | +1 :green_heart: |  unit  |   0m 30s |  tez-runtime-internals in the patch passed.  |
   | +1 :green_heart: |  unit  |   0m 59s |  tez-mapreduce in the patch passed.  |
   | +1 :green_heart: |  unit  |   4m 16s |  tez-dag in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 34s |  The patch does not generate ASF License warnings.  |
   |  |   |  39m 46s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/5/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/tez/pull/341 |
   | JIRA Issue | TEZ-4548 |
   | Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile cc prototool |
   | uname | Linux 2e13a876daba 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/tez.sh |
   | git revision | master / 34bb628e3 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
   |  Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/5/testReport/ |
   | Max. process+thread count | 588 (vs. ulimit of 5500) |
   | modules | C: tez-api tez-runtime-internals tez-mapreduce tez-dag U: . |
   | Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/5/console |
   | versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog commented on code in PR #341:
URL: https://github.com/apache/tez/pull/341#discussion_r1547308955


##########
tez-mapreduce/src/main/java/org/apache/tez/mapreduce/hadoop/MRInputHelpers.java:
##########
@@ -889,4 +895,29 @@ public static int getDagAttemptNumber(Configuration conf) {
     return getIntProperty(conf, MRInput.TEZ_MAPREDUCE_DAG_ATTEMPT_NUMBER);
   }
 
+  public static MRSplitProto getProto(InputDataInformationEvent initEvent, JobConf jobConf) throws IOException {
+    return Strings.isNullOrEmpty(initEvent.getSerializedPath()) ? readProtoFromPayload(initEvent)
+      : readProtoFromFs(initEvent, jobConf);
+  }
+
+  private static MRSplitProto readProtoFromFs(InputDataInformationEvent initEvent, JobConf jobConf) throws IOException {
+    String serializedPath = initEvent.getSerializedPath();
+    Path filePath = new Path(serializedPath);
+    LOG.info("Reading InputDataInformationEvent from path: {}", filePath);
+
+    MRSplitProto splitProto = null;
+    FileSystem fs = FileSystem.get(filePath.toUri(), jobConf);

Review Comment:
   ack, let me do



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "tez-yetus (via GitHub)" <gi...@apache.org>.
tez-yetus commented on PR #341:
URL: https://github.com/apache/tez/pull/341#issuecomment-2032236638

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |  26m 35s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +0 :ok: |  prototool  |   0m  0s |  prototool was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  The patch appears to include 5 new or modified test files.  |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   6m 13s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  12m 44s |  master passed  |
   | +1 :green_heart: |  compile  |   2m 21s |  master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  compile  |   2m  6s |  master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  checkstyle  |   2m  9s |  master passed  |
   | +1 :green_heart: |  javadoc  |   2m  2s |  master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 46s |  master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +0 :ok: |  spotbugs  |   1m 18s |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   4m 26s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 10s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 24s |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 33s |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  cc  |   1m 33s |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m 33s |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 23s |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  cc  |   1m 23s |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m 23s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   0m 12s |  The patch passed checkstyle in tez-api  |
   | +1 :green_heart: |  checkstyle  |   0m 10s |  The patch passed checkstyle in tez-runtime-internals  |
   | +1 :green_heart: |  checkstyle  |   0m 11s |  tez-mapreduce: The patch generated 0 new + 43 unchanged - 2 fixed = 43 total (was 45)  |
   | +1 :green_heart: |  checkstyle  |   0m 20s |  The patch passed checkstyle in tez-dag  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  javadoc  |   1m  0s |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 59s |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  findbugs  |   3m 45s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 17s |  tez-api in the patch passed.  |
   | +1 :green_heart: |  unit  |   0m 37s |  tez-runtime-internals in the patch passed.  |
   | +1 :green_heart: |  unit  |   1m 22s |  tez-mapreduce in the patch passed.  |
   | +1 :green_heart: |  unit  |   4m 56s |  tez-dag in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 40s |  The patch does not generate ASF License warnings.  |
   |  |   |  83m 13s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/11/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/tez/pull/341 |
   | JIRA Issue | TEZ-4548 |
   | Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile cc prototool |
   | uname | Linux 14c63bf90eaa 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/tez.sh |
   | git revision | master / 34bb628e3 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
   |  Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/11/testReport/ |
   | Max. process+thread count | 490 (vs. ulimit of 5500) |
   | modules | C: tez-api tez-runtime-internals tez-mapreduce tez-dag U: . |
   | Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/11/console |
   | versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog commented on PR #341:
URL: https://github.com/apache/tez/pull/341#issuecomment-2032030599

   > > I cannot address [HiddenField], I consider it as noise now
   > 
   > @abstractdog that HiddenField should be fixable, it is telling you have localFs already defined above & you are redefining the same variable name in the test method, rather than using the class variable
   > 
   > ```
   > diff --git a/tez-mapreduce/src/test/java/org/apache/tez/mapreduce/hadoop/TestMRInputHelpers.java b/tez-mapreduce/src/test/java/org/apache/tez/mapreduce/hadoop/TestMRInputHelpers.java
   > index c97158d14..a7501e8ae 100644
   > --- a/tez-mapreduce/src/test/java/org/apache/tez/mapreduce/hadoop/TestMRInputHelpers.java
   > +++ b/tez-mapreduce/src/test/java/org/apache/tez/mapreduce/hadoop/TestMRInputHelpers.java
   > @@ -217,7 +217,6 @@ public class TestMRInputHelpers {
   >    public void testInputEventSerializedPath() throws IOException {
   >      MRSplitProto proto = MRSplitProto.newBuilder().setSplitBytes(ByteString.copyFrom("splits".getBytes())).build();
   >  
   > -    FileSystem localFs = FileSystem.getLocal(conf);
   >      Path splitsDir = localFs.resolvePath(localTestRootDir);
   >  
   >      Path serializedPath = new Path(splitsDir + Path.SEPARATOR + "splitpayload");
   > @@ -282,7 +281,6 @@ public class TestMRInputHelpers {
   >  
   >    @Test(timeout = 5000)
   >    public void testInputSplitLocalResourceCreationWithDifferentFS() throws Exception {
   > -    FileSystem localFs = FileSystem.getLocal(conf);
   >      Path splitsDir = localFs.resolvePath(localTestRootDir);
   >  
   >      DataSourceDescriptor dataSource = generateDataSourceDescriptorMapRed(splitsDir);
   > ```
   > 
   > Just remove that line `FileSystem localFs = FileSystem.getLocal(conf);` line and it should work
   
   LOL, right, I haven't checked it in detail, I thought it was the another checkstyle warning which is usually a noise :) fixed


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "tez-yetus (via GitHub)" <gi...@apache.org>.
tez-yetus commented on PR #341:
URL: https://github.com/apache/tez/pull/341#issuecomment-2028212983

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m  9s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +0 :ok: |  prototool  |   0m  1s |  prototool was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  The patch appears to include 4 new or modified test files.  |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   5m 54s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   7m 31s |  master passed  |
   | +1 :green_heart: |  compile  |   1m 19s |  master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  compile  |   1m 18s |  master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  checkstyle  |   1m 29s |  master passed  |
   | +1 :green_heart: |  javadoc  |   1m 17s |  master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 11s |  master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +0 :ok: |  spotbugs  |   0m 46s |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   2m 52s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m  7s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   0m 57s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 53s |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  cc  |   0m 53s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 53s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 49s |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  cc  |   0m 49s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 49s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   0m  7s |  The patch passed checkstyle in tez-api  |
   | +1 :green_heart: |  checkstyle  |   0m  6s |  The patch passed checkstyle in tez-runtime-internals  |
   | +1 :green_heart: |  checkstyle  |   0m  8s |  tez-mapreduce: The patch generated 0 new + 42 unchanged - 2 fixed = 42 total (was 44)  |
   | +1 :green_heart: |  checkstyle  |   0m 12s |  The patch passed checkstyle in tez-dag  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  javadoc  |   0m 38s |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  findbugs  |   2m 20s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 55s |  tez-api in the patch passed.  |
   | +1 :green_heart: |  unit  |   0m 29s |  tez-runtime-internals in the patch passed.  |
   | +1 :green_heart: |  unit  |   0m 59s |  tez-mapreduce in the patch passed.  |
   | +1 :green_heart: |  unit  |   4m 14s |  tez-dag in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 36s |  The patch does not generate ASF License warnings.  |
   |  |   |  39m 32s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/4/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/tez/pull/341 |
   | JIRA Issue | TEZ-4548 |
   | Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile cc prototool |
   | uname | Linux 56f24c2d90b7 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/tez.sh |
   | git revision | master / 34bb628e3 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
   |  Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/4/testReport/ |
   | Max. process+thread count | 373 (vs. ulimit of 5500) |
   | modules | C: tez-api tez-runtime-internals tez-mapreduce tez-dag U: . |
   | Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/4/console |
   | versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog commented on code in PR #341:
URL: https://github.com/apache/tez/pull/341#discussion_r1546918816


##########
tez-api/src/main/java/org/apache/tez/runtime/api/events/InputDataInformationEvent.java:
##########
@@ -103,6 +113,7 @@ public Object getDeserializedUserPayload() {
   public String toString() {
     return "InputDataInformationEvent [sourceIndex=" + sourceIndex + ", targetIndex="
         + targetIndex + ", serializedUserPayloadExists=" + (userPayload != null)
-        + ", deserializedUserPayloadExists=" + (userPayloadObject != null) + "]";
-  } 
+        + ", deserializedUserPayloadExists=" + (userPayloadObject != null)
+        + ", serializedPath=" + serializedPath + "]";

Review Comment:
   I can add a null check like:
   ```
    + serializedPath != null ? (", serializedPath=" + serializedPath) : "" + "]";
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog commented on code in PR #341:
URL: https://github.com/apache/tez/pull/341#discussion_r1547308574


##########
tez-api/src/main/java/org/apache/tez/runtime/api/events/InputDataInformationEvent.java:
##########
@@ -103,6 +113,7 @@ public Object getDeserializedUserPayload() {
   public String toString() {
     return "InputDataInformationEvent [sourceIndex=" + sourceIndex + ", targetIndex="
         + targetIndex + ", serializedUserPayloadExists=" + (userPayload != null)
-        + ", deserializedUserPayloadExists=" + (userPayloadObject != null) + "]";
-  } 
+        + ", deserializedUserPayloadExists=" + (userPayloadObject != null)
+        + serializedPath != null ? (", serializedPath=" + serializedPath) : "" + "]";

Review Comment:
   agreed, it's time to refactor this



##########
tez-api/src/main/java/org/apache/tez/runtime/api/InputInitializerContext.java:
##########
@@ -87,7 +87,12 @@ public interface InputInitializerContext {
    * @return Resource
    */
   Resource getVertexTaskResource();
-  
+
+  /**
+   * Get the vertex id as integer that belongs to this input.
+   */
+  int getVertexId();

Review Comment:
   in hive's splitgenerator:
   https://github.com/apache/hive/pull/5174/files#diff-e4b863c252f3407ad05ccb56548e9f8384e0af33ce4bf6841a0ce11b173bf00dR197
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog merged PR #341:
URL: https://github.com/apache/tez/pull/341


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog commented on code in PR #341:
URL: https://github.com/apache/tez/pull/341#discussion_r1546922261


##########
tez-api/src/main/java/org/apache/tez/runtime/api/events/InputDataInformationEvent.java:
##########
@@ -79,6 +79,12 @@ public static InputDataInformationEvent createWithObjectPayload(int srcIndex,
     return new InputDataInformationEvent(srcIndex, userPayloadDeserialized, null);
   }
 
+  public static InputDataInformationEvent createWithSerializedPath(int srcIndex, String serializedPath) {
+    InputDataInformationEvent event = new InputDataInformationEvent(srcIndex, null);
+    event.serializedPath = serializedPath;

Review Comment:
   this method makes it impossible for users to call constructors in an invalid way (like: adding payload and serialized path at the same time, which should not happen), also this has a method name that implies the expected behavior
   what constructor are you suggesting that has the same advantages?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog commented on code in PR #341:
URL: https://github.com/apache/tez/pull/341#discussion_r1547322934


##########
tez-mapreduce/src/main/java/org/apache/tez/mapreduce/hadoop/MRInputHelpers.java:
##########
@@ -889,4 +895,29 @@ public static int getDagAttemptNumber(Configuration conf) {
     return getIntProperty(conf, MRInput.TEZ_MAPREDUCE_DAG_ATTEMPT_NUMBER);
   }
 
+  public static MRSplitProto getProto(InputDataInformationEvent initEvent, JobConf jobConf) throws IOException {
+    return Strings.isNullOrEmpty(initEvent.getSerializedPath()) ? readProtoFromPayload(initEvent)
+      : readProtoFromFs(initEvent, jobConf);
+  }
+
+  private static MRSplitProto readProtoFromFs(InputDataInformationEvent initEvent, JobConf jobConf) throws IOException {
+    String serializedPath = initEvent.getSerializedPath();
+    Path filePath = new Path(serializedPath);
+    LOG.info("Reading InputDataInformationEvent from path: {}", filePath);
+
+    MRSplitProto splitProto = null;
+    FileSystem fs = FileSystem.get(filePath.toUri(), jobConf);
+
+    try (FSDataInputStream in = fs.open(filePath)) {
+      splitProto = MRSplitProto.parseFrom(in);
+      fs.delete(filePath, false);

Review Comment:
   yeah, I think under normal circumstances everything should work: parse + delete
   any of them failing is fine to lead to a task attempt failure (and eventually rerun)
   regarding leftover files: it's okay, current hive patch uses the tez scratch dir, which is supposed to be cleaned up by hive as well



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "aturoczy (via GitHub)" <gi...@apache.org>.
aturoczy commented on code in PR #341:
URL: https://github.com/apache/tez/pull/341#discussion_r1547289396


##########
tez-api/src/main/java/org/apache/tez/runtime/api/events/InputDataInformationEvent.java:
##########
@@ -79,6 +79,12 @@ public static InputDataInformationEvent createWithObjectPayload(int srcIndex,
     return new InputDataInformationEvent(srcIndex, userPayloadDeserialized, null);
   }
 
+  public static InputDataInformationEvent createWithSerializedPath(int srcIndex, String serializedPath) {
+    InputDataInformationEvent event = new InputDataInformationEvent(srcIndex, null);
+    event.serializedPath = serializedPath;

Review Comment:
   ack



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "ayushtkn (via GitHub)" <gi...@apache.org>.
ayushtkn commented on code in PR #341:
URL: https://github.com/apache/tez/pull/341#discussion_r1547274765


##########
tez-mapreduce/src/test/java/org/apache/tez/mapreduce/hadoop/TestMRInputHelpers.java:
##########
@@ -56,9 +62,12 @@ public class TestMRInputHelpers {
   private static Path oldSplitsDir;
   private static Path newSplitsDir;
 
-  private static String TEST_ROOT_DIR = "target"
+  private static final String TEST_ROOT_DIR = "target"
       + Path.SEPARATOR + TestMRHelpers.class.getName() + "-tmpDir";
 
+  private static final Path LOCAL_TEST_ROOT_DIR = new Path("target"
+      + Path.SEPARATOR + TestMRHelpers.class.getName() + "-localtmpDir");
+

Review Comment:
   this is creating a path & then resolving it to LocalFs, maybe just doing a ``Files.createTempDirectory`` in the test might have saved you from this multiple..



##########
tez-mapreduce/src/main/java/org/apache/tez/mapreduce/hadoop/MRInputHelpers.java:
##########
@@ -889,4 +895,29 @@ public static int getDagAttemptNumber(Configuration conf) {
     return getIntProperty(conf, MRInput.TEZ_MAPREDUCE_DAG_ATTEMPT_NUMBER);
   }
 
+  public static MRSplitProto getProto(InputDataInformationEvent initEvent, JobConf jobConf) throws IOException {
+    return Strings.isNullOrEmpty(initEvent.getSerializedPath()) ? readProtoFromPayload(initEvent)
+      : readProtoFromFs(initEvent, jobConf);
+  }
+
+  private static MRSplitProto readProtoFromFs(InputDataInformationEvent initEvent, JobConf jobConf) throws IOException {
+    String serializedPath = initEvent.getSerializedPath();
+    Path filePath = new Path(serializedPath);
+    LOG.info("Reading InputDataInformationEvent from path: {}", filePath);
+
+    MRSplitProto splitProto = null;
+    FileSystem fs = FileSystem.get(filePath.toUri(), jobConf);
+
+    try (FSDataInputStream in = fs.open(filePath)) {
+      splitProto = MRSplitProto.parseFrom(in);
+      fs.delete(filePath, false);
+    }
+    return splitProto;
+  }
+
+  private static MRSplitProto readProtoFromPayload(InputDataInformationEvent initEvent) throws IOException {
+    ByteBuffer payload = initEvent.getUserPayload();
+    LOG.info("Reading InputDataInformationEvent from payload, size: {} bytes}", payload.limit());

Review Comment:
   ``initEvent.getUserPayload()`` can return `null` as well, there is a logic inside that, if that returns `null`, this log line will shoot a NPE., maybe just put `payload` in the log & if it is not null, the `ByteBuffer` `toString()` shall print the limit along with other details



##########
tez-api/src/main/java/org/apache/tez/runtime/api/InputInitializerContext.java:
##########
@@ -87,7 +87,12 @@ public interface InputInitializerContext {
    * @return Resource
    */
   Resource getVertexTaskResource();
-  
+
+  /**
+   * Get the vertex id as integer that belongs to this input.
+   */
+  int getVertexId();

Review Comment:
   Where is this being used?



##########
tez-runtime-internals/src/main/java/org/apache/tez/common/ProtoConverters.java:
##########
@@ -135,15 +138,22 @@ public static VertexManagerEvent convertVertexManagerEventFromProto(
     if (event.getUserPayload() != null) {
       builder.setUserPayload(ByteString.copyFrom(event.getUserPayload()));
     }
+    if (event.getSerializedPath() != null) {
+      builder.setSerializedPath(ByteString.copyFrom(event.getSerializedPath().getBytes(Charsets.UTF_8)));
+    }
     return builder.build();
   }
 
-  public static InputDataInformationEvent
-      convertRootInputDataInformationEventFromProto(
+  public static InputDataInformationEvent convertRootInputDataInformationEventFromProto(
       EventProtos.RootInputDataInformationEventProto proto) {
-    InputDataInformationEvent diEvent = InputDataInformationEvent.createWithSerializedPayload(
-        proto.getSourceIndex(),
-        proto.hasUserPayload() ? proto.getUserPayload().asReadOnlyByteBuffer() : null);
+    ByteBuffer payload = proto.hasUserPayload() ? proto.getUserPayload().asReadOnlyByteBuffer() : null;
+    InputDataInformationEvent diEvent = null;
+    if (!proto.getSerializedPath().isEmpty()) {

Review Comment:
   Should there be a ``proto.hasSerializedPath()`` check?



##########
tez-runtime-internals/src/main/java/org/apache/tez/common/ProtoConverters.java:
##########
@@ -135,15 +138,22 @@ public static VertexManagerEvent convertVertexManagerEventFromProto(
     if (event.getUserPayload() != null) {
       builder.setUserPayload(ByteString.copyFrom(event.getUserPayload()));
     }
+    if (event.getSerializedPath() != null) {
+      builder.setSerializedPath(ByteString.copyFrom(event.getSerializedPath().getBytes(Charsets.UTF_8)));
+    }
     return builder.build();
   }
 
-  public static InputDataInformationEvent
-      convertRootInputDataInformationEventFromProto(
+  public static InputDataInformationEvent convertRootInputDataInformationEventFromProto(
       EventProtos.RootInputDataInformationEventProto proto) {
-    InputDataInformationEvent diEvent = InputDataInformationEvent.createWithSerializedPayload(
-        proto.getSourceIndex(),
-        proto.hasUserPayload() ? proto.getUserPayload().asReadOnlyByteBuffer() : null);
+    ByteBuffer payload = proto.hasUserPayload() ? proto.getUserPayload().asReadOnlyByteBuffer() : null;
+    InputDataInformationEvent diEvent = null;
+    if (!proto.getSerializedPath().isEmpty()) {

Review Comment:
   At this point, there maybe a possibility that payload wasn't `null` as well, but we ignored that because there was a `serializedPath`?
   Should we throw out in that case?



##########
tez-mapreduce/src/main/java/org/apache/tez/mapreduce/hadoop/MRInputHelpers.java:
##########
@@ -889,4 +895,29 @@ public static int getDagAttemptNumber(Configuration conf) {
     return getIntProperty(conf, MRInput.TEZ_MAPREDUCE_DAG_ATTEMPT_NUMBER);
   }
 
+  public static MRSplitProto getProto(InputDataInformationEvent initEvent, JobConf jobConf) throws IOException {
+    return Strings.isNullOrEmpty(initEvent.getSerializedPath()) ? readProtoFromPayload(initEvent)
+      : readProtoFromFs(initEvent, jobConf);
+  }
+
+  private static MRSplitProto readProtoFromFs(InputDataInformationEvent initEvent, JobConf jobConf) throws IOException {
+    String serializedPath = initEvent.getSerializedPath();
+    Path filePath = new Path(serializedPath);
+    LOG.info("Reading InputDataInformationEvent from path: {}", filePath);
+
+    MRSplitProto splitProto = null;
+    FileSystem fs = FileSystem.get(filePath.toUri(), jobConf);

Review Comment:
   Change to
   ```
        FileSystem fs = filePath.getFileSystem(jobConf);
   ```



##########
tez-mapreduce/src/main/java/org/apache/tez/mapreduce/hadoop/MRInputHelpers.java:
##########
@@ -889,4 +895,29 @@ public static int getDagAttemptNumber(Configuration conf) {
     return getIntProperty(conf, MRInput.TEZ_MAPREDUCE_DAG_ATTEMPT_NUMBER);
   }
 
+  public static MRSplitProto getProto(InputDataInformationEvent initEvent, JobConf jobConf) throws IOException {
+    return Strings.isNullOrEmpty(initEvent.getSerializedPath()) ? readProtoFromPayload(initEvent)
+      : readProtoFromFs(initEvent, jobConf);
+  }
+
+  private static MRSplitProto readProtoFromFs(InputDataInformationEvent initEvent, JobConf jobConf) throws IOException {
+    String serializedPath = initEvent.getSerializedPath();
+    Path filePath = new Path(serializedPath);
+    LOG.info("Reading InputDataInformationEvent from path: {}", filePath);
+
+    MRSplitProto splitProto = null;
+    FileSystem fs = FileSystem.get(filePath.toUri(), jobConf);
+
+    try (FSDataInputStream in = fs.open(filePath)) {
+      splitProto = MRSplitProto.parseFrom(in);
+      fs.delete(filePath, false);

Review Comment:
   If the parsing fails, in that case the file won't be deleted. is that expected? and if your delete fails our operation will fail as well, so it that ok or we can live with that?



##########
tez-mapreduce/src/test/java/org/apache/tez/mapreduce/hadoop/TestMRInputHelpers.java:
##########
@@ -232,30 +278,32 @@ private DataSourceDescriptor generateDataSourceDescriptorMapRed(Path inputSplits
   @Test(timeout = 5000)
   public void testInputSplitLocalResourceCreationWithDifferentFS() throws Exception {
     FileSystem localFs = FileSystem.getLocal(conf);
-    Path LOCAL_TEST_ROOT_DIR = new Path("target"
-        + Path.SEPARATOR + TestMRHelpers.class.getName() + "-localtmpDir");
+    Path splitsDir = localFs.resolvePath(LOCAL_TEST_ROOT_DIR);
 
-    try {
-      localFs.mkdirs(LOCAL_TEST_ROOT_DIR);
-
-      Path splitsDir = localFs.resolvePath(LOCAL_TEST_ROOT_DIR);
+    DataSourceDescriptor dataSource = generateDataSourceDescriptorMapRed(splitsDir);
 
-      DataSourceDescriptor dataSource = generateDataSourceDescriptorMapRed(splitsDir);
-
-      Map<String, LocalResource> localResources = dataSource.getAdditionalLocalFiles();
+    Map<String, LocalResource> localResources = dataSource.getAdditionalLocalFiles();
 
-      Assert.assertEquals(2, localResources.size());
-      Assert.assertTrue(localResources.containsKey(
-          MRInputHelpers.JOB_SPLIT_RESOURCE_NAME));
-      Assert.assertTrue(localResources.containsKey(
-          MRInputHelpers.JOB_SPLIT_METAINFO_RESOURCE_NAME));
+    Assert.assertEquals(2, localResources.size());
+    Assert.assertTrue(localResources.containsKey(
+        MRInputHelpers.JOB_SPLIT_RESOURCE_NAME));
+    Assert.assertTrue(localResources.containsKey(
+        MRInputHelpers.JOB_SPLIT_METAINFO_RESOURCE_NAME));
 
-      for (LocalResource lr : localResources.values()) {
-        Assert.assertFalse(lr.getResource().getScheme().contains(remoteFs.getScheme()));
-      }
-    } finally {
-      localFs.delete(LOCAL_TEST_ROOT_DIR, true);
+    for (LocalResource lr : localResources.values()) {
+      Assert.assertFalse(lr.getResource().getScheme().contains(remoteFs.getScheme()));
     }
   }
 
+  @Before
+  public void before() throws IOException {
+    FileSystem localFs = FileSystem.getLocal(conf);

Review Comment:
   We are getting the `localFs` multiple times, can we store it as a class variable just like the `remoteFs`



##########
tez-api/src/main/java/org/apache/tez/runtime/api/events/InputDataInformationEvent.java:
##########
@@ -103,6 +113,7 @@ public Object getDeserializedUserPayload() {
   public String toString() {
     return "InputDataInformationEvent [sourceIndex=" + sourceIndex + ", targetIndex="
         + targetIndex + ", serializedUserPayloadExists=" + (userPayload != null)
-        + ", deserializedUserPayloadExists=" + (userPayloadObject != null) + "]";
-  } 
+        + ", deserializedUserPayloadExists=" + (userPayloadObject != null)
+        + serializedPath != null ? (", serializedPath=" + serializedPath) : "" + "]";

Review Comment:
   This is wrong.
   ```
   "InputDataInformationEvent [sourceIndex=" + sourceIndex + ", targetIndex="
           + targetIndex + ", serializedUserPayloadExists=" + (userPayload != null)
           + ", deserializedUserPayloadExists=" + (userPayloadObject != null)
           + serializedPath != null ? <Something>
   ```
   
   This will evaluate the != operator and check if everything  before != is not null, which is always true, so it will always print just
   ``
   (", serializedPath=" + serializedPath)
   ``
   
   You need a bracket here:
   ```
       return "InputDataInformationEvent [sourceIndex=" + sourceIndex + ", targetIndex="
           + targetIndex + ", serializedUserPayloadExists=" + (userPayload != null)
           + ", deserializedUserPayloadExists=" + (userPayloadObject != null)
           + (serializedPath != null ? (", serializedPath=" + serializedPath) : "" )+ "]";
   ```
   
   It is too much concat with conditions, maybe we can explore StringBuilder, that might look more redable, something like this
   ```
     public String toString() {
       StringBuilder sb = new StringBuilder();
       sb.append("InputDataInformationEvent [sourceIndex=").append(sourceIndex)
           .append(", targetIndex=").append(targetIndex)
           .append(", serializedUserPayloadExists=").append(userPayload != null)
           .append(", deserializedUserPayloadExists=").append(userPayloadObject != null);
       if (serializedPath != null) {
         sb.append(", serializedPath=").append(serializedPath);
       }
       sb.append("]");
       return sb.toString();
     }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog commented on code in PR #341:
URL: https://github.com/apache/tez/pull/341#discussion_r1547346163


##########
tez-mapreduce/src/test/java/org/apache/tez/mapreduce/hadoop/TestMRInputHelpers.java:
##########
@@ -232,30 +278,32 @@ private DataSourceDescriptor generateDataSourceDescriptorMapRed(Path inputSplits
   @Test(timeout = 5000)
   public void testInputSplitLocalResourceCreationWithDifferentFS() throws Exception {
     FileSystem localFs = FileSystem.getLocal(conf);
-    Path LOCAL_TEST_ROOT_DIR = new Path("target"
-        + Path.SEPARATOR + TestMRHelpers.class.getName() + "-localtmpDir");
+    Path splitsDir = localFs.resolvePath(LOCAL_TEST_ROOT_DIR);
 
-    try {
-      localFs.mkdirs(LOCAL_TEST_ROOT_DIR);
-
-      Path splitsDir = localFs.resolvePath(LOCAL_TEST_ROOT_DIR);
+    DataSourceDescriptor dataSource = generateDataSourceDescriptorMapRed(splitsDir);
 
-      DataSourceDescriptor dataSource = generateDataSourceDescriptorMapRed(splitsDir);
-
-      Map<String, LocalResource> localResources = dataSource.getAdditionalLocalFiles();
+    Map<String, LocalResource> localResources = dataSource.getAdditionalLocalFiles();
 
-      Assert.assertEquals(2, localResources.size());
-      Assert.assertTrue(localResources.containsKey(
-          MRInputHelpers.JOB_SPLIT_RESOURCE_NAME));
-      Assert.assertTrue(localResources.containsKey(
-          MRInputHelpers.JOB_SPLIT_METAINFO_RESOURCE_NAME));
+    Assert.assertEquals(2, localResources.size());
+    Assert.assertTrue(localResources.containsKey(
+        MRInputHelpers.JOB_SPLIT_RESOURCE_NAME));
+    Assert.assertTrue(localResources.containsKey(
+        MRInputHelpers.JOB_SPLIT_METAINFO_RESOURCE_NAME));
 
-      for (LocalResource lr : localResources.values()) {
-        Assert.assertFalse(lr.getResource().getScheme().contains(remoteFs.getScheme()));
-      }
-    } finally {
-      localFs.delete(LOCAL_TEST_ROOT_DIR, true);
+    for (LocalResource lr : localResources.values()) {
+      Assert.assertFalse(lr.getResource().getScheme().contains(remoteFs.getScheme()));
     }
   }
 
+  @Before
+  public void before() throws IOException {
+    FileSystem localFs = FileSystem.getLocal(conf);

Review Comment:
   ack



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "tez-yetus (via GitHub)" <gi...@apache.org>.
tez-yetus commented on PR #341:
URL: https://github.com/apache/tez/pull/341#issuecomment-2031463310

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m  9s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +0 :ok: |  prototool  |   0m  0s |  prototool was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  The patch appears to include 5 new or modified test files.  |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   5m 55s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   8m  2s |  master passed  |
   | +1 :green_heart: |  compile  |   1m 13s |  master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  compile  |   1m 24s |  master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  checkstyle  |   1m 25s |  master passed  |
   | +1 :green_heart: |  javadoc  |   1m 19s |  master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 10s |  master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +0 :ok: |  spotbugs  |   0m 42s |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   2m 59s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m  7s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   0m 53s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 52s |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  cc  |   0m 52s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 52s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 53s |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  cc  |   0m 53s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 53s |  the patch passed  |
   | -0 :warning: |  checkstyle  |   0m  8s |  tez-mapreduce: The patch generated 4 new + 43 unchanged - 2 fixed = 47 total (was 45)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 40s |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  findbugs  |   2m 18s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 57s |  tez-api in the patch passed.  |
   | +1 :green_heart: |  unit  |   0m 30s |  tez-runtime-internals in the patch passed.  |
   | +1 :green_heart: |  unit  |   0m 59s |  tez-mapreduce in the patch passed.  |
   | +1 :green_heart: |  unit  |   4m 46s |  tez-dag in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 33s |  The patch does not generate ASF License warnings.  |
   |  |   |  40m 36s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/8/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/tez/pull/341 |
   | JIRA Issue | TEZ-4548 |
   | Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile cc prototool |
   | uname | Linux c4a861b17fef 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/tez.sh |
   | git revision | master / 34bb628e3 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
   | checkstyle | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/8/artifact/out/diff-checkstyle-tez-mapreduce.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/8/testReport/ |
   | Max. process+thread count | 573 (vs. ulimit of 5500) |
   | modules | C: tez-api tez-runtime-internals tez-mapreduce tez-dag U: . |
   | Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/8/console |
   | versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "ayushtkn (via GitHub)" <gi...@apache.org>.
ayushtkn commented on PR #341:
URL: https://github.com/apache/tez/pull/341#issuecomment-2031990575

   > I cannot address [HiddenField], I consider it as noise now
   
   @abstractdog that HiddenField should be fixable, it is telling you have localFs already defined above & you are redefining the same variable name in the test method, rather than using the class variable
   ```
   diff --git a/tez-mapreduce/src/test/java/org/apache/tez/mapreduce/hadoop/TestMRInputHelpers.java b/tez-mapreduce/src/test/java/org/apache/tez/mapreduce/hadoop/TestMRInputHelpers.java
   index c97158d14..a7501e8ae 100644
   --- a/tez-mapreduce/src/test/java/org/apache/tez/mapreduce/hadoop/TestMRInputHelpers.java
   +++ b/tez-mapreduce/src/test/java/org/apache/tez/mapreduce/hadoop/TestMRInputHelpers.java
   @@ -217,7 +217,6 @@ public class TestMRInputHelpers {
      public void testInputEventSerializedPath() throws IOException {
        MRSplitProto proto = MRSplitProto.newBuilder().setSplitBytes(ByteString.copyFrom("splits".getBytes())).build();
    
   -    FileSystem localFs = FileSystem.getLocal(conf);
        Path splitsDir = localFs.resolvePath(localTestRootDir);
    
        Path serializedPath = new Path(splitsDir + Path.SEPARATOR + "splitpayload");
   @@ -282,7 +281,6 @@ public class TestMRInputHelpers {
    
      @Test(timeout = 5000)
      public void testInputSplitLocalResourceCreationWithDifferentFS() throws Exception {
   -    FileSystem localFs = FileSystem.getLocal(conf);
        Path splitsDir = localFs.resolvePath(localTestRootDir);
    
        DataSourceDescriptor dataSource = generateDataSourceDescriptorMapRed(splitsDir);
   
   ```
   
   Just remove that line ``FileSystem localFs = FileSystem.getLocal(conf);`` line and it should work


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog commented on code in PR #341:
URL: https://github.com/apache/tez/pull/341#discussion_r1547310433


##########
tez-mapreduce/src/main/java/org/apache/tez/mapreduce/hadoop/MRInputHelpers.java:
##########
@@ -889,4 +895,29 @@ public static int getDagAttemptNumber(Configuration conf) {
     return getIntProperty(conf, MRInput.TEZ_MAPREDUCE_DAG_ATTEMPT_NUMBER);
   }
 
+  public static MRSplitProto getProto(InputDataInformationEvent initEvent, JobConf jobConf) throws IOException {
+    return Strings.isNullOrEmpty(initEvent.getSerializedPath()) ? readProtoFromPayload(initEvent)
+      : readProtoFromFs(initEvent, jobConf);
+  }
+
+  private static MRSplitProto readProtoFromFs(InputDataInformationEvent initEvent, JobConf jobConf) throws IOException {
+    String serializedPath = initEvent.getSerializedPath();
+    Path filePath = new Path(serializedPath);
+    LOG.info("Reading InputDataInformationEvent from path: {}", filePath);
+
+    MRSplitProto splitProto = null;
+    FileSystem fs = FileSystem.get(filePath.toUri(), jobConf);
+
+    try (FSDataInputStream in = fs.open(filePath)) {
+      splitProto = MRSplitProto.parseFrom(in);
+      fs.delete(filePath, false);
+    }
+    return splitProto;
+  }
+
+  private static MRSplitProto readProtoFromPayload(InputDataInformationEvent initEvent) throws IOException {
+    ByteBuffer payload = initEvent.getUserPayload();
+    LOG.info("Reading InputDataInformationEvent from payload, size: {} bytes}", payload.limit());

Review Comment:
   ack, will do this below:
   ```
   LOG.info("Reading InputDataInformationEvent from payload: {}", payload);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog commented on PR #341:
URL: https://github.com/apache/tez/pull/341#issuecomment-2028036146

   @ayushtkn : can you please take a look at this one?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog commented on code in PR #341:
URL: https://github.com/apache/tez/pull/341#discussion_r1546922261


##########
tez-api/src/main/java/org/apache/tez/runtime/api/events/InputDataInformationEvent.java:
##########
@@ -79,6 +79,12 @@ public static InputDataInformationEvent createWithObjectPayload(int srcIndex,
     return new InputDataInformationEvent(srcIndex, userPayloadDeserialized, null);
   }
 
+  public static InputDataInformationEvent createWithSerializedPath(int srcIndex, String serializedPath) {
+    InputDataInformationEvent event = new InputDataInformationEvent(srcIndex, null);
+    event.serializedPath = serializedPath;

Review Comment:
   this method makes it impossible for users to call constructors in an invalid way (like: adding payload and serialized path at the same time), also with having a method name that implies this behavior, what constructor are you suggesting that has the same advantages?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "aturoczy (via GitHub)" <gi...@apache.org>.
aturoczy commented on code in PR #341:
URL: https://github.com/apache/tez/pull/341#discussion_r1546922508


##########
tez-api/src/main/java/org/apache/tez/runtime/api/events/InputDataInformationEvent.java:
##########
@@ -103,6 +113,7 @@ public Object getDeserializedUserPayload() {
   public String toString() {
     return "InputDataInformationEvent [sourceIndex=" + sourceIndex + ", targetIndex="
         + targetIndex + ", serializedUserPayloadExists=" + (userPayload != null)
-        + ", deserializedUserPayloadExists=" + (userPayloadObject != null) + "]";
-  } 
+        + ", deserializedUserPayloadExists=" + (userPayloadObject != null)
+        + ", serializedPath=" + serializedPath + "]";

Review Comment:
   I like it :) I know it add a bit complexity into a simple toString() but it is more elegant imho



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog commented on code in PR #341:
URL: https://github.com/apache/tez/pull/341#discussion_r1547310433


##########
tez-mapreduce/src/main/java/org/apache/tez/mapreduce/hadoop/MRInputHelpers.java:
##########
@@ -889,4 +895,29 @@ public static int getDagAttemptNumber(Configuration conf) {
     return getIntProperty(conf, MRInput.TEZ_MAPREDUCE_DAG_ATTEMPT_NUMBER);
   }
 
+  public static MRSplitProto getProto(InputDataInformationEvent initEvent, JobConf jobConf) throws IOException {
+    return Strings.isNullOrEmpty(initEvent.getSerializedPath()) ? readProtoFromPayload(initEvent)
+      : readProtoFromFs(initEvent, jobConf);
+  }
+
+  private static MRSplitProto readProtoFromFs(InputDataInformationEvent initEvent, JobConf jobConf) throws IOException {
+    String serializedPath = initEvent.getSerializedPath();
+    Path filePath = new Path(serializedPath);
+    LOG.info("Reading InputDataInformationEvent from path: {}", filePath);
+
+    MRSplitProto splitProto = null;
+    FileSystem fs = FileSystem.get(filePath.toUri(), jobConf);
+
+    try (FSDataInputStream in = fs.open(filePath)) {
+      splitProto = MRSplitProto.parseFrom(in);
+      fs.delete(filePath, false);
+    }
+    return splitProto;
+  }
+
+  private static MRSplitProto readProtoFromPayload(InputDataInformationEvent initEvent) throws IOException {
+    ByteBuffer payload = initEvent.getUserPayload();
+    LOG.info("Reading InputDataInformationEvent from payload, size: {} bytes}", payload.limit());

Review Comment:
   ack, will do this below:
   ```
   LOG.info("Reading InputDataInformationEvent from payload: {}}", payload);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "aturoczy (via GitHub)" <gi...@apache.org>.
aturoczy commented on code in PR #341:
URL: https://github.com/apache/tez/pull/341#discussion_r1547311523


##########
tez-mapreduce/src/main/java/org/apache/tez/mapreduce/hadoop/MRInputHelpers.java:
##########
@@ -889,4 +895,29 @@ public static int getDagAttemptNumber(Configuration conf) {
     return getIntProperty(conf, MRInput.TEZ_MAPREDUCE_DAG_ATTEMPT_NUMBER);
   }
 
+  public static MRSplitProto getProto(InputDataInformationEvent initEvent, JobConf jobConf) throws IOException {
+    return Strings.isNullOrEmpty(initEvent.getSerializedPath()) ? readProtoFromPayload(initEvent)
+      : readProtoFromFs(initEvent, jobConf);
+  }
+
+  private static MRSplitProto readProtoFromFs(InputDataInformationEvent initEvent, JobConf jobConf) throws IOException {
+    String serializedPath = initEvent.getSerializedPath();
+    Path filePath = new Path(serializedPath);
+    LOG.info("Reading InputDataInformationEvent from path: {}", filePath);
+
+    MRSplitProto splitProto = null;
+    FileSystem fs = FileSystem.get(filePath.toUri(), jobConf);
+
+    try (FSDataInputStream in = fs.open(filePath)) {
+      splitProto = MRSplitProto.parseFrom(in);
+      fs.delete(filePath, false);

Review Comment:
   In my understanding yes, this is expected. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "abstractdog (via GitHub)" <gi...@apache.org>.
abstractdog commented on PR #341:
URL: https://github.com/apache/tez/pull/341#issuecomment-2031381165

   thanks for the comments @ayushtkn, addressed them in https://github.com/apache/tez/pull/341/commits/564949466ddbd95b00f4314e373ebcf4277ae9e2


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] TEZ-4548: InputDataInformationEvent to be read from serialized payload from filesystem [tez]

Posted by "tez-yetus (via GitHub)" <gi...@apache.org>.
tez-yetus commented on PR #341:
URL: https://github.com/apache/tez/pull/341#issuecomment-2032069693

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m  9s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  No case conflicting files found.  |
   | +0 :ok: |  prototool  |   0m  0s |  prototool was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  The patch appears to include 5 new or modified test files.  |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   5m 17s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   7m 22s |  master passed  |
   | +1 :green_heart: |  compile  |   1m 17s |  master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  compile  |   1m  9s |  master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  checkstyle  |   1m  5s |  master passed  |
   | +1 :green_heart: |  javadoc  |   1m 11s |  master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +0 :ok: |  spotbugs  |   0m 44s |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   2m 42s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m  7s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m  4s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 53s |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  cc  |   0m 53s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 53s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 52s |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  cc  |   0m 52s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 52s |  the patch passed  |
   | -0 :warning: |  checkstyle  |   0m  9s |  tez-mapreduce: The patch generated 2 new + 43 unchanged - 2 fixed = 45 total (was 45)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  javadoc  |   0m 40s |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 46s |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06  |
   | +1 :green_heart: |  findbugs  |   2m 25s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 56s |  tez-api in the patch passed.  |
   | +1 :green_heart: |  unit  |   0m 32s |  tez-runtime-internals in the patch passed.  |
   | +1 :green_heart: |  unit  |   1m  2s |  tez-mapreduce in the patch passed.  |
   | +1 :green_heart: |  unit  |   4m 15s |  tez-dag in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 30s |  The patch does not generate ASF License warnings.  |
   |  |   |  38m  5s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/10/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/tez/pull/341 |
   | JIRA Issue | TEZ-4548 |
   | Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile cc prototool |
   | uname | Linux 5c39ec52f44b 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/tez.sh |
   | git revision | master / 34bb628e3 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
   | checkstyle | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/10/artifact/out/diff-checkstyle-tez-mapreduce.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/10/testReport/ |
   | Max. process+thread count | 360 (vs. ulimit of 5500) |
   | modules | C: tez-api tez-runtime-internals tez-mapreduce tez-dag U: . |
   | Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-341/10/console |
   | versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org