You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@celeborn.apache.org by "AngersZhuuuu (via GitHub)" <gi...@apache.org> on 2023/04/03 03:32:12 UTC

[GitHub] [incubator-celeborn] AngersZhuuuu opened a new pull request, #1404: [CELEBORN-479][FOLLOWUP] Add push task should check if loc is null

AngersZhuuuu opened a new pull request, #1404:
URL: https://github.com/apache/incubator-celeborn/pull/1404

   ### What changes were proposed in this pull request?
   ```
   23/04/03 11:14:24 INFO CodeGenerator: Code generated in 1017.663846 ms
   23/04/03 11:14:24 INFO SortBasedShuffleWriter: Pushdata in close, memory used 100663296
   23/04/03 11:14:24 INFO Executor: Executor is trying to kill task 24.0 in stage 2.0 (TID 248), reason: another attempt succeeded
   23/04/03 11:14:24 INFO DataPusher: Thread interrupted while joining pushThread
   23/04/03 11:14:25 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[DataPusher-248,5,main]
   java.lang.NullPointerException
       at com.aliyun.emr.rss.client.write.DataPushQueue.takePushTasks(DataPushQueue.java:100)
       at com.aliyun.emr.rss.client.write.DataPusher$1.run(DataPusher.java:125) 
   ```
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] waitinfuture commented on a diff in pull request #1404: [CELEBORN-479][FOLLOWUP] Add push task should check if loc is null

Posted by "waitinfuture (via GitHub)" <gi...@apache.org>.
waitinfuture commented on code in PR #1404:
URL: https://github.com/apache/incubator-celeborn/pull/1404#discussion_r1155664066


##########
client/src/main/java/org/apache/celeborn/client/write/DataPushQueue.java:
##########
@@ -99,15 +99,19 @@ public ArrayList<PushTask> takePushTasks() throws IOException {
             client.getPartitionLocation(appId, shuffleId, numMappers, numPartitions);
         if (partitionLocationMap != null) {
           PartitionLocation loc = partitionLocationMap.get(partitionId);
-          Integer oldCapacity = workerCapacity.get(loc.hostAndPushPort());
-          if (oldCapacity == null) {
-            oldCapacity = maxInFlight - pushState.inflightPushes(loc.hostAndPushPort());
-            workerCapacity.put(loc.hostAndPushPort(), oldCapacity);
-          }
-          if (oldCapacity > 0) {
-            iterator.remove();
+          if (loc != null) {

Review Comment:
   IMO it shouldn't happen that partitionLocationMap is not null but loc is null. RegisterShuffle is guaranteed to be success or fail, no partial success. Maybe we should check why it happens in your environment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] AngersZhuuuu commented on a diff in pull request #1404: [CELEBORN-479][FOLLOWUP] Add push task should check if loc is null

Posted by "AngersZhuuuu (via GitHub)" <gi...@apache.org>.
AngersZhuuuu commented on code in PR #1404:
URL: https://github.com/apache/incubator-celeborn/pull/1404#discussion_r1155728204


##########
client/src/main/java/org/apache/celeborn/client/write/DataPushQueue.java:
##########
@@ -99,15 +99,19 @@ public ArrayList<PushTask> takePushTasks() throws IOException {
             client.getPartitionLocation(appId, shuffleId, numMappers, numPartitions);
         if (partitionLocationMap != null) {
           PartitionLocation loc = partitionLocationMap.get(partitionId);
-          Integer oldCapacity = workerCapacity.get(loc.hostAndPushPort());
-          if (oldCapacity == null) {
-            oldCapacity = maxInFlight - pushState.inflightPushes(loc.hostAndPushPort());
-            workerCapacity.put(loc.hostAndPushPort(), oldCapacity);
-          }
-          if (oldCapacity > 0) {
-            iterator.remove();
+          if (loc != null) {

Review Comment:
   I have checked, the logic seems the same.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] github-actions[bot] commented on pull request #1404: [CELEBORN-479][FOLLOWUP] Add push task should check if loc is null

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #1404:
URL: https://github.com/apache/incubator-celeborn/pull/1404#issuecomment-1519624098

   This PR is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 10 days.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] AngersZhuuuu commented on a diff in pull request #1404: [CELEBORN-479][FOLLOWUP] Add push task should check if loc is null

Posted by "AngersZhuuuu (via GitHub)" <gi...@apache.org>.
AngersZhuuuu commented on code in PR #1404:
URL: https://github.com/apache/incubator-celeborn/pull/1404#discussion_r1178921966


##########
client/src/main/java/org/apache/celeborn/client/write/DataPushQueue.java:
##########
@@ -99,15 +99,21 @@ public ArrayList<PushTask> takePushTasks() throws IOException {
             client.getPartitionLocation(appId, shuffleId, numMappers, numPartitions);
         if (partitionLocationMap != null) {
           PartitionLocation loc = partitionLocationMap.get(partitionId);
-          Integer oldCapacity = workerCapacity.get(loc.hostAndPushPort());
-          if (oldCapacity == null) {
-            oldCapacity = maxInFlight - pushState.inflightPushes(loc.hostAndPushPort());
-            workerCapacity.put(loc.hostAndPushPort(), oldCapacity);
-          }
-          if (oldCapacity > 0) {
-            iterator.remove();
+          // According to CELEBORN-560, call rerun task and speculative task after LifecycleManager
+          // handle StageEnd will return empty PartitionLocation map, here loc can be null

Review Comment:
   cc @waitinfuture 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] AngersZhuuuu commented on a diff in pull request #1404: [CELEBORN-479][FOLLOWUP] Add push task should check if loc is null

Posted by "AngersZhuuuu (via GitHub)" <gi...@apache.org>.
AngersZhuuuu commented on code in PR #1404:
URL: https://github.com/apache/incubator-celeborn/pull/1404#discussion_r1155722677


##########
client/src/main/java/org/apache/celeborn/client/write/DataPushQueue.java:
##########
@@ -99,15 +99,19 @@ public ArrayList<PushTask> takePushTasks() throws IOException {
             client.getPartitionLocation(appId, shuffleId, numMappers, numPartitions);
         if (partitionLocationMap != null) {
           PartitionLocation loc = partitionLocationMap.get(partitionId);
-          Integer oldCapacity = workerCapacity.get(loc.hostAndPushPort());
-          if (oldCapacity == null) {
-            oldCapacity = maxInFlight - pushState.inflightPushes(loc.hostAndPushPort());
-            workerCapacity.put(loc.hostAndPushPort(), oldCapacity);
-          }
-          if (oldCapacity > 0) {
-            iterator.remove();
+          if (loc != null) {

Review Comment:
   Let me check.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] AngersZhuuuu commented on pull request #1404: [CELEBORN-479][FOLLOWUP] Add push task should check if loc is null

Posted by "AngersZhuuuu (via GitHub)" <gi...@apache.org>.
AngersZhuuuu commented on PR #1404:
URL: https://github.com/apache/incubator-celeborn/pull/1404#issuecomment-1493594209

   @waitinfuture I'm wondering why there is a null loc here


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] codecov[bot] commented on pull request #1404: [CELEBORN-479][FOLLOWUP] Add push task should check if loc is null

Posted by "codecov[bot] (via GitHub)" <gi...@apache.org>.
codecov[bot] commented on PR #1404:
URL: https://github.com/apache/incubator-celeborn/pull/1404#issuecomment-1493599102

   ## [Codecov](https://codecov.io/gh/apache/incubator-celeborn/pull/1404?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#1404](https://codecov.io/gh/apache/incubator-celeborn/pull/1404?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (adae37f) into [main](https://codecov.io/gh/apache/incubator-celeborn/commit/b4f8ab19bd2b7c46363dcd9293790bc6f40501b4?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (b4f8ab1) will **decrease** coverage by `0.13%`.
   > The diff coverage is `n/a`.
   
   ```diff
   @@            Coverage Diff             @@
   ##             main    #1404      +/-   ##
   ==========================================
   - Coverage   45.01%   44.87%   -0.13%     
   ==========================================
     Files         164      164              
     Lines       10393    10393              
     Branches     1057     1057              
   ==========================================
   - Hits         4677     4663      -14     
   - Misses       5376     5388      +12     
   - Partials      340      342       +2     
   ```
   
   
   [see 2 files with indirect coverage changes](https://codecov.io/gh/apache/incubator-celeborn/pull/1404/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] waitinfuture commented on a diff in pull request #1404: [CELEBORN-479][FOLLOWUP] Add push task should check if loc is null

Posted by "waitinfuture (via GitHub)" <gi...@apache.org>.
waitinfuture commented on code in PR #1404:
URL: https://github.com/apache/incubator-celeborn/pull/1404#discussion_r1178948801


##########
client/src/main/java/org/apache/celeborn/client/write/DataPushQueue.java:
##########
@@ -99,15 +99,19 @@ public ArrayList<PushTask> takePushTasks() throws IOException {
             client.getPartitionLocation(appId, shuffleId, numMappers, numPartitions);
         if (partitionLocationMap != null) {
           PartitionLocation loc = partitionLocationMap.get(partitionId);
-          Integer oldCapacity = workerCapacity.get(loc.hostAndPushPort());
-          if (oldCapacity == null) {
-            oldCapacity = maxInFlight - pushState.inflightPushes(loc.hostAndPushPort());
-            workerCapacity.put(loc.hostAndPushPort(), oldCapacity);
-          }
-          if (oldCapacity > 0) {
-            iterator.remove();
+          if (loc != null) {

Review Comment:
   > 
   
   After discuss with @AngersZhuuuu , I recalled that in speculation scenarios, it can happen that the location is null. See description in https://github.com/apache/incubator-celeborn/pull/1468 . So I think this PR is good.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] AngersZhuuuu merged pull request #1404: [CELEBORN-479][FOLLOWUP] Add push task should check if loc is null

Posted by "AngersZhuuuu (via GitHub)" <gi...@apache.org>.
AngersZhuuuu merged PR #1404:
URL: https://github.com/apache/incubator-celeborn/pull/1404


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org