Posted to dev@uniffle.apache.org by GitBox <gi...@apache.org> on 2022/07/04 10:53:54 UTC

[GitHub] [incubator-uniffle] colinmjj opened a new pull request, #16: [Bug] Fix NPE problem when process the event if application was cleared already

colinmjj opened a new pull request, #16:
URL: https://github.com/apache/incubator-uniffle/pull/16

   ### What changes were proposed in this pull request?
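   A NullPointerException (NPE) is thrown when a flush event is processed after its application has already been cleared, because `storageManager.selectStorage(event)` returns null in that case.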
   
   
   ### Why are the changes needed?
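   Fix a critical bug that causes a resource leak: the NPE is thrown before the try/finally block that releases the event's memory is reached, so that memory is never released.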
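In the pre-fix `flushToFile` shown in the diff, `storage.canWrite()` runs before the `try` block, so a null `storage` throws before the `finally` block that releases memory is ever entered. A minimal, self-contained sketch of that pattern follows; `FinallyLeakDemo`, `usedMemory`, `flushBuggy`, and `flushGuarded` are hypothetical stand-ins, not Uniffle APIs, and the pending-events path is deliberately omitted.

```java
import java.util.concurrent.atomic.AtomicLong;

public class FinallyLeakDemo {
  // Hypothetical stand-in for Uniffle's Storage; only canWrite() matters here.
  interface Storage {
    boolean canWrite();
  }

  // Hypothetical stand-in for the shuffle buffer's used-memory counter.
  static final AtomicLong usedMemory = new AtomicLong();

  // Pre-fix shape: the storage check runs before try/finally.
  static void flushBuggy(Storage storage, long size) {
    if (!storage.canWrite()) { // throws NPE when storage is null
      return;
    }
    try {
      // ... write blocks ...
    } finally {
      usedMemory.addAndGet(-size); // never reached if the NPE was thrown above
    }
  }

  // Guarded variant (pending-event path omitted): a null storage is handled
  // inside try, so the finally block always releases the event's memory.
  static void flushGuarded(Storage storage, long size) {
    try {
      if (storage == null) {
        return; // application already cleared: drop the event
      }
      // ... write blocks ...
    } finally {
      usedMemory.addAndGet(-size);
    }
  }

  public static void main(String[] args) {
    usedMemory.set(100);
    try {
      flushBuggy(null, 100);
    } catch (NullPointerException e) {
      System.out.println("after NPE: usedMemory=" + usedMemory.get());
    }
    usedMemory.set(100);
    flushGuarded(null, 100);
    System.out.println("after guarded drop: usedMemory=" + usedMemory.get());
  }
}
```

Running `main` prints `after NPE: usedMemory=100` (the memory is leaked) and then `after guarded drop: usedMemory=0`. A `return` inside `try` still runs `finally`; an exception thrown before `try` does not.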
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   A unit test (UT) was added.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #16: [Bug] Fix NPE problem when process the event if application was cleared already

jerqi commented on code in PR #16:
URL: https://github.com/apache/incubator-uniffle/pull/16#discussion_r912895756


##########
server/src/main/java/org/apache/uniffle/server/ShuffleFlushManager.java:
##########
@@ -135,59 +139,62 @@ public void addToFlushQueue(ShuffleDataFlushEvent event) {
 
   private void flushToFile(ShuffleDataFlushEvent event) {
 
-    Storage storage = storageManager.selectStorage(event);
-    if (!storage.canWrite()) {
-      addPendingEvents(event);
-      return;
-    }
-
     long start = System.currentTimeMillis();
     List<ShufflePartitionedBlock> blocks = event.getShuffleBlocks();
     boolean writeSuccess = false;
     try {
-      if (blocks == null || blocks.isEmpty()) {
-        LOG.info("There is no block to be flushed: " + event);
-      } else if (!event.isValid()) {
-        //  avoid printing error log
-        writeSuccess = true;
-        LOG.warn("AppId {} was removed already, event {} should be dropped", event.getAppId(), event);
-      } else {
-        ShuffleWriteHandler handler = storage.getOrCreateWriteHandler(new CreateShuffleWriteHandlerRequest(
-            storageType,
-            event.getAppId(),
-            event.getShuffleId(),
-            event.getStartPartition(),
-            event.getEndPartition(),
-            storageBasePaths,
-            shuffleServerId,
-            hadoopConf,
-            storageDataReplica));
+      Storage storage = storageManager.selectStorage(event);
+      // storage info maybe null if the application cache was cleared already
+      if (storage != null) {
+        if (!storage.canWrite()) {
+          addPendingEvents(event);

Review Comment:
   Why did we move this code into the
   ```
   try {
   } catch {
   } finally {
   }
   ```
   block? We release memory in the finally block, but when we add the event to the pending events, we shouldn't release its memory.
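One way to satisfy this comment is to make the "retry later" decision before entering `try`, so that only events that are written or dropped reach the `finally` release. A minimal sketch under that assumption; `FlushOrderingSketch`, `Event`, `pendingEvents`, `writtenEvents`, and the `usedMemory` counter are hypothetical simplifications, not the code that was merged.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicLong;

public class FlushOrderingSketch {
  // Hypothetical stand-in for Uniffle's Storage.
  interface Storage {
    boolean canWrite();
  }

  // Hypothetical simplified flush event: only its buffered size matters here.
  static final class Event {
    final long size;

    Event(long size) {
      this.size = size;
    }
  }

  static final AtomicLong usedMemory = new AtomicLong();
  static final Queue<Event> pendingEvents = new ArrayDeque<>();
  static int writtenEvents = 0;

  static void flushToFile(Event event, Storage storage) {
    // Decide "retry later" before entering try/finally: a pending event
    // still holds its buffered data, so its memory must not be released yet.
    if (storage != null && !storage.canWrite()) {
      pendingEvents.add(event);
      return;
    }
    try {
      if (storage == null) {
        return; // application already cleared: drop the event
      }
      writtenEvents++; // stand-in for the actual write
    } finally {
      // Reached only for events that were written or dropped.
      usedMemory.addAndGet(-event.size);
    }
  }
}
```

Starting from `usedMemory` = 30 and flushing three size-10 events against a non-writable storage, a `null` storage, and a writable storage leaves one event pending and `usedMemory` at 10.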



##########
server/src/main/java/org/apache/uniffle/server/ShuffleFlushManager.java:
##########
@@ -285,7 +292,9 @@ void processPendingEvents() throws Exception {
           pendingEventTimeoutSec, event.getEvent());
       return;
     }
-    if (storage.canWrite()) {
+    // storage maybe null if the application cache was cleared already
+    // add event to flush queue, and it will be released
+    if (storage == null || storage.canWrite()) {

Review Comment:
   When storage is `null`, why do we need to add the event to the flush queue?



[GitHub] [incubator-uniffle] colinmjj commented on a diff in pull request #16: [Bug] Fix NPE problem when process the event if application was cleared already

colinmjj commented on code in PR #16:
URL: https://github.com/apache/incubator-uniffle/pull/16#discussion_r912900515


##########
server/src/main/java/org/apache/uniffle/server/ShuffleFlushManager.java:
##########
@@ -135,59 +139,62 @@ public void addToFlushQueue(ShuffleDataFlushEvent event) {
 
   private void flushToFile(ShuffleDataFlushEvent event) {
 
-    Storage storage = storageManager.selectStorage(event);
-    if (!storage.canWrite()) {
-      addPendingEvents(event);
-      return;
-    }
-
     long start = System.currentTimeMillis();
     List<ShufflePartitionedBlock> blocks = event.getShuffleBlocks();
     boolean writeSuccess = false;
     try {
-      if (blocks == null || blocks.isEmpty()) {
-        LOG.info("There is no block to be flushed: " + event);
-      } else if (!event.isValid()) {
-        //  avoid printing error log
-        writeSuccess = true;
-        LOG.warn("AppId {} was removed already, event {} should be dropped", event.getAppId(), event);
-      } else {
-        ShuffleWriteHandler handler = storage.getOrCreateWriteHandler(new CreateShuffleWriteHandlerRequest(
-            storageType,
-            event.getAppId(),
-            event.getShuffleId(),
-            event.getStartPartition(),
-            event.getEndPartition(),
-            storageBasePaths,
-            shuffleServerId,
-            hadoopConf,
-            storageDataReplica));
+      Storage storage = storageManager.selectStorage(event);
+      // storage info maybe null if the application cache was cleared already
+      if (storage != null) {
+        if (!storage.canWrite()) {
+          addPendingEvents(event);

Review Comment:
   Right, it should be outside the try block.



[GitHub] [incubator-uniffle] colinmjj commented on a diff in pull request #16: [Bug] Fix NPE problem when process the event if application was cleared already

colinmjj commented on code in PR #16:
URL: https://github.com/apache/incubator-uniffle/pull/16#discussion_r912926067


##########
server/src/main/java/org/apache/uniffle/server/ShuffleFlushManager.java:
##########
@@ -285,7 +292,9 @@ void processPendingEvents() throws Exception {
           pendingEventTimeoutSec, event.getEvent());
       return;
     }
-    if (storage.canWrite()) {
+    // storage maybe null if the application cache was cleared already
+    // add event to flush queue, and it will be released
+    if (storage == null || storage.canWrite()) {

Review Comment:
   OK, I'll update it.



[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #16: [Bug] Fix NPE problem when process the event if application was cleared already

jerqi commented on code in PR #16:
URL: https://github.com/apache/incubator-uniffle/pull/16#discussion_r912917173


##########
server/src/main/java/org/apache/uniffle/server/ShuffleFlushManager.java:
##########
@@ -285,7 +292,9 @@ void processPendingEvents() throws Exception {
           pendingEventTimeoutSec, event.getEvent());
       return;
     }
-    if (storage.canWrite()) {
+    // storage maybe null if the application cache was cleared already
+    // add event to flush queue, and it will be released
+    if (storage == null || storage.canWrite()) {

Review Comment:
   Alternatively, we could add this code:
   ```
       if (storage == null) {
         if (shuffleServer != null) {
           shuffleServer.getShuffleBufferManager().releaseMemory(
               event.getEvent().getSize(), true, false);
         }
         LOG.error("storage is empty, the event {} is dropped", event.getEvent());
         return;
       }
   ```
   



[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #16: [Bug] Fix NPE problem when process the event if application was cleared already

jerqi commented on code in PR #16:
URL: https://github.com/apache/incubator-uniffle/pull/16#discussion_r912908901


##########
server/src/main/java/org/apache/uniffle/server/ShuffleFlushManager.java:
##########
@@ -285,7 +292,9 @@ void processPendingEvents() throws Exception {
           pendingEventTimeoutSec, event.getEvent());
       return;
     }
-    if (storage.canWrite()) {
+    // storage maybe null if the application cache was cleared already
+    // add event to flush queue, and it will be released
+    if (storage == null || storage.canWrite()) {

Review Comment:
   Could we change the condition from
   ```
    if (System.currentTimeMillis() - event.getCreateTimeStamp() > pendingEventTimeoutSec * 1000L) {
   ```
   to 
   ```
    if (System.currentTimeMillis() - event.getCreateTimeStamp() > pendingEventTimeoutSec * 1000L || storage == null) {
   ```
   ?
   It seems better.
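A minimal sketch of the suggested condition. The diff shows only the tail of the timeout branch (a log call and `return`), so this sketch assumes the drop branch also releases the event's memory, modeled on the `releaseMemory` call in the alternative snippet from this thread. `PendingCheckSketch`, `Outcome`, the local `releaseMemory` helper, and the plain `usedMemory` counter are hypothetical.

```java
public class PendingCheckSketch {
  // Hypothetical stand-in for Uniffle's Storage.
  interface Storage {
    boolean canWrite();
  }

  enum Outcome { DROPPED, SENT_TO_FLUSH_QUEUE, STILL_PENDING }

  static long usedMemory = 0;

  // Hypothetical local helper; assumption: dropping an event must also
  // release its buffered memory.
  static void releaseMemory(long size) {
    usedMemory -= size;
  }

  static Outcome processPendingEvent(long createTimeStampMs, long nowMs,
      long pendingEventTimeoutSec, Storage storage, long eventSize) {
    // Suggested change: a missing storage is handled like a timed-out event.
    if (nowMs - createTimeStampMs > pendingEventTimeoutSec * 1000L || storage == null) {
      releaseMemory(eventSize);
      return Outcome.DROPPED;
    }
    // storage is non-null here, so canWrite() cannot throw an NPE.
    if (storage.canWrite()) {
      return Outcome.SENT_TO_FLUSH_QUEUE;
    }
    return Outcome.STILL_PENDING;
  }
}
```

One trade-off of folding both causes into a single branch is that one log message then covers two different reasons for dropping the event, so the message may need to say which condition fired.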



[GitHub] [incubator-uniffle] colinmjj commented on a diff in pull request #16: [Bug] Fix NPE problem when process the event if application was cleared already

colinmjj commented on code in PR #16:
URL: https://github.com/apache/incubator-uniffle/pull/16#discussion_r912899523


##########
server/src/main/java/org/apache/uniffle/server/ShuffleFlushManager.java:
##########
@@ -285,7 +292,9 @@ void processPendingEvents() throws Exception {
           pendingEventTimeoutSec, event.getEvent());
       return;
     }
-    if (storage.canWrite()) {
+    // storage maybe null if the application cache was cleared already
+    // add event to flush queue, and it will be released
+    if (storage == null || storage.canWrite()) {

Review Comment:
   The event should be discarded and the memory usage should be updated; neither action can be done here.
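The reasoning behind routing a null-storage event to the flush queue is that the flush path already discards events and releases their memory in a `finally` block. A minimal sketch of that design, assuming a single-threaded flush loop; `RequeueSketch`, `Event`, `drainFlushQueue`, `droppedEvents`, and the plain `usedMemory` counter are hypothetical simplifications, and the real `flushToFile` re-selects storage and handles the pending path, which is omitted here.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class RequeueSketch {
  // Hypothetical stand-in for Uniffle's Storage.
  interface Storage {
    boolean canWrite();
  }

  // Hypothetical event: a size plus its selected storage (null once the app is cleared).
  static final class Event {
    final long size;
    final Storage storage;

    Event(long size, Storage storage) {
      this.size = size;
      this.storage = storage;
    }
  }

  static final Queue<Event> flushQueue = new ArrayDeque<>();
  static final Queue<Event> pendingEvents = new ArrayDeque<>();
  static long usedMemory = 0;
  static int droppedEvents = 0;

  // Route a null-storage event back to the flush queue instead of discarding
  // it here, so the flush path performs both the drop and the memory release.
  static void processPendingEvent(Event event) {
    if (event.storage == null || event.storage.canWrite()) {
      flushQueue.add(event);
    } else {
      pendingEvents.add(event);
    }
  }

  // Flush path: the single place that releases event memory.
  static void flushToFile(Event event) {
    try {
      if (event.storage == null) {
        droppedEvents++; // application already cleared: drop the event
        return;
      }
      // ... write blocks ...
    } finally {
      usedMemory -= event.size;
    }
  }

  static void drainFlushQueue() {
    Event event;
    while ((event = flushQueue.poll()) != null) {
      flushToFile(event);
    }
  }
}
```

Compared with dropping the event directly in `processPendingEvents`, as suggested in the other review comments, this costs one extra trip through the queue but keeps memory release in one place.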



[GitHub] [incubator-uniffle] jerqi merged pull request #16: [Bug] Fix NPE problem when process the event if application was cleared already

jerqi merged PR #16:
URL: https://github.com/apache/incubator-uniffle/pull/16

