You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@uniffle.apache.org by GitBox <gi...@apache.org> on 2022/09/11 14:19:59 UTC

[GitHub] [incubator-uniffle] smallzhongfeng opened a new pull request, #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

smallzhongfeng opened a new pull request, #210:
URL: https://github.com/apache/incubator-uniffle/pull/210

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://github.com/Tencent/Firestorm/blob/master/CONTRIBUTING.md
     2. Ensure you have added or run the appropriate tests for your PR
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP]XXXX Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue.
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   The detection logic of abnormal paths is also added to the `APP_BALANCE` strategy.
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   Anomaly detection can be performed when selecting a remote path to avoid selecting a problematic path.
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   The parameters of the original IO_SAMPLE strategy are reused and the names are changed.
   
   1. `rss.coordinator.remote.storage.select.strategy` support `APP_BALANCE` and `IO_SAMPLE`, `APP_BALANCE` selection strategy based on the number of apps, `IO_SAMPLE` selection strategy based on time consumption of reading and writing files.
   2. `rss.coordinator.remote.storage.schedule.time` , if user choose `IO_SAMPLE`, file will be read and written at regular intervals.
   3. `rss.coordinator.remote.storage.schedule.file.size` , the size of each read / write HDFS file.
   4. `rss.coordinator.remote.storage.schedule.access.times`, number of times to read and write HDFS files.
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   Passed uts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r969100176


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,17 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
-
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
+  List<Map.Entry<String, RankValue>> sortPathByRankValue(String path, Path testPath, long time, boolean isHealthy);
 
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  void setFs(FileSystem fs);

Review Comment:
   Could you elaborate more, or give me some other examples ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#issuecomment-1252214766

   For object storage, there may be more S3, but for cos, I don't know this very well. Do you have any good suggestions? @jerqi


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r978344291


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,12 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
-
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
-
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  List<Map.Entry<String, RankValue>> readAndWrite(String path);

Review Comment:
   Should we add a method `pickStorage` or `selectStorage` for this interface?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#issuecomment-1272316984

   This modification has done three things
   
   1. An abstract class `AbstractSelectStorageStrategy` is added to provide a method for reading and writing hdfs paths.
   2. Added sorted storage list.
   3. Removed a global variable of `isHealthy` and changed it to a local variable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#issuecomment-1253121888

   > Could we support anomaly detection of hdfs files first?
   
   We should have more general interface. We can only implement part of them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r980922498


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/AppBalanceSelectStorageStrategy.java:
##########
@@ -39,106 +45,127 @@
 public class AppBalanceSelectStorageStrategy implements SelectStorageStrategy {
 
   private static final Logger LOG = LoggerFactory.getLogger(AppBalanceSelectStorageStrategy.class);
-  /**
-   * store appId -> remote path to make sure all shuffle data of the same application
-   * will be written to the same remote storage
-   */
-  private final Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo;
   /**
    * store remote path -> application count for assignment strategy
    */
-  private final Map<String, RankValue> remoteStoragePathCounter;
+  private final Map<String, RankValue> remoteStoragePathRankValue;
+  private final Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo;
   private final Map<String, RemoteStorageInfo> availableRemoteStorageInfo;
-
-  public AppBalanceSelectStorageStrategy() {
-    this.appIdToRemoteStorageInfo = Maps.newConcurrentMap();
-    this.remoteStoragePathCounter = Maps.newConcurrentMap();
-    this.availableRemoteStorageInfo = Maps.newHashMap();
+  private final Configuration hdfsConf;
+  private final int fileSize;
+  private final int readAndWriteTimes;
+  private boolean remotePathIsHealthy = true;

Review Comment:
   You can look at the logic of `sortPathByRankValue`. When we judge by this parameter, it is a normal path, and then deal with it accordingly.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r978353636


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,12 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
-
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
-
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  List<Map.Entry<String, RankValue>> readAndWrite(String path);

Review Comment:
   Yes, that might make more sense for this interface, I will added.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r990739277


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/AbstractSelectStorageStrategy.java:
##########
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.uniffle.coordinator;
+
+import java.io.IOException;
+import java.util.Map;
+
+import org.apache.commons.lang3.RandomUtils;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+
+import org.apache.uniffle.common.RemoteStorageInfo;
+import org.apache.uniffle.common.exception.RssException;
+
+import static org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
+
+/**
+ * This is a simple implementation class, which provides some methods to check whether the path is normal
+ */
+public abstract class AbstractSelectStorageStrategy implements SelectStorageStrategy {
+  /**
+   * store remote path -> application count for assignment strategy
+   */
+  protected final Map<String, LowestIOSampleCostSelectStorageStrategy.RankValue> remoteStoragePathRankValue;
+  protected final int fileSize;
+
+  public AbstractSelectStorageStrategy(
+      Map<String, LowestIOSampleCostSelectStorageStrategy.RankValue> remoteStoragePathRankValue,
+      CoordinatorConf conf) {
+    this.remoteStoragePathRankValue = remoteStoragePathRankValue;
+    fileSize = conf.getInteger(CoordinatorConf.COORDINATOR_REMOTE_STORAGE_SCHEDULE_FILE_SIZE);
+  }
+
+  public void readAndWriteHdfsStorage(FileSystem fs, Path testPath,
+      String uri, RankValue rankValue) throws IOException {
+    byte[] data = RandomUtils.nextBytes(fileSize);
+    try (FSDataOutputStream fos = fs.create(testPath)) {
+      fos.write(data);
+      fos.flush();
+    }
+    byte[] readData = new byte[fileSize];
+    int readBytes;
+    try (FSDataInputStream fis = fs.open(testPath)) {
+      int hasReadBytes = 0;
+      do {
+        readBytes = fis.read(readData);
+        if (hasReadBytes < fileSize) {
+          for (int i = 0; i < readBytes; i++) {
+            if (data[hasReadBytes + i] != readData[i]) {
+              remoteStoragePathRankValue.put(uri, new RankValue(Long.MAX_VALUE, rankValue.getAppNum().get()));
+              throw new RssException("The content of reading and writing is inconsistent.");
+            }
+          }
+        }
+        hasReadBytes += readBytes;
+      } while (readBytes != -1);
+    }
+  }
+
+  @Override

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r972596791


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,14 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
+import org.apache.hadoop.fs.Path;
+
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
-
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
-
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  List<Map.Entry<String, RankValue>> sortPathByRankValue(String path, Path testPath, long time, boolean isHealthy);

Review Comment:
   IIRC, Object storage such as s3 also has the concept of Path.



##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,14 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
+import org.apache.hadoop.fs.Path;
+
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
-
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
-
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  List<Map.Entry<String, RankValue>> sortPathByRankValue(String path, Path testPath, long time, boolean isHealthy);

Review Comment:
    IIRC, Object storage such as s3 also has the concept of Path.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r990723514


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/AppBalanceSelectStorageStrategy.java:
##########
@@ -39,106 +45,127 @@
 public class AppBalanceSelectStorageStrategy implements SelectStorageStrategy {
 
   private static final Logger LOG = LoggerFactory.getLogger(AppBalanceSelectStorageStrategy.class);
-  /**
-   * store appId -> remote path to make sure all shuffle data of the same application
-   * will be written to the same remote storage
-   */
-  private final Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo;
   /**
    * store remote path -> application count for assignment strategy
    */
-  private final Map<String, RankValue> remoteStoragePathCounter;
+  private final Map<String, RankValue> remoteStoragePathRankValue;
+  private final Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo;
   private final Map<String, RemoteStorageInfo> availableRemoteStorageInfo;
-
-  public AppBalanceSelectStorageStrategy() {
-    this.appIdToRemoteStorageInfo = Maps.newConcurrentMap();
-    this.remoteStoragePathCounter = Maps.newConcurrentMap();
-    this.availableRemoteStorageInfo = Maps.newHashMap();
+  private final Configuration hdfsConf;
+  private final int fileSize;
+  private final int readAndWriteTimes;
+  private boolean remotePathIsHealthy = true;
+
+  public AppBalanceSelectStorageStrategy(
+      Map<String, RankValue> remoteStoragePathRankValue,
+      Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo,
+      Map<String, RemoteStorageInfo> availableRemoteStorageInfo,
+      CoordinatorConf conf) {
+    this.remoteStoragePathRankValue = remoteStoragePathRankValue;
+    this.appIdToRemoteStorageInfo = appIdToRemoteStorageInfo;
+    this.availableRemoteStorageInfo = availableRemoteStorageInfo;
+    this.hdfsConf = new Configuration();
+    fileSize = conf.getInteger(CoordinatorConf.COORDINATOR_REMOTE_STORAGE_SCHEDULE_FILE_SIZE);
+    readAndWriteTimes = conf.getInteger(CoordinatorConf.COORDINATOR_REMOTE_STORAGE_SCHEDULE_ACCESS_TIMES);
   }
 
-  /**
-   * the strategy of pick remote storage is according to assignment count
-   */
-  @Override
-  public RemoteStorageInfo pickRemoteStorage(String appId) {
-    if (appIdToRemoteStorageInfo.containsKey(appId)) {
-      return appIdToRemoteStorageInfo.get(appId);
-    }
-
-    // create list for sort
-    List<Map.Entry<String, RankValue>> sizeList =
-        Lists.newArrayList(remoteStoragePathCounter.entrySet()).stream().filter(Objects::nonNull)
-            .sorted(Comparator.comparingInt(entry -> entry.getValue().getAppNum().get())).collect(Collectors.toList());
-
-    for (Map.Entry<String, RankValue> entry : sizeList) {
-      String storagePath = entry.getKey();
-      if (availableRemoteStorageInfo.containsKey(storagePath)) {
-        appIdToRemoteStorageInfo.putIfAbsent(appId, availableRemoteStorageInfo.get(storagePath));
-        incRemoteStorageCounter(storagePath);
-        break;
-      }
-    }
-    return appIdToRemoteStorageInfo.get(appId);
-  }
-
-  @Override
   @VisibleForTesting
-  public synchronized void incRemoteStorageCounter(String remoteStoragePath) {
-    RankValue counter = remoteStoragePathCounter.get(remoteStoragePath);
-    if (counter != null) {
-      counter.getAppNum().incrementAndGet();
-    } else {
-      // it may be happened when assignment remote storage
-      // and refresh remote storage at the same time
-      LOG.warn("Remote storage path lost during assignment: {} doesn't exist, reset it to 1",
-          remoteStoragePath);
-      remoteStoragePathCounter.put(remoteStoragePath, new RankValue(1));
+  public List<Map.Entry<String, RankValue>> sortPathByRankValue(
+      String path, String test, boolean isHealthy) {
+    try {
+      FileSystem fs = HadoopFilesystemProvider.getFilesystem(new Path(path), hdfsConf);
+      fs.delete(new Path(test),true);
+      if (isHealthy) {
+        RankValue rankValue = remoteStoragePathRankValue.get(path);
+        remoteStoragePathRankValue.put(path, new RankValue(0, rankValue.getAppNum().get()));
+      }
+    } catch (Exception e) {
+      RankValue rankValue = remoteStoragePathRankValue.get(path);
+      remoteStoragePathRankValue.put(path, new RankValue(Long.MAX_VALUE, rankValue.getAppNum().get()));
+      LOG.error("Failed to sort, we will not use this remote path {}.", path, e);
     }
+    return Lists.newCopyOnWriteArrayList(remoteStoragePathRankValue.entrySet()).stream()
+        .filter(Objects::nonNull).collect(Collectors.toList());
   }
 
   @Override
-  @VisibleForTesting
-  public synchronized void decRemoteStorageCounter(String storagePath) {
-    if (!StringUtils.isEmpty(storagePath)) {
-      RankValue atomic = remoteStoragePathCounter.get(storagePath);
-      if (atomic != null) {
-        double count = atomic.getAppNum().decrementAndGet();
-        if (count < 0) {
-          LOG.warn("Unexpected counter for remote storage: {}, which is {}, reset to 0",
-              storagePath, count);
-          atomic.getAppNum().set(0);
+  public List<Map.Entry<String, RankValue>> detectStorage(String uri) {
+    if (uri.startsWith(ApplicationManager.REMOTE_PATH_SCHEMA.get(0))) {
+      setRemotePathIsHealthy(true);
+      Path remotePath = new Path(uri);
+      String rssTest = uri + "/rssTest";
+      Path testPath = new Path(rssTest);
+      try {
+        FileSystem fs = HadoopFilesystemProvider.getFilesystem(remotePath, hdfsConf);
+        for (int j = 0; j < readAndWriteTimes; j++) {

Review Comment:
   done.



##########
coordinator/src/main/java/org/apache/uniffle/coordinator/AppBalanceSelectStorageStrategy.java:
##########
@@ -39,106 +45,127 @@
 public class AppBalanceSelectStorageStrategy implements SelectStorageStrategy {
 
   private static final Logger LOG = LoggerFactory.getLogger(AppBalanceSelectStorageStrategy.class);
-  /**
-   * store appId -> remote path to make sure all shuffle data of the same application
-   * will be written to the same remote storage
-   */
-  private final Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo;
   /**
    * store remote path -> application count for assignment strategy
    */
-  private final Map<String, RankValue> remoteStoragePathCounter;
+  private final Map<String, RankValue> remoteStoragePathRankValue;
+  private final Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo;
   private final Map<String, RemoteStorageInfo> availableRemoteStorageInfo;
-
-  public AppBalanceSelectStorageStrategy() {
-    this.appIdToRemoteStorageInfo = Maps.newConcurrentMap();
-    this.remoteStoragePathCounter = Maps.newConcurrentMap();
-    this.availableRemoteStorageInfo = Maps.newHashMap();
+  private final Configuration hdfsConf;
+  private final int fileSize;
+  private final int readAndWriteTimes;
+  private boolean remotePathIsHealthy = true;

Review Comment:
   Updated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r978353636


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,12 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
-
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
-
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  List<Map.Entry<String, RankValue>> readAndWrite(String path);

Review Comment:
   Yes, that might make more sense for this interface, I will add.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#issuecomment-1254079717

   > We should have more general interface. We can only implement part of them.
   
   I think we only need to provide a path that includes the remote, and the rest of the methods can be implemented by judging different storage systems.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r977185736


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,12 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
-
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
-
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  List<Map.Entry<String, RankValue>> readAndWrite(String path);

Review Comment:
   The meaning of the `readAndWrite` interface is to perform anomaly detection of remote paths. For hdfs, reading and writing files is a more direct way to detect, so I took this name, but the real comparison method comes from `sortPathByRankValue`, which I do not have. The reason for defining it as an interface is because the objects to be sorted may be different, and this is left to different storage methods for implementation.



##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,12 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
-
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
-
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  List<Map.Entry<String, RankValue>> readAndWrite(String path);

Review Comment:
   The meaning of the `readAndWrite` interface is to perform anomaly detection of remote paths. For hdfs, reading and writing files is a more direct way to detect, so I took this name, but the real comparison method comes from `sortPathByRankValue`, which I do not have. The reason for defining it as an interface is because the objects to be sorted may be different, and this is left to different storage methods for implementation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#issuecomment-1254451810

   > So I did not use the concept of `path` for the time being, but used a string to represent.
   The concept of subsequent use of buckets or paths can be implemented through the readAndWrite interface.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r980836844


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/AppBalanceSelectStorageStrategy.java:
##########
@@ -39,106 +45,127 @@
 public class AppBalanceSelectStorageStrategy implements SelectStorageStrategy {
 
   private static final Logger LOG = LoggerFactory.getLogger(AppBalanceSelectStorageStrategy.class);
-  /**
-   * store appId -> remote path to make sure all shuffle data of the same application
-   * will be written to the same remote storage
-   */
-  private final Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo;
   /**
    * store remote path -> application count for assignment strategy
    */
-  private final Map<String, RankValue> remoteStoragePathCounter;
+  private final Map<String, RankValue> remoteStoragePathRankValue;
+  private final Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo;
   private final Map<String, RemoteStorageInfo> availableRemoteStorageInfo;
-
-  public AppBalanceSelectStorageStrategy() {
-    this.appIdToRemoteStorageInfo = Maps.newConcurrentMap();
-    this.remoteStoragePathCounter = Maps.newConcurrentMap();
-    this.availableRemoteStorageInfo = Maps.newHashMap();
+  private final Configuration hdfsConf;
+  private final int fileSize;
+  private final int readAndWriteTimes;
+  private boolean remotePathIsHealthy = true;
+
+  public AppBalanceSelectStorageStrategy(
+      Map<String, RankValue> remoteStoragePathRankValue,
+      Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo,
+      Map<String, RemoteStorageInfo> availableRemoteStorageInfo,
+      CoordinatorConf conf) {
+    this.remoteStoragePathRankValue = remoteStoragePathRankValue;
+    this.appIdToRemoteStorageInfo = appIdToRemoteStorageInfo;
+    this.availableRemoteStorageInfo = availableRemoteStorageInfo;
+    this.hdfsConf = new Configuration();
+    fileSize = conf.getInteger(CoordinatorConf.COORDINATOR_REMOTE_STORAGE_SCHEDULE_FILE_SIZE);
+    readAndWriteTimes = conf.getInteger(CoordinatorConf.COORDINATOR_REMOTE_STORAGE_SCHEDULE_ACCESS_TIMES);
   }
 
-  /**
-   * the strategy of pick remote storage is according to assignment count
-   */
-  @Override
-  public RemoteStorageInfo pickRemoteStorage(String appId) {
-    if (appIdToRemoteStorageInfo.containsKey(appId)) {
-      return appIdToRemoteStorageInfo.get(appId);
-    }
-
-    // create list for sort
-    List<Map.Entry<String, RankValue>> sizeList =
-        Lists.newArrayList(remoteStoragePathCounter.entrySet()).stream().filter(Objects::nonNull)
-            .sorted(Comparator.comparingInt(entry -> entry.getValue().getAppNum().get())).collect(Collectors.toList());
-
-    for (Map.Entry<String, RankValue> entry : sizeList) {
-      String storagePath = entry.getKey();
-      if (availableRemoteStorageInfo.containsKey(storagePath)) {
-        appIdToRemoteStorageInfo.putIfAbsent(appId, availableRemoteStorageInfo.get(storagePath));
-        incRemoteStorageCounter(storagePath);
-        break;
-      }
-    }
-    return appIdToRemoteStorageInfo.get(appId);
-  }
-
-  @Override
   @VisibleForTesting
-  public synchronized void incRemoteStorageCounter(String remoteStoragePath) {
-    RankValue counter = remoteStoragePathCounter.get(remoteStoragePath);
-    if (counter != null) {
-      counter.getAppNum().incrementAndGet();
-    } else {
-      // it may be happened when assignment remote storage
-      // and refresh remote storage at the same time
-      LOG.warn("Remote storage path lost during assignment: {} doesn't exist, reset it to 1",
-          remoteStoragePath);
-      remoteStoragePathCounter.put(remoteStoragePath, new RankValue(1));
+  public List<Map.Entry<String, RankValue>> sortPathByRankValue(
+      String path, String test, boolean isHealthy) {
+    try {
+      FileSystem fs = HadoopFilesystemProvider.getFilesystem(new Path(path), hdfsConf);
+      fs.delete(new Path(test),true);
+      if (isHealthy) {
+        RankValue rankValue = remoteStoragePathRankValue.get(path);
+        remoteStoragePathRankValue.put(path, new RankValue(0, rankValue.getAppNum().get()));
+      }
+    } catch (Exception e) {
+      RankValue rankValue = remoteStoragePathRankValue.get(path);
+      remoteStoragePathRankValue.put(path, new RankValue(Long.MAX_VALUE, rankValue.getAppNum().get()));
+      LOG.error("Failed to sort, we will not use this remote path {}.", path, e);
     }
+    return Lists.newCopyOnWriteArrayList(remoteStoragePathRankValue.entrySet()).stream()
+        .filter(Objects::nonNull).collect(Collectors.toList());
   }
 
   @Override
-  @VisibleForTesting
-  public synchronized void decRemoteStorageCounter(String storagePath) {
-    if (!StringUtils.isEmpty(storagePath)) {
-      RankValue atomic = remoteStoragePathCounter.get(storagePath);
-      if (atomic != null) {
-        double count = atomic.getAppNum().decrementAndGet();
-        if (count < 0) {
-          LOG.warn("Unexpected counter for remote storage: {}, which is {}, reset to 0",
-              storagePath, count);
-          atomic.getAppNum().set(0);
+  public List<Map.Entry<String, RankValue>> detectStorage(String uri) {
+    if (uri.startsWith(ApplicationManager.REMOTE_PATH_SCHEMA.get(0))) {
+      setRemotePathIsHealthy(true);
+      Path remotePath = new Path(uri);
+      String rssTest = uri + "/rssTest";
+      Path testPath = new Path(rssTest);
+      try {
+        FileSystem fs = HadoopFilesystemProvider.getFilesystem(remotePath, hdfsConf);
+        for (int j = 0; j < readAndWriteTimes; j++) {

Review Comment:
   For appBalance, it can indeed be changed once, but for io sampling, the number of times can be increased, perhaps we can set the parameter to 1 by default.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r979212291


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,14 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
+  List<Map.Entry<String, RankValue>> detectStorage(String path);
 
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
-
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  List<Map.Entry<String, RankValue>> pickStorage(List<Map.Entry<String, RankValue>> pathList);

Review Comment:
   What about `uriRankings`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] codecov-commenter commented on pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#issuecomment-1242978806

   # [Codecov](https://codecov.io/gh/apache/incubator-uniffle/pull/210?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#210](https://codecov.io/gh/apache/incubator-uniffle/pull/210?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (f87f484) into [master](https://codecov.io/gh/apache/incubator-uniffle/commit/9ef48d6bc7891a3866fbd95ce5f2d8ffd0ae2f25?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (9ef48d6) will **decrease** coverage by `1.28%`.
   > The diff coverage is `71.42%`.
   
   ```diff
   @@             Coverage Diff              @@
   ##             master     #210      +/-   ##
   ============================================
   - Coverage     59.10%   57.81%   -1.29%     
   + Complexity     1326     1238      -88     
   ============================================
     Files           160      151       -9     
     Lines          8727     8214     -513     
     Branches        817      784      -33     
   ============================================
   - Hits           5158     4749     -409     
   + Misses         3303     3214      -89     
   + Partials        266      251      -15     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-uniffle/pull/210?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...e/coordinator/AppBalanceSelectStorageStrategy.java](https://codecov.io/gh/apache/incubator-uniffle/pull/210/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y29vcmRpbmF0b3Ivc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3VuaWZmbGUvY29vcmRpbmF0b3IvQXBwQmFsYW5jZVNlbGVjdFN0b3JhZ2VTdHJhdGVneS5qYXZh) | `58.82% <61.53%> (-13.52%)` | :arrow_down: |
   | [...apache/uniffle/coordinator/ApplicationManager.java](https://codecov.io/gh/apache/incubator-uniffle/pull/210/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y29vcmRpbmF0b3Ivc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3VuaWZmbGUvY29vcmRpbmF0b3IvQXBwbGljYXRpb25NYW5hZ2VyLmphdmE=) | `78.17% <66.31%> (-12.10%)` | :arrow_down: |
   | [...nator/LowestIOSampleCostSelectStorageStrategy.java](https://codecov.io/gh/apache/incubator-uniffle/pull/210/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y29vcmRpbmF0b3Ivc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3VuaWZmbGUvY29vcmRpbmF0b3IvTG93ZXN0SU9TYW1wbGVDb3N0U2VsZWN0U3RvcmFnZVN0cmF0ZWd5LmphdmE=) | `97.43% <94.73%> (+24.28%)` | :arrow_up: |
   | [...rg/apache/uniffle/coordinator/CoordinatorConf.java](https://codecov.io/gh/apache/incubator-uniffle/pull/210/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y29vcmRpbmF0b3Ivc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3VuaWZmbGUvY29vcmRpbmF0b3IvQ29vcmRpbmF0b3JDb25mLmphdmE=) | `96.94% <100.00%> (ø)` | |
   | [...storage/handler/impl/DataSkippableReadHandler.java](https://codecov.io/gh/apache/incubator-uniffle/pull/210/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3RvcmFnZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvdW5pZmZsZS9zdG9yYWdlL2hhbmRsZXIvaW1wbC9EYXRhU2tpcHBhYmxlUmVhZEhhbmRsZXIuamF2YQ==) | `81.25% <0.00%> (-3.13%)` | :arrow_down: |
   | [.../org/apache/spark/shuffle/writer/WriterBuffer.java](https://codecov.io/gh/apache/incubator-uniffle/pull/210/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS93cml0ZXIvV3JpdGVyQnVmZmVyLmphdmE=) | | |
   | [...org/apache/spark/shuffle/RssSparkShuffleUtils.java](https://codecov.io/gh/apache/incubator-uniffle/pull/210/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS9Sc3NTcGFya1NodWZmbGVVdGlscy5qYXZh) | | |
   | [...ava/org/apache/spark/shuffle/RssShuffleHandle.java](https://codecov.io/gh/apache/incubator-uniffle/pull/210/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS9Sc3NTaHVmZmxlSGFuZGxlLmphdmE=) | | |
   | ... and [6 more](https://codecov.io/gh/apache/incubator-uniffle/pull/210/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi merged pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
jerqi merged PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#issuecomment-1272527096

   > https://github.com/apache/incubator-uniffle/actions/runs/3213433147 There is a flay test. Could you help me fix it? [test-reports-spark3.2.0 (1).zip](https://github.com/apache/incubator-uniffle/files/9741113/test-reports-spark3.2.0.1.zip)
   
   OK, I will fix it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#issuecomment-1252643394

   Could we support anomaly detection of hdfs files first?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#issuecomment-1246585766

   Like discussion at dev mail list, we will freeze the code and cut 0.6 version branch in September 15, we  will not merge this pr before I cut 0.6 version branch, are you ok?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] zuston commented on pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
zuston commented on PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#issuecomment-1253279696

   > > For object storage, there may be more S3, but for cos, I don't know this very well. Do you have any good suggestions? @jerqi
   > 
   > Do the S3 have the concept of `path`? It only have the concept of bucket.
   
   Yes, the s3 compatible filesystem of Hadoop has implemented the abstraction of `Path`, but it's slower than the s3 original api when using `list` files/dirs. I think cos is the same with s3.
   
   By the way, I implemented the hadoop filesystem api for internal object store in the past, so have some experience on it. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r980668872


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/ApplicationManager.java:
##########
@@ -42,39 +43,59 @@
 public class ApplicationManager {
 
   private static final Logger LOG = LoggerFactory.getLogger(ApplicationManager.class);
-  private long expired;
-  private StrategyName storageStrategy;
-  private Map<String, Long> appIds = Maps.newConcurrentMap();
-  private SelectStorageStrategy selectStorageStrategy;
+  // TODO: Add anomaly detection for other storage
+  public static final List<String> REMOTE_PATH_SCHEMA = Arrays.asList("hdfs");
+  private final long expired;
+  private final StrategyName storageStrategy;
+  private final Map<String, Long> appIds = Maps.newConcurrentMap();
+  private final SelectStorageStrategy selectStorageStrategy;
   // store appId -> remote path to make sure all shuffle data of the same application
   // will be written to the same remote storage
-  private Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo;
+  private final Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo;
   // store remote path -> application count for assignment strategy
-  private Map<String, RankValue> remoteStoragePathRankValue;
-  private Map<String, String> remoteStorageToHost = Maps.newConcurrentMap();
-  private Map<String, RemoteStorageInfo> availableRemoteStorageInfo;
-  private ScheduledExecutorService scheduledExecutorService;
+  private final Map<String, RankValue> remoteStoragePathRankValue;
+  private final Map<String, String> remoteStorageToHost = Maps.newConcurrentMap();
+  private final Map<String, RemoteStorageInfo> availableRemoteStorageInfo;
+  private List<Map.Entry<String, RankValue>> sizeList;
   // it's only for test case to check if status check has problem
   private boolean hasErrorInStatusCheck = false;
 
   public ApplicationManager(CoordinatorConf conf) {
     storageStrategy = conf.get(CoordinatorConf.COORDINATOR_REMOTE_STORAGE_SELECT_STRATEGY);
+    appIdToRemoteStorageInfo = Maps.newConcurrentMap();
+    remoteStoragePathRankValue = Maps.newConcurrentMap();
+    availableRemoteStorageInfo = Maps.newConcurrentMap();
     if (StrategyName.IO_SAMPLE == storageStrategy) {
-      selectStorageStrategy = new LowestIOSampleCostSelectStorageStrategy(conf);
+      selectStorageStrategy = new LowestIOSampleCostSelectStorageStrategy(remoteStoragePathRankValue,
+          appIdToRemoteStorageInfo, availableRemoteStorageInfo, conf);
     } else if (StrategyName.APP_BALANCE == storageStrategy) {
-      selectStorageStrategy = new AppBalanceSelectStorageStrategy();
+      selectStorageStrategy = new AppBalanceSelectStorageStrategy(remoteStoragePathRankValue,
+          appIdToRemoteStorageInfo, availableRemoteStorageInfo, conf);
     } else {
       throw new UnsupportedOperationException("Unsupported selected storage strategy.");
     }
-    appIdToRemoteStorageInfo = selectStorageStrategy.getAppIdToRemoteStorageInfo();
-    remoteStoragePathRankValue = selectStorageStrategy.getRemoteStoragePathRankValue();
-    availableRemoteStorageInfo = selectStorageStrategy.getAvailableRemoteStorageInfo();
     expired = conf.getLong(CoordinatorConf.COORDINATOR_APP_EXPIRED);
     // the thread for checking application status
-    scheduledExecutorService = Executors.newSingleThreadScheduledExecutor(
+    ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor(
         ThreadUtils.getThreadFactory("ApplicationManager-%d"));
     scheduledExecutorService.scheduleAtFixedRate(
-        () -> statusCheck(), expired / 2, expired / 2, TimeUnit.MILLISECONDS);
+        this::statusCheck, expired / 2, expired / 2, TimeUnit.MILLISECONDS);
+    // the thread for checking if the storage is normal
+    ScheduledExecutorService detectStorageScheduler = Executors.newSingleThreadScheduledExecutor(
+        ThreadUtils.getThreadFactory("readWriteRankScheduler-%d"));
+    // should init later than the refreshRemoteStorage init
+    detectStorageScheduler.scheduleAtFixedRate(this::checkReadAndWrite, 1000,
+        conf.getLong(CoordinatorConf.COORDINATOR_REMOTE_STORAGE_SCHEDULE_TIME), TimeUnit.MILLISECONDS);
+  }
+
+  public void checkReadAndWrite() {

Review Comment:
   Should we put this logic into the class `SelectStorageStrategy`?



##########
coordinator/src/main/java/org/apache/uniffle/coordinator/AppBalanceSelectStorageStrategy.java:
##########
@@ -39,106 +45,127 @@
 public class AppBalanceSelectStorageStrategy implements SelectStorageStrategy {
 
   private static final Logger LOG = LoggerFactory.getLogger(AppBalanceSelectStorageStrategy.class);
-  /**
-   * store appId -> remote path to make sure all shuffle data of the same application
-   * will be written to the same remote storage
-   */
-  private final Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo;
   /**
    * store remote path -> application count for assignment strategy
    */
-  private final Map<String, RankValue> remoteStoragePathCounter;
+  private final Map<String, RankValue> remoteStoragePathRankValue;
+  private final Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo;
   private final Map<String, RemoteStorageInfo> availableRemoteStorageInfo;
-
-  public AppBalanceSelectStorageStrategy() {
-    this.appIdToRemoteStorageInfo = Maps.newConcurrentMap();
-    this.remoteStoragePathCounter = Maps.newConcurrentMap();
-    this.availableRemoteStorageInfo = Maps.newHashMap();
+  private final Configuration hdfsConf;
+  private final int fileSize;
+  private final int readAndWriteTimes;
+  private boolean remotePathIsHealthy = true;

Review Comment:
   What does this variable `remotePathIsHealthy`  mean?
   This class is a strategy. I can't get the meaning of this varibale.



##########
coordinator/src/main/java/org/apache/uniffle/coordinator/AppBalanceSelectStorageStrategy.java:
##########
@@ -39,106 +45,127 @@
 public class AppBalanceSelectStorageStrategy implements SelectStorageStrategy {
 
   private static final Logger LOG = LoggerFactory.getLogger(AppBalanceSelectStorageStrategy.class);
-  /**
-   * store appId -> remote path to make sure all shuffle data of the same application
-   * will be written to the same remote storage
-   */
-  private final Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo;
   /**
    * store remote path -> application count for assignment strategy
    */
-  private final Map<String, RankValue> remoteStoragePathCounter;
+  private final Map<String, RankValue> remoteStoragePathRankValue;
+  private final Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo;
   private final Map<String, RemoteStorageInfo> availableRemoteStorageInfo;
-
-  public AppBalanceSelectStorageStrategy() {
-    this.appIdToRemoteStorageInfo = Maps.newConcurrentMap();
-    this.remoteStoragePathCounter = Maps.newConcurrentMap();
-    this.availableRemoteStorageInfo = Maps.newHashMap();
+  private final Configuration hdfsConf;
+  private final int fileSize;
+  private final int readAndWriteTimes;
+  private boolean remotePathIsHealthy = true;
+
+  public AppBalanceSelectStorageStrategy(
+      Map<String, RankValue> remoteStoragePathRankValue,
+      Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo,
+      Map<String, RemoteStorageInfo> availableRemoteStorageInfo,
+      CoordinatorConf conf) {
+    this.remoteStoragePathRankValue = remoteStoragePathRankValue;
+    this.appIdToRemoteStorageInfo = appIdToRemoteStorageInfo;
+    this.availableRemoteStorageInfo = availableRemoteStorageInfo;
+    this.hdfsConf = new Configuration();
+    fileSize = conf.getInteger(CoordinatorConf.COORDINATOR_REMOTE_STORAGE_SCHEDULE_FILE_SIZE);
+    readAndWriteTimes = conf.getInteger(CoordinatorConf.COORDINATOR_REMOTE_STORAGE_SCHEDULE_ACCESS_TIMES);
   }
 
-  /**
-   * the strategy of pick remote storage is according to assignment count
-   */
-  @Override
-  public RemoteStorageInfo pickRemoteStorage(String appId) {
-    if (appIdToRemoteStorageInfo.containsKey(appId)) {
-      return appIdToRemoteStorageInfo.get(appId);
-    }
-
-    // create list for sort
-    List<Map.Entry<String, RankValue>> sizeList =
-        Lists.newArrayList(remoteStoragePathCounter.entrySet()).stream().filter(Objects::nonNull)
-            .sorted(Comparator.comparingInt(entry -> entry.getValue().getAppNum().get())).collect(Collectors.toList());
-
-    for (Map.Entry<String, RankValue> entry : sizeList) {
-      String storagePath = entry.getKey();
-      if (availableRemoteStorageInfo.containsKey(storagePath)) {
-        appIdToRemoteStorageInfo.putIfAbsent(appId, availableRemoteStorageInfo.get(storagePath));
-        incRemoteStorageCounter(storagePath);
-        break;
-      }
-    }
-    return appIdToRemoteStorageInfo.get(appId);
-  }
-
-  @Override
   @VisibleForTesting
-  public synchronized void incRemoteStorageCounter(String remoteStoragePath) {
-    RankValue counter = remoteStoragePathCounter.get(remoteStoragePath);
-    if (counter != null) {
-      counter.getAppNum().incrementAndGet();
-    } else {
-      // it may be happened when assignment remote storage
-      // and refresh remote storage at the same time
-      LOG.warn("Remote storage path lost during assignment: {} doesn't exist, reset it to 1",
-          remoteStoragePath);
-      remoteStoragePathCounter.put(remoteStoragePath, new RankValue(1));
+  public List<Map.Entry<String, RankValue>> sortPathByRankValue(
+      String path, String test, boolean isHealthy) {
+    try {
+      FileSystem fs = HadoopFilesystemProvider.getFilesystem(new Path(path), hdfsConf);
+      fs.delete(new Path(test),true);
+      if (isHealthy) {
+        RankValue rankValue = remoteStoragePathRankValue.get(path);
+        remoteStoragePathRankValue.put(path, new RankValue(0, rankValue.getAppNum().get()));
+      }
+    } catch (Exception e) {
+      RankValue rankValue = remoteStoragePathRankValue.get(path);
+      remoteStoragePathRankValue.put(path, new RankValue(Long.MAX_VALUE, rankValue.getAppNum().get()));
+      LOG.error("Failed to sort, we will not use this remote path {}.", path, e);
     }
+    return Lists.newCopyOnWriteArrayList(remoteStoragePathRankValue.entrySet()).stream()
+        .filter(Objects::nonNull).collect(Collectors.toList());
   }
 
   @Override
-  @VisibleForTesting
-  public synchronized void decRemoteStorageCounter(String storagePath) {
-    if (!StringUtils.isEmpty(storagePath)) {
-      RankValue atomic = remoteStoragePathCounter.get(storagePath);
-      if (atomic != null) {
-        double count = atomic.getAppNum().decrementAndGet();
-        if (count < 0) {
-          LOG.warn("Unexpected counter for remote storage: {}, which is {}, reset to 0",
-              storagePath, count);
-          atomic.getAppNum().set(0);
+  public List<Map.Entry<String, RankValue>> detectStorage(String uri) {
+    if (uri.startsWith(ApplicationManager.REMOTE_PATH_SCHEMA.get(0))) {
+      setRemotePathIsHealthy(true);
+      Path remotePath = new Path(uri);
+      String rssTest = uri + "/rssTest";
+      Path testPath = new Path(rssTest);
+      try {
+        FileSystem fs = HadoopFilesystemProvider.getFilesystem(remotePath, hdfsConf);
+        for (int j = 0; j < readAndWriteTimes; j++) {
+          byte[] data = RandomUtils.nextBytes(fileSize);
+          try (FSDataOutputStream fos = fs.create(testPath)) {
+            fos.write(data);
+            fos.flush();
+          }
+          byte[] readData = new byte[fileSize];
+          int readBytes;
+          try (FSDataInputStream fis = fs.open(testPath)) {
+            int hasReadBytes = 0;
+            do {
+              readBytes = fis.read(readData);
+              if (hasReadBytes < fileSize) {
+                for (int i = 0; i < readBytes; i++) {
+                  if (data[hasReadBytes + i] != readData[i]) {
+                    RankValue rankValue = remoteStoragePathRankValue.get(uri);
+                    remoteStoragePathRankValue.put(uri,
+                        new RankValue(Long.MAX_VALUE, rankValue.getAppNum().get()));
+                    throw new RssException("The content of reading and writing is inconsistent.");
+                  }
+                }
+              }
+              hasReadBytes += readBytes;
+            } while (readBytes != -1);
+          }
         }
-      } else {
-        LOG.warn("Can't find counter for remote storage: {}", storagePath);
-        remoteStoragePathCounter.putIfAbsent(storagePath, new RankValue(0));
-      }
-      if (remoteStoragePathCounter.get(storagePath).getAppNum().get() == 0
-          && !availableRemoteStorageInfo.containsKey(storagePath)) {
-        remoteStoragePathCounter.remove(storagePath);
+      } catch (Exception e) {
+        setRemotePathIsHealthy(false);
+        LOG.error("Storage read and write error, we will not use this remote path {}.", uri, e);
+        RankValue rankValue = remoteStoragePathRankValue.get(uri);
+        remoteStoragePathRankValue.put(uri, new RankValue(Long.MAX_VALUE, rankValue.getAppNum().get()));
+      } finally {
+        return sortPathByRankValue(uri, rssTest, remotePathIsHealthy);
       }
+    } else {
+      return Lists.newCopyOnWriteArrayList(remoteStoragePathRankValue.entrySet());
     }
   }
 
+  /**
+   * When choosing the AppBalance strategy, each time you select a path,
+   * you should know the number of the latest apps in different paths
+   */
   @Override
-  public synchronized void removePathFromCounter(String storagePath) {
-    RankValue atomic = remoteStoragePathCounter.get(storagePath);
-    if (atomic != null && atomic.getAppNum().get() == 0) {
-      remoteStoragePathCounter.remove(storagePath);
+  public synchronized RemoteStorageInfo pickStorage(
+      List<Map.Entry<String, RankValue>> uris, String appId) {
+    boolean isUnhealthy =
+        uris.stream().noneMatch(rv -> rv.getValue().getReadAndWriteTime().get() != Long.MAX_VALUE);
+    if (!isUnhealthy) {
+      // If there is only one unhealthy path, then filter that path
+      uris = uris.stream().filter(rv -> rv.getValue().getReadAndWriteTime().get() != Long.MAX_VALUE).sorted(
+          Comparator.comparingInt(entry -> entry.getValue().getAppNum().get())).collect(Collectors.toList());
+    } else {
+      // If all paths are unhealthy, assign paths according to the number of apps

Review Comment:
   Should we add some logs and metrics when all paths are unhealthy?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#issuecomment-1247015819

   > Like discussion at dev mail list, we will freeze the code and cut 0.6 version branch in September 15, we will not merge this pr before I cut 0.6 version branch, are you ok?
   
   No problem, I got it.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r979211972


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,14 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
+  List<Map.Entry<String, RankValue>> detectStorage(String path);
 
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
-
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  List<Map.Entry<String, RankValue>> pickStorage(List<Map.Entry<String, RankValue>> pathList);

Review Comment:
   Do you have any good advice?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#issuecomment-1259186968

   Maybe we should have a abstract class to define the common behaviors of SelectStorageStrategy.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r972594396


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,14 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
+import org.apache.hadoop.fs.Path;
+
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
-
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
-
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  List<Map.Entry<String, RankValue>> sortPathByRankValue(String path, Path testPath, long time, boolean isHealthy);

Review Comment:
   Maybe we can support fileSystem at the beginning. If we need to support db, then there may be more things to consider, or we can add a parameter. If it is a file supported by fileSystem, we will turn on this switch. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r972596791


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,14 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
+import org.apache.hadoop.fs.Path;
+
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
-
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
-
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  List<Map.Entry<String, RankValue>> sortPathByRankValue(String path, Path testPath, long time, boolean isHealthy);

Review Comment:
    IIRC, object storage such as s3 also has the concept of Path.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r972597609


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,14 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
+import org.apache.hadoop.fs.Path;
+
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
-
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
-
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  List<Map.Entry<String, RankValue>> sortPathByRankValue(String path, Path testPath, long time, boolean isHealthy);

Review Comment:
   But `cos` doesn't seem to have this concept.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#issuecomment-1253484528

   > > > For object storage, there may be more S3, but for cos, I don't know this very well. Do you have any good suggestions? @jerqi
   > > 
   > > 
   > > Do the S3 have the concept of `path`? It only have the concept of bucket.
   > 
   > Yes, the s3 compatible filesystem of Hadoop has implemented the abstraction of `Path`, but it's slower than the s3 original api when using `list` files/dirs. I think cos is the same with s3.
   > 
   > By the way, I implemented the hadoop filesystem api for internal object store in the past, so have some experience on it.
   
   We shouldn't relay on the filesystem of object storage too much. Some operations of object storage is slow. For example, `list` operation, `rename` operation and `append` operation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r979271672


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,14 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
+  List<Map.Entry<String, RankValue>> detectStorage(String path);
 
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
-
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  List<Map.Entry<String, RankValue>> pickStorage(List<Map.Entry<String, RankValue>> pathList);

Review Comment:
   I modified it to let `pickStorage` return to a selected path.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r969078846


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,17 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
-
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
+  List<Map.Entry<String, RankValue>> sortPathByRankValue(String path, Path testPath, long time, boolean isHealthy);
 
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  void setFs(FileSystem fs);

Review Comment:
   We would better not assume our remote storage is only FileSystem.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r980058162


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,14 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
+  List<Map.Entry<String, RankValue>> detectStorage(String uri);
 
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
-
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  String pickStorage(List<Map.Entry<String, RankValue>> uris, String appId);

Review Comment:
   OK, updated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r980837192


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/AppBalanceSelectStorageStrategy.java:
##########
@@ -39,106 +45,127 @@
 public class AppBalanceSelectStorageStrategy implements SelectStorageStrategy {
 
   private static final Logger LOG = LoggerFactory.getLogger(AppBalanceSelectStorageStrategy.class);
-  /**
-   * store appId -> remote path to make sure all shuffle data of the same application
-   * will be written to the same remote storage
-   */
-  private final Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo;
   /**
    * store remote path -> application count for assignment strategy
    */
-  private final Map<String, RankValue> remoteStoragePathCounter;
+  private final Map<String, RankValue> remoteStoragePathRankValue;
+  private final Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo;
   private final Map<String, RemoteStorageInfo> availableRemoteStorageInfo;
-
-  public AppBalanceSelectStorageStrategy() {
-    this.appIdToRemoteStorageInfo = Maps.newConcurrentMap();
-    this.remoteStoragePathCounter = Maps.newConcurrentMap();
-    this.availableRemoteStorageInfo = Maps.newHashMap();
+  private final Configuration hdfsConf;
+  private final int fileSize;
+  private final int readAndWriteTimes;
+  private boolean remotePathIsHealthy = true;
+
+  public AppBalanceSelectStorageStrategy(
+      Map<String, RankValue> remoteStoragePathRankValue,
+      Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo,
+      Map<String, RemoteStorageInfo> availableRemoteStorageInfo,
+      CoordinatorConf conf) {
+    this.remoteStoragePathRankValue = remoteStoragePathRankValue;
+    this.appIdToRemoteStorageInfo = appIdToRemoteStorageInfo;
+    this.availableRemoteStorageInfo = availableRemoteStorageInfo;
+    this.hdfsConf = new Configuration();
+    fileSize = conf.getInteger(CoordinatorConf.COORDINATOR_REMOTE_STORAGE_SCHEDULE_FILE_SIZE);
+    readAndWriteTimes = conf.getInteger(CoordinatorConf.COORDINATOR_REMOTE_STORAGE_SCHEDULE_ACCESS_TIMES);
   }
 
-  /**
-   * the strategy of pick remote storage is according to assignment count
-   */
-  @Override
-  public RemoteStorageInfo pickRemoteStorage(String appId) {
-    if (appIdToRemoteStorageInfo.containsKey(appId)) {
-      return appIdToRemoteStorageInfo.get(appId);
-    }
-
-    // create list for sort
-    List<Map.Entry<String, RankValue>> sizeList =
-        Lists.newArrayList(remoteStoragePathCounter.entrySet()).stream().filter(Objects::nonNull)
-            .sorted(Comparator.comparingInt(entry -> entry.getValue().getAppNum().get())).collect(Collectors.toList());
-
-    for (Map.Entry<String, RankValue> entry : sizeList) {
-      String storagePath = entry.getKey();
-      if (availableRemoteStorageInfo.containsKey(storagePath)) {
-        appIdToRemoteStorageInfo.putIfAbsent(appId, availableRemoteStorageInfo.get(storagePath));
-        incRemoteStorageCounter(storagePath);
-        break;
-      }
-    }
-    return appIdToRemoteStorageInfo.get(appId);
-  }
-
-  @Override
   @VisibleForTesting
-  public synchronized void incRemoteStorageCounter(String remoteStoragePath) {
-    RankValue counter = remoteStoragePathCounter.get(remoteStoragePath);
-    if (counter != null) {
-      counter.getAppNum().incrementAndGet();
-    } else {
-      // it may be happened when assignment remote storage
-      // and refresh remote storage at the same time
-      LOG.warn("Remote storage path lost during assignment: {} doesn't exist, reset it to 1",
-          remoteStoragePath);
-      remoteStoragePathCounter.put(remoteStoragePath, new RankValue(1));
+  public List<Map.Entry<String, RankValue>> sortPathByRankValue(
+      String path, String test, boolean isHealthy) {
+    try {
+      FileSystem fs = HadoopFilesystemProvider.getFilesystem(new Path(path), hdfsConf);
+      fs.delete(new Path(test),true);
+      if (isHealthy) {
+        RankValue rankValue = remoteStoragePathRankValue.get(path);
+        remoteStoragePathRankValue.put(path, new RankValue(0, rankValue.getAppNum().get()));
+      }
+    } catch (Exception e) {
+      RankValue rankValue = remoteStoragePathRankValue.get(path);
+      remoteStoragePathRankValue.put(path, new RankValue(Long.MAX_VALUE, rankValue.getAppNum().get()));
+      LOG.error("Failed to sort, we will not use this remote path {}.", path, e);
     }
+    return Lists.newCopyOnWriteArrayList(remoteStoragePathRankValue.entrySet()).stream()
+        .filter(Objects::nonNull).collect(Collectors.toList());
   }
 
   @Override
-  @VisibleForTesting
-  public synchronized void decRemoteStorageCounter(String storagePath) {
-    if (!StringUtils.isEmpty(storagePath)) {
-      RankValue atomic = remoteStoragePathCounter.get(storagePath);
-      if (atomic != null) {
-        double count = atomic.getAppNum().decrementAndGet();
-        if (count < 0) {
-          LOG.warn("Unexpected counter for remote storage: {}, which is {}, reset to 0",
-              storagePath, count);
-          atomic.getAppNum().set(0);
+  public List<Map.Entry<String, RankValue>> detectStorage(String uri) {
+    if (uri.startsWith(ApplicationManager.REMOTE_PATH_SCHEMA.get(0))) {
+      setRemotePathIsHealthy(true);
+      Path remotePath = new Path(uri);
+      String rssTest = uri + "/rssTest";
+      Path testPath = new Path(rssTest);
+      try {
+        FileSystem fs = HadoopFilesystemProvider.getFilesystem(remotePath, hdfsConf);
+        for (int j = 0; j < readAndWriteTimes; j++) {
+          byte[] data = RandomUtils.nextBytes(fileSize);
+          try (FSDataOutputStream fos = fs.create(testPath)) {
+            fos.write(data);
+            fos.flush();
+          }
+          byte[] readData = new byte[fileSize];
+          int readBytes;
+          try (FSDataInputStream fis = fs.open(testPath)) {
+            int hasReadBytes = 0;
+            do {
+              readBytes = fis.read(readData);
+              if (hasReadBytes < fileSize) {
+                for (int i = 0; i < readBytes; i++) {
+                  if (data[hasReadBytes + i] != readData[i]) {
+                    RankValue rankValue = remoteStoragePathRankValue.get(uri);
+                    remoteStoragePathRankValue.put(uri,
+                        new RankValue(Long.MAX_VALUE, rankValue.getAppNum().get()));
+                    throw new RssException("The content of reading and writing is inconsistent.");
+                  }
+                }
+              }
+              hasReadBytes += readBytes;
+            } while (readBytes != -1);
+          }
         }
-      } else {
-        LOG.warn("Can't find counter for remote storage: {}", storagePath);
-        remoteStoragePathCounter.putIfAbsent(storagePath, new RankValue(0));
-      }
-      if (remoteStoragePathCounter.get(storagePath).getAppNum().get() == 0
-          && !availableRemoteStorageInfo.containsKey(storagePath)) {
-        remoteStoragePathCounter.remove(storagePath);
+      } catch (Exception e) {
+        setRemotePathIsHealthy(false);
+        LOG.error("Storage read and write error, we will not use this remote path {}.", uri, e);
+        RankValue rankValue = remoteStoragePathRankValue.get(uri);
+        remoteStoragePathRankValue.put(uri, new RankValue(Long.MAX_VALUE, rankValue.getAppNum().get()));
+      } finally {
+        return sortPathByRankValue(uri, rssTest, remotePathIsHealthy);
       }
+    } else {
+      return Lists.newCopyOnWriteArrayList(remoteStoragePathRankValue.entrySet());
     }
   }
 
+  /**
+   * When choosing the AppBalance strategy, each time you select a path,
+   * you should know the number of the latest apps in different paths
+   */
   @Override
-  public synchronized void removePathFromCounter(String storagePath) {
-    RankValue atomic = remoteStoragePathCounter.get(storagePath);
-    if (atomic != null && atomic.getAppNum().get() == 0) {
-      remoteStoragePathCounter.remove(storagePath);
+  public synchronized RemoteStorageInfo pickStorage(
+      List<Map.Entry<String, RankValue>> uris, String appId) {
+    boolean isUnhealthy =
+        uris.stream().noneMatch(rv -> rv.getValue().getReadAndWriteTime().get() != Long.MAX_VALUE);
+    if (!isUnhealthy) {
+      // If there is only one unhealthy path, then filter that path
+      uris = uris.stream().filter(rv -> rv.getValue().getReadAndWriteTime().get() != Long.MAX_VALUE).sorted(
+          Comparator.comparingInt(entry -> entry.getValue().getAppNum().get())).collect(Collectors.toList());
+    } else {
+      // If all paths are unhealthy, assign paths according to the number of apps

Review Comment:
   OK, I will add.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r980929018


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/AppBalanceSelectStorageStrategy.java:
##########
@@ -39,106 +45,127 @@
 public class AppBalanceSelectStorageStrategy implements SelectStorageStrategy {
 
   private static final Logger LOG = LoggerFactory.getLogger(AppBalanceSelectStorageStrategy.class);
-  /**
-   * store appId -> remote path to make sure all shuffle data of the same application
-   * will be written to the same remote storage
-   */
-  private final Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo;
   /**
    * store remote path -> application count for assignment strategy
    */
-  private final Map<String, RankValue> remoteStoragePathCounter;
+  private final Map<String, RankValue> remoteStoragePathRankValue;
+  private final Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo;
   private final Map<String, RemoteStorageInfo> availableRemoteStorageInfo;
-
-  public AppBalanceSelectStorageStrategy() {
-    this.appIdToRemoteStorageInfo = Maps.newConcurrentMap();
-    this.remoteStoragePathCounter = Maps.newConcurrentMap();
-    this.availableRemoteStorageInfo = Maps.newHashMap();
+  private final Configuration hdfsConf;
+  private final int fileSize;
+  private final int readAndWriteTimes;
+  private boolean remotePathIsHealthy = true;

Review Comment:
   Is the variable necessary to become member field?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#issuecomment-1272491063

   https://github.com/apache/incubator-uniffle/actions/runs/3213433147 There is a flay test. Could you help me fix it?
   [test-reports-spark3.2.0 (1).zip](https://github.com/apache/incubator-uniffle/files/9741113/test-reports-spark3.2.0.1.zip)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r972594758


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,14 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
+import org.apache.hadoop.fs.Path;
+
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
-
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
-
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  List<Map.Entry<String, RankValue>> sortPathByRankValue(String path, Path testPath, long time, boolean isHealthy);

Review Comment:
   For `S3` , `COS`, it's OK, right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r980952434


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/AppBalanceSelectStorageStrategy.java:
##########
@@ -39,106 +45,127 @@
 public class AppBalanceSelectStorageStrategy implements SelectStorageStrategy {
 
   private static final Logger LOG = LoggerFactory.getLogger(AppBalanceSelectStorageStrategy.class);
-  /**
-   * store appId -> remote path to make sure all shuffle data of the same application
-   * will be written to the same remote storage
-   */
-  private final Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo;
   /**
    * store remote path -> application count for assignment strategy
    */
-  private final Map<String, RankValue> remoteStoragePathCounter;
+  private final Map<String, RankValue> remoteStoragePathRankValue;
+  private final Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo;
   private final Map<String, RemoteStorageInfo> availableRemoteStorageInfo;
-
-  public AppBalanceSelectStorageStrategy() {
-    this.appIdToRemoteStorageInfo = Maps.newConcurrentMap();
-    this.remoteStoragePathCounter = Maps.newConcurrentMap();
-    this.availableRemoteStorageInfo = Maps.newHashMap();
+  private final Configuration hdfsConf;
+  private final int fileSize;
+  private final int readAndWriteTimes;
+  private boolean remotePathIsHealthy = true;
+
+  public AppBalanceSelectStorageStrategy(
+      Map<String, RankValue> remoteStoragePathRankValue,
+      Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo,
+      Map<String, RemoteStorageInfo> availableRemoteStorageInfo,
+      CoordinatorConf conf) {
+    this.remoteStoragePathRankValue = remoteStoragePathRankValue;
+    this.appIdToRemoteStorageInfo = appIdToRemoteStorageInfo;
+    this.availableRemoteStorageInfo = availableRemoteStorageInfo;
+    this.hdfsConf = new Configuration();
+    fileSize = conf.getInteger(CoordinatorConf.COORDINATOR_REMOTE_STORAGE_SCHEDULE_FILE_SIZE);
+    readAndWriteTimes = conf.getInteger(CoordinatorConf.COORDINATOR_REMOTE_STORAGE_SCHEDULE_ACCESS_TIMES);
   }
 
-  /**
-   * the strategy of pick remote storage is according to assignment count
-   */
-  @Override
-  public RemoteStorageInfo pickRemoteStorage(String appId) {
-    if (appIdToRemoteStorageInfo.containsKey(appId)) {
-      return appIdToRemoteStorageInfo.get(appId);
-    }
-
-    // create list for sort
-    List<Map.Entry<String, RankValue>> sizeList =
-        Lists.newArrayList(remoteStoragePathCounter.entrySet()).stream().filter(Objects::nonNull)
-            .sorted(Comparator.comparingInt(entry -> entry.getValue().getAppNum().get())).collect(Collectors.toList());
-
-    for (Map.Entry<String, RankValue> entry : sizeList) {
-      String storagePath = entry.getKey();
-      if (availableRemoteStorageInfo.containsKey(storagePath)) {
-        appIdToRemoteStorageInfo.putIfAbsent(appId, availableRemoteStorageInfo.get(storagePath));
-        incRemoteStorageCounter(storagePath);
-        break;
-      }
-    }
-    return appIdToRemoteStorageInfo.get(appId);
-  }
-
-  @Override
   @VisibleForTesting
-  public synchronized void incRemoteStorageCounter(String remoteStoragePath) {
-    RankValue counter = remoteStoragePathCounter.get(remoteStoragePath);
-    if (counter != null) {
-      counter.getAppNum().incrementAndGet();
-    } else {
-      // it may be happened when assignment remote storage
-      // and refresh remote storage at the same time
-      LOG.warn("Remote storage path lost during assignment: {} doesn't exist, reset it to 1",
-          remoteStoragePath);
-      remoteStoragePathCounter.put(remoteStoragePath, new RankValue(1));
+  public List<Map.Entry<String, RankValue>> sortPathByRankValue(
+      String path, String test, boolean isHealthy) {
+    try {
+      FileSystem fs = HadoopFilesystemProvider.getFilesystem(new Path(path), hdfsConf);
+      fs.delete(new Path(test),true);
+      if (isHealthy) {
+        RankValue rankValue = remoteStoragePathRankValue.get(path);
+        remoteStoragePathRankValue.put(path, new RankValue(0, rankValue.getAppNum().get()));
+      }
+    } catch (Exception e) {
+      RankValue rankValue = remoteStoragePathRankValue.get(path);
+      remoteStoragePathRankValue.put(path, new RankValue(Long.MAX_VALUE, rankValue.getAppNum().get()));
+      LOG.error("Failed to sort, we will not use this remote path {}.", path, e);
     }
+    return Lists.newCopyOnWriteArrayList(remoteStoragePathRankValue.entrySet()).stream()
+        .filter(Objects::nonNull).collect(Collectors.toList());
   }
 
   @Override
-  @VisibleForTesting
-  public synchronized void decRemoteStorageCounter(String storagePath) {
-    if (!StringUtils.isEmpty(storagePath)) {
-      RankValue atomic = remoteStoragePathCounter.get(storagePath);
-      if (atomic != null) {
-        double count = atomic.getAppNum().decrementAndGet();
-        if (count < 0) {
-          LOG.warn("Unexpected counter for remote storage: {}, which is {}, reset to 0",
-              storagePath, count);
-          atomic.getAppNum().set(0);
+  public List<Map.Entry<String, RankValue>> detectStorage(String uri) {
+    if (uri.startsWith(ApplicationManager.REMOTE_PATH_SCHEMA.get(0))) {
+      setRemotePathIsHealthy(true);
+      Path remotePath = new Path(uri);
+      String rssTest = uri + "/rssTest";
+      Path testPath = new Path(rssTest);
+      try {
+        FileSystem fs = HadoopFilesystemProvider.getFilesystem(remotePath, hdfsConf);
+        for (int j = 0; j < readAndWriteTimes; j++) {

Review Comment:
   For appBalance, it shouldn't be a configuration option.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r977145765


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,12 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
-
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
-
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  List<Map.Entry<String, RankValue>> readAndWrite(String path);

Review Comment:
   I have a little doubt, why do the `readAndWrite` become the only interface method for SelectStorageStrategy? I can rank by many methods, it's not necessary to read and write.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] zuston commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
zuston commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r976098369


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,14 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
+import org.apache.hadoop.fs.Path;
+
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
-
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
-
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  List<Map.Entry<String, RankValue>> sortPathByRankValue(String path, Path testPath, long time, boolean isHealthy);

Review Comment:
   As I know, the object storage also can be accessed by Hadoop Filesystem Interface (HCFS), like s3/oss/cos. So this is not a problem
   
   But if we want to involve more storages which are not implemented by HCFS or speed up specified api which is only compatible with original storage api, maybe we should implement uniffle's own filesystem, it can refer to Flink.
   
   Please let me if I'm wrong.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#issuecomment-1254433428

   > > We should have more general interface. We can only implement part of them.
   > 
   > I think we only need to provide a path that includes the remote, and the rest of the methods can be implemented by judging different storage systems.
   
   My doubt point is that the concept of path is not general one for the storage.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r979591306


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,14 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
+  List<Map.Entry<String, RankValue>> detectStorage(String uri);
 
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
-
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  String pickStorage(List<Map.Entry<String, RankValue>> uris, String appId);

Review Comment:
   Should we return `RemoteStorageInfo`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r976735693


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,14 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
+import org.apache.hadoop.fs.Path;
+
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
-
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
-
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  List<Map.Entry<String, RankValue>> sortPathByRankValue(String path, Path testPath, long time, boolean isHealthy);

Review Comment:
   Thank you very much for your reminder. Although I am not very impressed with this, I have used a similar method to read files in s3. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#issuecomment-1252280558

   > For object storage, there may be more S3, but for cos, I don't know this very well. Do you have any good suggestions? @jerqi
   
   Do the S3 have the concept of `path`? It only have the concept of bucket.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#issuecomment-1252644266

   Could we support anomaly detection of hdfs files first?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r980855654


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/AppBalanceSelectStorageStrategy.java:
##########
@@ -39,106 +45,127 @@
 public class AppBalanceSelectStorageStrategy implements SelectStorageStrategy {
 
   private static final Logger LOG = LoggerFactory.getLogger(AppBalanceSelectStorageStrategy.class);
-  /**
-   * store appId -> remote path to make sure all shuffle data of the same application
-   * will be written to the same remote storage
-   */
-  private final Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo;
   /**
    * store remote path -> application count for assignment strategy
    */
-  private final Map<String, RankValue> remoteStoragePathCounter;
+  private final Map<String, RankValue> remoteStoragePathRankValue;
+  private final Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo;
   private final Map<String, RemoteStorageInfo> availableRemoteStorageInfo;
-
-  public AppBalanceSelectStorageStrategy() {
-    this.appIdToRemoteStorageInfo = Maps.newConcurrentMap();
-    this.remoteStoragePathCounter = Maps.newConcurrentMap();
-    this.availableRemoteStorageInfo = Maps.newHashMap();
+  private final Configuration hdfsConf;
+  private final int fileSize;
+  private final int readAndWriteTimes;
+  private boolean remotePathIsHealthy = true;

Review Comment:
   > This variable is used to determine whether the path is normal, and to pass the parameter into the method `sortPathByRankValue`.
   
   One strategy should have multiple paths, every path is healthy, the variable is true?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#issuecomment-1272320308

   Could you open this workflow for me? @jerqi


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng closed pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng closed pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy
URL: https://github.com/apache/incubator-uniffle/pull/210


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r972595029


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,14 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
+import org.apache.hadoop.fs.Path;
+
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
-
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
-
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  List<Map.Entry<String, RankValue>> sortPathByRankValue(String path, Path testPath, long time, boolean isHealthy);

Review Comment:
   At least We need consider the object Storage. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#issuecomment-1254453236

   > > So I did not use the concept of `path` for the time being, but used a string to represent.
   > 
   > The concept of subsequent use of buckets or paths can be implemented through the readAndWrite interface.
   
   It's ok.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng closed pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng closed pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy
URL: https://github.com/apache/incubator-uniffle/pull/210


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r979174647


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,14 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
+  List<Map.Entry<String, RankValue>> detectStorage(String path);

Review Comment:
   `path -> uri`?
   Path is not general concept for storage. Maybe we should use uri.



##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,14 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
+  List<Map.Entry<String, RankValue>> detectStorage(String path);
 
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
-
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  List<Map.Entry<String, RankValue>> pickStorage(List<Map.Entry<String, RankValue>> pathList);

Review Comment:
   Could we give `pathList` a better name?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] zuston commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
zuston commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r976098369


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,14 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
+import org.apache.hadoop.fs.Path;
+
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
-
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
-
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  List<Map.Entry<String, RankValue>> sortPathByRankValue(String path, Path testPath, long time, boolean isHealthy);

Review Comment:
   As I know, the object storage also can be accessed by Hadoop Filesystem Interface (HCFS), like s3/oss/cos. So this is not a problem
   
   But if we want to involve more storages which are not implemented by HCFS or speed up specified api which is only compatible with original storage api, maybe we should implement uniffle's own filesystem, it can refer to Flink.
   
   Please let me know if I'm wrong.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r979240887


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,14 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
+  List<Map.Entry<String, RankValue>> detectStorage(String path);
 
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
-
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  List<Map.Entry<String, RankValue>> pickStorage(List<Map.Entry<String, RankValue>> pathList);

Review Comment:
   `uriList` or `uris` may be better?
   Why do the method `pickStorage` return a list?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r980921405


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/AppBalanceSelectStorageStrategy.java:
##########
@@ -39,106 +45,127 @@
 public class AppBalanceSelectStorageStrategy implements SelectStorageStrategy {
 
   private static final Logger LOG = LoggerFactory.getLogger(AppBalanceSelectStorageStrategy.class);
-  /**
-   * store appId -> remote path to make sure all shuffle data of the same application
-   * will be written to the same remote storage
-   */
-  private final Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo;
   /**
    * store remote path -> application count for assignment strategy
    */
-  private final Map<String, RankValue> remoteStoragePathCounter;
+  private final Map<String, RankValue> remoteStoragePathRankValue;
+  private final Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo;
   private final Map<String, RemoteStorageInfo> availableRemoteStorageInfo;
-
-  public AppBalanceSelectStorageStrategy() {
-    this.appIdToRemoteStorageInfo = Maps.newConcurrentMap();
-    this.remoteStoragePathCounter = Maps.newConcurrentMap();
-    this.availableRemoteStorageInfo = Maps.newHashMap();
+  private final Configuration hdfsConf;
+  private final int fileSize;
+  private final int readAndWriteTimes;
+  private boolean remotePathIsHealthy = true;

Review Comment:
   I know that there are many paths, but this attribute is applied to each path. The healthy path is processed by `sortPathByRankValue` according to `remotePathIsHealthy` as true, and corresponding logic is processed when it is false.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#issuecomment-1272441432

   PTAL @jerqi 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r970239515


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,17 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
-
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
+  List<Map.Entry<String, RankValue>> sortPathByRankValue(String path, Path testPath, long time, boolean isHealthy);
 
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  void setFs(FileSystem fs);

Review Comment:
   If we want to support S3 , COS, Redis or some DBs, it's not suitable for us.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r970359524


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,17 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
-
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
+  List<Map.Entry<String, RankValue>> sortPathByRankValue(String path, Path testPath, long time, boolean isHealthy);
 
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  void setFs(FileSystem fs);

Review Comment:
   Maybe I can remove this method, thinking that if I want to support redis or other databases, I need to re implement the `sortPathByRankValue` method. It's a huge job.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#issuecomment-1246586850

   Like discussion at dev mail list, we will freeze the code and cut 0.6 version branch in September 15, we  will not merge this pr before I cut 0.6 version branch, are you ok?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r972139415


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,14 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
+import org.apache.hadoop.fs.Path;
+
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
-
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
-
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  List<Map.Entry<String, RankValue>> sortPathByRankValue(String path, Path testPath, long time, boolean isHealthy);

Review Comment:
   Path seems to be a concept of filesystem. It may be not general.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#issuecomment-1256197666

   PTAL @jerqi


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r979211326


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,14 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
+  List<Map.Entry<String, RankValue>> detectStorage(String path);

Review Comment:
   OK, good suggestion.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#issuecomment-1254450854

   So I did not use the concept of `path` for the time being, but used a string to represent.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r977256337


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,12 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
-
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
-
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  List<Map.Entry<String, RankValue>> readAndWrite(String path);

Review Comment:
   `readAndWrite` is not a good name. It's strange for the interface `SelectStorageStrategy`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r978333210


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,12 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
-
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
-
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  List<Map.Entry<String, RankValue>> readAndWrite(String path);

Review Comment:
   OK, updated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r990724501


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/AbstractSelectStorageStrategy.java:
##########
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.uniffle.coordinator;
+
+import java.io.IOException;
+import java.util.Map;
+
+import org.apache.commons.lang3.RandomUtils;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+
+import org.apache.uniffle.common.RemoteStorageInfo;
+import org.apache.uniffle.common.exception.RssException;
+
+import static org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
+
+/**
+ * This is a simple implementation class, which provides some methods to check whether the path is normal
+ */
+public abstract class AbstractSelectStorageStrategy implements SelectStorageStrategy {
+  /**
+   * store remote path -> application count for assignment strategy
+   */
+  protected final Map<String, LowestIOSampleCostSelectStorageStrategy.RankValue> remoteStoragePathRankValue;
+  protected final int fileSize;
+
+  public AbstractSelectStorageStrategy(
+      Map<String, LowestIOSampleCostSelectStorageStrategy.RankValue> remoteStoragePathRankValue,
+      CoordinatorConf conf) {
+    this.remoteStoragePathRankValue = remoteStoragePathRankValue;
+    fileSize = conf.getInteger(CoordinatorConf.COORDINATOR_REMOTE_STORAGE_SCHEDULE_FILE_SIZE);
+  }
+
+  public void readAndWriteHdfsStorage(FileSystem fs, Path testPath,
+      String uri, RankValue rankValue) throws IOException {
+    byte[] data = RandomUtils.nextBytes(fileSize);
+    try (FSDataOutputStream fos = fs.create(testPath)) {
+      fos.write(data);
+      fos.flush();
+    }
+    byte[] readData = new byte[fileSize];
+    int readBytes;
+    try (FSDataInputStream fis = fs.open(testPath)) {
+      int hasReadBytes = 0;
+      do {
+        readBytes = fis.read(readData);
+        if (hasReadBytes < fileSize) {
+          for (int i = 0; i < readBytes; i++) {
+            if (data[hasReadBytes + i] != readData[i]) {
+              remoteStoragePathRankValue.put(uri, new RankValue(Long.MAX_VALUE, rankValue.getAppNum().get()));
+              throw new RssException("The content of reading and writing is inconsistent.");
+            }
+          }
+        }
+        hasReadBytes += readBytes;
+      } while (readBytes != -1);
+    }
+  }
+
+  @Override

Review Comment:
   If we don't implement the method `detectStorage` and `pickStorage`, we can remove them in the abstract class.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r980664573


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/AppBalanceSelectStorageStrategy.java:
##########
@@ -39,106 +45,127 @@
 public class AppBalanceSelectStorageStrategy implements SelectStorageStrategy {
 
   private static final Logger LOG = LoggerFactory.getLogger(AppBalanceSelectStorageStrategy.class);
-  /**
-   * store appId -> remote path to make sure all shuffle data of the same application
-   * will be written to the same remote storage
-   */
-  private final Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo;
   /**
    * store remote path -> application count for assignment strategy
    */
-  private final Map<String, RankValue> remoteStoragePathCounter;
+  private final Map<String, RankValue> remoteStoragePathRankValue;
+  private final Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo;
   private final Map<String, RemoteStorageInfo> availableRemoteStorageInfo;
-
-  public AppBalanceSelectStorageStrategy() {
-    this.appIdToRemoteStorageInfo = Maps.newConcurrentMap();
-    this.remoteStoragePathCounter = Maps.newConcurrentMap();
-    this.availableRemoteStorageInfo = Maps.newHashMap();
+  private final Configuration hdfsConf;
+  private final int fileSize;
+  private final int readAndWriteTimes;
+  private boolean remotePathIsHealthy = true;
+
+  public AppBalanceSelectStorageStrategy(
+      Map<String, RankValue> remoteStoragePathRankValue,
+      Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo,
+      Map<String, RemoteStorageInfo> availableRemoteStorageInfo,
+      CoordinatorConf conf) {
+    this.remoteStoragePathRankValue = remoteStoragePathRankValue;
+    this.appIdToRemoteStorageInfo = appIdToRemoteStorageInfo;
+    this.availableRemoteStorageInfo = availableRemoteStorageInfo;
+    this.hdfsConf = new Configuration();
+    fileSize = conf.getInteger(CoordinatorConf.COORDINATOR_REMOTE_STORAGE_SCHEDULE_FILE_SIZE);
+    readAndWriteTimes = conf.getInteger(CoordinatorConf.COORDINATOR_REMOTE_STORAGE_SCHEDULE_ACCESS_TIMES);
   }
 
-  /**
-   * the strategy of pick remote storage is according to assignment count
-   */
-  @Override
-  public RemoteStorageInfo pickRemoteStorage(String appId) {
-    if (appIdToRemoteStorageInfo.containsKey(appId)) {
-      return appIdToRemoteStorageInfo.get(appId);
-    }
-
-    // create list for sort
-    List<Map.Entry<String, RankValue>> sizeList =
-        Lists.newArrayList(remoteStoragePathCounter.entrySet()).stream().filter(Objects::nonNull)
-            .sorted(Comparator.comparingInt(entry -> entry.getValue().getAppNum().get())).collect(Collectors.toList());
-
-    for (Map.Entry<String, RankValue> entry : sizeList) {
-      String storagePath = entry.getKey();
-      if (availableRemoteStorageInfo.containsKey(storagePath)) {
-        appIdToRemoteStorageInfo.putIfAbsent(appId, availableRemoteStorageInfo.get(storagePath));
-        incRemoteStorageCounter(storagePath);
-        break;
-      }
-    }
-    return appIdToRemoteStorageInfo.get(appId);
-  }
-
-  @Override
   @VisibleForTesting
-  public synchronized void incRemoteStorageCounter(String remoteStoragePath) {
-    RankValue counter = remoteStoragePathCounter.get(remoteStoragePath);
-    if (counter != null) {
-      counter.getAppNum().incrementAndGet();
-    } else {
-      // it may be happened when assignment remote storage
-      // and refresh remote storage at the same time
-      LOG.warn("Remote storage path lost during assignment: {} doesn't exist, reset it to 1",
-          remoteStoragePath);
-      remoteStoragePathCounter.put(remoteStoragePath, new RankValue(1));
+  public List<Map.Entry<String, RankValue>> sortPathByRankValue(
+      String path, String test, boolean isHealthy) {
+    try {
+      FileSystem fs = HadoopFilesystemProvider.getFilesystem(new Path(path), hdfsConf);
+      fs.delete(new Path(test),true);
+      if (isHealthy) {
+        RankValue rankValue = remoteStoragePathRankValue.get(path);
+        remoteStoragePathRankValue.put(path, new RankValue(0, rankValue.getAppNum().get()));
+      }
+    } catch (Exception e) {
+      RankValue rankValue = remoteStoragePathRankValue.get(path);
+      remoteStoragePathRankValue.put(path, new RankValue(Long.MAX_VALUE, rankValue.getAppNum().get()));
+      LOG.error("Failed to sort, we will not use this remote path {}.", path, e);
     }
+    return Lists.newCopyOnWriteArrayList(remoteStoragePathRankValue.entrySet()).stream()
+        .filter(Objects::nonNull).collect(Collectors.toList());
   }
 
   @Override
-  @VisibleForTesting
-  public synchronized void decRemoteStorageCounter(String storagePath) {
-    if (!StringUtils.isEmpty(storagePath)) {
-      RankValue atomic = remoteStoragePathCounter.get(storagePath);
-      if (atomic != null) {
-        double count = atomic.getAppNum().decrementAndGet();
-        if (count < 0) {
-          LOG.warn("Unexpected counter for remote storage: {}, which is {}, reset to 0",
-              storagePath, count);
-          atomic.getAppNum().set(0);
+  public List<Map.Entry<String, RankValue>> detectStorage(String uri) {
+    if (uri.startsWith(ApplicationManager.REMOTE_PATH_SCHEMA.get(0))) {
+      setRemotePathIsHealthy(true);
+      Path remotePath = new Path(uri);
+      String rssTest = uri + "/rssTest";
+      Path testPath = new Path(rssTest);
+      try {
+        FileSystem fs = HadoopFilesystemProvider.getFilesystem(remotePath, hdfsConf);
+        for (int j = 0; j < readAndWriteTimes; j++) {

Review Comment:
   If we just detect the health of storage, it seems unnecessary for us write and read them multiple times.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r980835480


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/AppBalanceSelectStorageStrategy.java:
##########
@@ -39,106 +45,127 @@
 public class AppBalanceSelectStorageStrategy implements SelectStorageStrategy {
 
   private static final Logger LOG = LoggerFactory.getLogger(AppBalanceSelectStorageStrategy.class);
-  /**
-   * store appId -> remote path to make sure all shuffle data of the same application
-   * will be written to the same remote storage
-   */
-  private final Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo;
   /**
    * store remote path -> application count for assignment strategy
    */
-  private final Map<String, RankValue> remoteStoragePathCounter;
+  private final Map<String, RankValue> remoteStoragePathRankValue;
+  private final Map<String, RemoteStorageInfo> appIdToRemoteStorageInfo;
   private final Map<String, RemoteStorageInfo> availableRemoteStorageInfo;
-
-  public AppBalanceSelectStorageStrategy() {
-    this.appIdToRemoteStorageInfo = Maps.newConcurrentMap();
-    this.remoteStoragePathCounter = Maps.newConcurrentMap();
-    this.availableRemoteStorageInfo = Maps.newHashMap();
+  private final Configuration hdfsConf;
+  private final int fileSize;
+  private final int readAndWriteTimes;
+  private boolean remotePathIsHealthy = true;

Review Comment:
   This variable is used to determine whether the path is normal, and to pass the parameter into the method `sortPathByRankValue`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] smallzhongfeng commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
smallzhongfeng commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r979725828


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,14 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
+  List<Map.Entry<String, RankValue>> detectStorage(String uri);
 
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
-
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  String pickStorage(List<Map.Entry<String, RankValue>> uris, String appId);

Review Comment:
   No need, because the return value of `pickStorage` only needs to be passed to `incRemoteStorageCounter`, `incRemoteStorageCounter` only needs `String` type, and `String` type is more general than `RemoteStorageInfo`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #210: [Improvement] Add hdfs path health check to AppBalanceSelectStorageStrategy

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #210:
URL: https://github.com/apache/incubator-uniffle/pull/210#discussion_r979863331


##########
coordinator/src/main/java/org/apache/uniffle/coordinator/SelectStorageStrategy.java:
##########
@@ -17,24 +17,14 @@
 
 package org.apache.uniffle.coordinator;
 
+import java.util.List;
 import java.util.Map;
 
-import org.apache.uniffle.common.RemoteStorageInfo;
 import org.apache.uniffle.coordinator.LowestIOSampleCostSelectStorageStrategy.RankValue;
 
 public interface SelectStorageStrategy {
 
-  RemoteStorageInfo pickRemoteStorage(String appId);
+  List<Map.Entry<String, RankValue>> detectStorage(String uri);
 
-  void incRemoteStorageCounter(String remoteStoragePath);
-
-  void decRemoteStorageCounter(String storagePath);
-
-  void removePathFromCounter(String storagePath);
-
-  Map<String, RemoteStorageInfo> getAvailableRemoteStorageInfo();
-
-  Map<String, RemoteStorageInfo> getAppIdToRemoteStorageInfo();
-
-  Map<String, RankValue> getRemoteStoragePathRankValue();
+  String pickStorage(List<Map.Entry<String, RankValue>> uris, String appId);

Review Comment:
   > No need, because the return value of `pickStorage` only needs to be passed to `incRemoteStorageCounter`, `incRemoteStorageCounter` only needs `String` type, and `String` type is more general than `RemoteStorageInfo`.
   
   `String` is too general to describe what we want to return. `RemoteStorageInfo` is more accurate.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org