You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@uniffle.apache.org by GitBox <gi...@apache.org> on 2022/08/27 04:45:23 UTC

[GitHub] [incubator-uniffle] leixm opened a new pull request, #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

leixm opened a new pull request, #190:
URL: https://github.com/apache/incubator-uniffle/pull/190

   ###What changes were proposed in this pull request?
   For issue #136  , When we use AQE, we may call shuffleWriteClient.getShuffleResult multiple times. But if both partition 1 and partition 2 are on the server A, we call getShuffleResult(partition 1) to get data form server A, and then we call getShuffleResult(partition 2) to get data form server A, it's not necassray. We can get getShuffleResult(partition 1, partition 2) instead.
   
   ###Why are the changes needed?
   Improve getShuffleResult
   
   ###Does this PR introduce any user-facing change?
   No
   
   ###How was this patch tested?
   Added UT


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
leixm commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1229122875

   @jerqi  can you help review this pr plz?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#discussion_r958369188


##########
proto/src/main/proto/Rss.proto:
##########
@@ -135,6 +136,12 @@ message GetShuffleResultRequest {
   int32 partitionId = 3;
 }
 
+message GetShuffleResultForMultiPartRequest {
+  string appId = 1;
+  int32 shuffleId = 2;
+  repeated int32 partitions = 3;
+}
+

Review Comment:
   Should we add a new message for response?



##########
common/src/main/java/org/apache/uniffle/common/util/RssUtils.java:
##########
@@ -292,4 +297,38 @@ public static String getMetricNameForHostName(String hostName) {
     }
     return hostName.replaceAll("[\\.-]", "_");
   }
+
+  public static Map<Integer, Roaring64NavigableMap> shuffleBitmapToPartitionBitmap(
+      Roaring64NavigableMap shuffleBitmap, int startPartition, int endPartition) {
+    Map<Integer, Roaring64NavigableMap> result = Maps.newHashMap();
+    for (int partitionId = startPartition; partitionId <= endPartition; partitionId++) {
+      result.putIfAbsent(partitionId, Roaring64NavigableMap.bitmapOf());
+    }
+    Iterator<Long> it = shuffleBitmap.iterator();
+    long mask = (1L << Constants.PARTITION_ID_MAX_LENGTH) - 1;
+    while (it.hasNext()) {
+      Long blockId = it.next();
+      int partitionId = Math.toIntExact((blockId >> Constants.TASK_ATTEMPT_ID_MAX_LENGTH) & mask);
+      if (partitionId >= startPartition && partitionId <= endPartition) {
+        result.get(partitionId).add(blockId);
+      }
+    }
+    return result;
+  }
+
+  public static Map<ShuffleServerInfo, Set<Integer>> reversePartitionToServers(

Review Comment:
   Could we have a better name?



##########
integration-test/spark3/src/test/java/org/apache/uniffle/test/MockRssShuffleManager.java:
##########
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.uniffle.test;
+
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import com.google.common.collect.Maps;
+import org.apache.spark.SparkConf;
+import org.apache.spark.TaskContext;
+import org.apache.spark.shuffle.RssShuffleHandle;
+import org.apache.spark.shuffle.RssShuffleManager;
+import org.apache.spark.shuffle.ShuffleHandle;
+import org.apache.spark.shuffle.ShuffleReadMetricsReporter;
+import org.apache.spark.shuffle.ShuffleReader;
+import org.roaringbitmap.longlong.Roaring64NavigableMap;
+
+import org.apache.uniffle.common.ShuffleServerInfo;
+
+public class MockRssShuffleManager extends RssShuffleManager {

Review Comment:
   So I think it's not a mock class. Could we give it a better name?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on a diff in pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
leixm commented on code in PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#discussion_r956735607


##########
proto/src/main/proto/Rss.proto:
##########
@@ -135,6 +136,13 @@ message GetShuffleResultRequest {
   int32 partitionId = 3;
 }
 
+message GetShuffleResultForMultiPartRequest {
+  string appId = 1;
+  int32 shuffleId = 2;
+  int32 startPartition = 3;

Review Comment:
   What do you think? @jerqi 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1229470687

   I think we should test this with some SQL query in integration test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1229141328

   Is it a compatible feature? If this is a breaking change, we should dicuss about it in the mail list first. But it seems to become a compatible feature if you adjust your implement.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
leixm commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1229456508

   @jerqi can you help review this pr plz? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#discussion_r965475449


##########
integration-test/spark3/src/test/java/org/apache/uniffle/test/GetShuffleReportForMultiPartTest.java:
##########
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.uniffle.test;
+
+import java.io.File;
+import java.nio.file.Files;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import org.apache.spark.SparkConf;
+import org.apache.spark.TaskContext;
+import org.apache.spark.shuffle.RssShuffleHandle;
+import org.apache.spark.shuffle.RssShuffleManager;
+import org.apache.spark.shuffle.RssSparkConfig;
+import org.apache.spark.shuffle.ShuffleHandle;
+import org.apache.spark.shuffle.ShuffleReadMetricsReporter;
+import org.apache.spark.shuffle.ShuffleReader;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec;
+import org.apache.spark.sql.execution.joins.SortMergeJoinExec;
+import org.apache.spark.sql.functions;
+import org.apache.spark.sql.internal.SQLConf;
+import org.junit.jupiter.api.BeforeAll;
+import org.junit.jupiter.api.Test;
+import org.roaringbitmap.longlong.Roaring64NavigableMap;
+
+import org.apache.uniffle.common.ShuffleServerInfo;
+import org.apache.uniffle.coordinator.CoordinatorConf;
+import org.apache.uniffle.server.MockedGrpcServer;
+import org.apache.uniffle.server.MockedShuffleServerGrpcService;
+import org.apache.uniffle.server.ShuffleServer;
+import org.apache.uniffle.server.ShuffleServerConf;
+import org.apache.uniffle.storage.util.StorageType;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+public class GetShuffleReportForMultiPartTest extends SparkIntegrationTestBase {
+
+  private static final int replicateWrite = 3;
+  private static final int replicateRead = 2;
+
+  @BeforeAll
+  public static void setupServers() throws Exception {
+    CoordinatorConf coordinatorConf = getCoordinatorConf();
+    Map<String, String> dynamicConf = Maps.newHashMap();
+    dynamicConf.put(CoordinatorConf.COORDINATOR_REMOTE_STORAGE_PATH.key(), HDFS_URI + "rss/test");
+    dynamicConf.put(RssSparkConfig.RSS_STORAGE_TYPE.key(), StorageType.MEMORY_LOCALFILE_HDFS.name());
+    addDynamicConf(coordinatorConf, dynamicConf);
+    createCoordinatorServer(coordinatorConf);
+    // Create multi shuffle servers
+    createShuffleServers();
+    startServers();
+  }
+
+  private static void createShuffleServers() throws Exception {
+    for (int i = 0; i < 4; i++) {
+      // Copy from IntegrationTestBase#getShuffleServerConf
+      File dataFolder = Files.createTempDirectory("rssdata" + i).toFile();
+      ShuffleServerConf serverConf = new ShuffleServerConf();
+      dataFolder.deleteOnExit();
+      serverConf.setInteger("rss.rpc.server.port", SHUFFLE_SERVER_PORT + i);
+      serverConf.setString("rss.storage.type", StorageType.MEMORY_LOCALFILE_HDFS.name());
+      serverConf.setString("rss.storage.basePath", dataFolder.getAbsolutePath());
+      serverConf.setString("rss.server.buffer.capacity", "671088640");
+      serverConf.setString("rss.server.memory.shuffle.highWaterMark", "50.0");
+      serverConf.setString("rss.server.memory.shuffle.lowWaterMark", "0.0");
+      serverConf.setString("rss.server.read.buffer.capacity", "335544320");
+      serverConf.setString("rss.coordinator.quorum", COORDINATOR_QUORUM);
+      serverConf.setString("rss.server.heartbeat.delay", "1000");
+      serverConf.setString("rss.server.heartbeat.interval", "1000");
+      serverConf.setInteger("rss.jetty.http.port", 18080 + i);
+      serverConf.setInteger("rss.jetty.corePool.size", 64);
+      serverConf.setInteger("rss.rpc.executor.size", 10);
+      serverConf.setString("rss.server.hadoop.dfs.replication", "2");
+      serverConf.setLong("rss.server.disk.capacity", 10L * 1024L * 1024L * 1024L);
+      serverConf.setBoolean("rss.server.health.check.enable", false);
+      createMockedShuffleServer(serverConf);
+    }
+    enableRecordGetShuffleResult();
+  }
+
+  private static void enableRecordGetShuffleResult() {
+    for (ShuffleServer shuffleServer : shuffleServers) {
+      ((MockedGrpcServer) shuffleServer.getServer()).getService()
+          .enableRecordGetShuffleResult();
+    }
+  }
+
+  @Override
+  public void updateCommonSparkConf(SparkConf sparkConf) {
+    sparkConf.set(SQLConf.ADAPTIVE_EXECUTION_ENABLED().key(), "true");
+    sparkConf.set(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD().key(), "-1");
+    sparkConf.set(SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_NUM().key(), "1");
+    sparkConf.set(SQLConf.SHUFFLE_PARTITIONS().key(), "100");
+    sparkConf.set(SQLConf.SKEW_JOIN_SKEWED_PARTITION_THRESHOLD().key(), "800");
+    sparkConf.set(SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES().key(), "800");
+  }
+
+  @Override
+  public void updateSparkConfCustomer(SparkConf sparkConf) {
+    sparkConf.set(RssSparkConfig.RSS_STORAGE_TYPE.key(), "HDFS");
+    sparkConf.set(RssSparkConfig.RSS_REMOTE_STORAGE_PATH.key(), HDFS_URI + "rss/test");
+  }
+
+  @Override
+  public void updateSparkConfWithRss(SparkConf sparkConf) {
+    super.updateSparkConfWithRss(sparkConf);
+    // Add multi replica conf
+    sparkConf.set(RssSparkConfig.RSS_DATA_REPLICA.key(), String.valueOf(replicateWrite));
+    sparkConf.set(RssSparkConfig.RSS_DATA_REPLICA_WRITE.key(), String.valueOf(replicateWrite));
+    sparkConf.set(RssSparkConfig.RSS_DATA_REPLICA_READ.key(), String.valueOf(replicateRead));
+
+    sparkConf.set("spark.shuffle.manager",
+        "org.apache.uniffle.test.GetShuffleReportForMultiPartTest$MockRssShuffleManager");
+  }
+
+  @Test
+  public void resultCompareTest() throws Exception {
+    run();
+  }
+
+  @Override
+  Map runTest(SparkSession spark, String fileName) throws Exception {
+    Thread.sleep(4000);
+    Map<Integer, String> map = Maps.newHashMap();
+    Dataset<Row> df2 = spark.range(0, 1000, 1, 10)
+        .select(functions.when(functions.col("id").$less(250), 249)
+            .otherwise(functions.col("id")).as("key2"), functions.col("id").as("value2"));
+    Dataset<Row> df1 = spark.range(0, 1000, 1, 10)
+        .select(functions.when(functions.col("id").$less(250), 249)
+            .when(functions.col("id").$greater(750), 1000)
+                .otherwise(functions.col("id")).as("key1"), functions.col("id").as("value2"));
+    Dataset<Row> df3 = df1.join(df2, df1.col("key1").equalTo(df2.col("key2")));
+
+    List<String> result = Lists.newArrayList();
+    assertTrue(df3.queryExecution().executedPlan().toString().startsWith("AdaptiveSparkPlan isFinalPlan=false"));
+    df3.collectAsList().forEach(row -> {
+      result.add(row.json());
+    });
+    assertTrue(df3.queryExecution().executedPlan().toString().startsWith("AdaptiveSparkPlan isFinalPlan=true"));
+    AdaptiveSparkPlanExec plan = (AdaptiveSparkPlanExec) df3.queryExecution().executedPlan();
+    SortMergeJoinExec joinExec = (SortMergeJoinExec) plan.executedPlan().children().iterator().next();
+    assertTrue(joinExec.isSkewJoin());
+    result.sort(new Comparator<String>() {
+      @Override
+      public int compare(String o1, String o2) {
+        return o1.compareTo(o2);
+      }
+    });
+    int i = 0;
+    for (String str : result) {
+      map.put(i, str);
+      i++;
+    }
+    SparkConf conf = spark.sparkContext().conf();
+    if (!conf.get("spark.shuffle.manager", "").isEmpty()) {
+      MockRssShuffleManager mockRssShuffleManager = (MockRssShuffleManager) spark.sparkContext().env().shuffleManager();
+      int expectRequestNum = mockRssShuffleManager.getShuffleIdToPartitionNum().values().stream()
+          .mapToInt(x -> x.get()).sum();
+      // Validate getShuffleResultForMultiPart is correct before return result
+      validateRequestCount(expectRequestNum * replicateRead);
+    }
+    return map;
+  }
+
+  public void validateRequestCount(int expectRequestNum) {
+    for (ShuffleServer shuffleServer : shuffleServers) {
+      MockedShuffleServerGrpcService service = ((MockedGrpcServer) shuffleServer.getServer()).getService();
+      Map<String, Map<Integer, AtomicInteger>> serviceRequestCount = service.getShuffleIdToPartitionRequest();
+      int requestNum = serviceRequestCount.entrySet().stream().flatMap(x -> x.getValue().values()
+           .stream()).mapToInt(AtomicInteger::get).sum();
+      expectRequestNum -= requestNum;
+    }
+    assertEquals(0, expectRequestNum);
+  }
+
+  public static class MockRssShuffleManager extends RssShuffleManager {

Review Comment:
   This class isn't a mock class. It should be delegate class or wrapper class.



##########
common/src/test/java/org/apache/uniffle/common/util/RssUtilsTest.java:
##########
@@ -190,6 +194,54 @@ public void testLoadExtentions() {
     assertEquals(testStr, extsObjs.get(0).get());
   }
 
+  @Test
+  public void testShuffleBitmapToPartitionBitmap() {
+    Roaring64NavigableMap partition1Bitmap = Roaring64NavigableMap.bitmapOf(
+        getBlockId(0, 0, 0),
+        getBlockId(0, 0, 1),
+        getBlockId(0, 1, 0),
+        getBlockId(0, 1, 1));
+    Roaring64NavigableMap partition2Bitmap = Roaring64NavigableMap.bitmapOf(
+        getBlockId(1, 0, 0),
+        getBlockId(1, 0, 1),
+        getBlockId(1, 1, 0),
+        getBlockId(1, 1, 1));
+    Roaring64NavigableMap shuffleBitmap = Roaring64NavigableMap.bitmapOf();
+    shuffleBitmap.or(partition1Bitmap);
+    shuffleBitmap.or(partition2Bitmap);
+    assertEquals(8, shuffleBitmap.getLongCardinality());
+    Map<Integer, Roaring64NavigableMap> toPartitionBitmap =
+        RssUtils.shuffleBitmapToPartitionBitmap(shuffleBitmap, 0, 1);
+    assertEquals(2, toPartitionBitmap.size());
+    assertEquals(partition1Bitmap, toPartitionBitmap.get(0));
+    assertEquals(partition2Bitmap, toPartitionBitmap.get(1));
+  }
+
+  @Test
+  public void testReversePartitionToServers() {

Review Comment:
   Could we change this test method name?



##########
common/src/main/java/org/apache/uniffle/common/util/RssUtils.java:
##########
@@ -292,4 +297,38 @@ public static String getMetricNameForHostName(String hostName) {
     }
     return hostName.replaceAll("[\\.-]", "_");
   }
+
+  public static Map<Integer, Roaring64NavigableMap> shuffleBitmapToPartitionBitmap(

Review Comment:
   This comment isn't resolved.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#discussion_r965473203


##########
common/src/main/java/org/apache/uniffle/common/util/RssUtils.java:
##########
@@ -292,4 +297,38 @@ public static String getMetricNameForHostName(String hostName) {
     }
     return hostName.replaceAll("[\\.-]", "_");
   }
+
+  public static Map<Integer, Roaring64NavigableMap> shuffleBitmapToPartitionBitmap(
+      Roaring64NavigableMap shuffleBitmap, int startPartition, int endPartition) {
+    Map<Integer, Roaring64NavigableMap> result = Maps.newHashMap();
+    for (int partitionId = startPartition; partitionId <= endPartition; partitionId++) {

Review Comment:
   We think we need `<=` here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
leixm commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1229157254

   Or we should keep the original implementation getShuffleResult(partitionId), and then add a new implementation getShuffleResult(startPartition, endPartition)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] zuston commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
zuston commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1229466429

   > @zuston Could you help me solve the problem of the flaky kerberos test?
   
   OK. Let me solve this tomorrow. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#discussion_r958397292


##########
server/src/main/java/org/apache/uniffle/server/ShuffleTaskManager.java:
##########
@@ -263,48 +264,52 @@ public long requireBuffer(int requireSize) {
     return requireId;
   }
 
-  public byte[] getFinishedBlockIds(
-      String appId, Integer shuffleId, Integer partitionId) throws IOException {
+  public byte[] getFinishedBlockIds(String appId, Integer shuffleId, Set<Integer> partitions) throws IOException {
     refreshAppId(appId);
-    Storage storage = storageManager.selectStorage(new ShuffleDataReadEvent(appId, shuffleId, partitionId));
-    // update shuffle's timestamp that was recently read.
-    storage.updateReadMetrics(new StorageReadMetrics(appId, shuffleId));
-
+    for (int partitionId : partitions) {
+      Storage storage = storageManager.selectStorage(new ShuffleDataReadEvent(appId, shuffleId, partitionId));
+      // update shuffle's timestamp that was recently read.
+      storage.updateReadMetrics(new StorageReadMetrics(appId, shuffleId));
+    }
     Map<Integer, Roaring64NavigableMap[]> shuffleIdToPartitions = partitionsToBlockIds.get(appId);
     if (shuffleIdToPartitions == null) {
       return null;
     }
+
     Roaring64NavigableMap[] blockIds = shuffleIdToPartitions.get(shuffleId);
     if (blockIds == null) {
       return new byte[]{};
     }
-    Roaring64NavigableMap bitmap = blockIds[partitionId % blockIds.length];
-    if (bitmap == null) {
-      return new byte[]{};
+    Map<Integer, Set<Integer>> bitmapIndexToPartitions = Maps.newHashMap();
+    for (int partitionId : partitions) {
+      int bitmapIndex = partitionId % blockIds.length;
+      if (bitmapIndexToPartitions.containsKey(bitmapIndex)) {
+        bitmapIndexToPartitions.get(bitmapIndex).add(partitionId);
+      } else {
+        HashSet<Integer> newHashSet = Sets.newHashSet(partitionId);
+        bitmapIndexToPartitions.put(bitmapIndex, newHashSet);
+      }
     }
 
-    if (partitionId > Constants.MAX_PARTITION_ID) {
-      throw new RuntimeException("Get invalid partitionId[" + partitionId
-          + "] which greater than " + Constants.MAX_PARTITION_ID);
+    Roaring64NavigableMap res = Roaring64NavigableMap.bitmapOf();
+    for (Map.Entry<Integer, Set<Integer>> entry : bitmapIndexToPartitions.entrySet()) {
+      Set<Integer> requestPartitions = entry.getValue();
+      Roaring64NavigableMap bitmap = blockIds[entry.getKey()];
+      res.or(getBlockIdsByPartitionId(requestPartitions, bitmap));

Review Comment:
   For RoaringBitmap, `or` don't have  higher performance than `add`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
leixm commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1240765560

   Can you help review please? @jerqi 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on a diff in pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
leixm commented on code in PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#discussion_r956734957


##########
proto/src/main/proto/Rss.proto:
##########
@@ -135,6 +136,13 @@ message GetShuffleResultRequest {
   int32 partitionId = 3;
 }
 
+message GetShuffleResultForMultiPartRequest {
+  string appId = 1;
+  int32 shuffleId = 2;
+  int32 startPartition = 3;

Review Comment:
   p1、p3 on server1, p2、p4 on server2,will send GetShuffleResultForMultiPart(p1,p4) to server1 and server2



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
leixm commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1240179007

   Can you help review please? @jerqi 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
leixm commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1239348007

   No, this is our production task. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
leixm commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1291906578

   > > ### Environment
   > > Shuffle Server Num : 5 Shuffle Write: 48G Configuration: --conf spark.sql.shuffle.partitions=5000 --conf spark.sql.adaptive.enabled=true --conf spark.sql.adaptive.shuffle.targetPostShuffleInputSize=64MB
   > > We measure the performance of get_shuffle_result by the following metrics:
   > > 
   > > * get_shuffle_result_times: The number of calls of the get_shuffle_result interface
   > > * get_shuffle_result_cost: Time consumption of get_shuffle_result interface
   > > * get_shuffle_result_for_multi_part_times:The number of calls of the get_shuffle_result_for_multi_part interface
   > > * get_shuffle_result_for_multi_part_cost: Time consumption of get_shuffle_result_for_multi_part interface
   > > 
   > > ### Test Results
   > > Before issue_136
   > > serverId	get_shuffle_result_times	get_shuffle_result_cost(ms)
   > > Server1	1000	157614
   > > Server2	1000	426897
   > > Server3	1000	269488
   > > Server4	1000	906758
   > > Server5	1001	123217
   > > sum	5001	1883974
   > > After issue_136
   > > serverId	get_shuffle_result_for_multi_part_times	get_shuffle_result_for_multi_part_cost(ms)
   > > Server1	833	870720
   > > Server2	833	260865
   > > Server3	834	333202
   > > Server4	833	90277
   > > Server5	835	94113
   > > sum	4168	1649177
   > > ### Summarize
   > > The number of interface requests is reduced by 16%, and the total time is reduced by 12.5%. If we assign consecutive partitions to a server, the improvement will be more obvious.
   > 
   > Could you raise a new issue that Coordinator support to assign consecutive partitions to a server?
   
   No problem, I'll follow up on this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
leixm commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1231524528

   > If the new version client access the old version server, what will happen? Could we fallback to use old way to access the old version server when there are exceptions?
   
   I think the new version of the client does not need to be compatible with the old version of the server.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1239433034

   Please resolve the comments left.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#discussion_r965725306


##########
proto/src/main/proto/Rss.proto:
##########
@@ -135,6 +136,12 @@ message GetShuffleResultRequest {
   int32 partitionId = 3;
 }
 
+message GetShuffleResultForMultiPartRequest {
+  string appId = 1;
+  int32 shuffleId = 2;
+  repeated int32 partitions = 3;
+}
+

Review Comment:
   remind



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#discussion_r965480039


##########
common/src/main/java/org/apache/uniffle/common/util/RssUtils.java:
##########
@@ -292,4 +297,38 @@ public static String getMetricNameForHostName(String hostName) {
     }
     return hostName.replaceAll("[\\.-]", "_");
   }
+
+  public static Map<Integer, Roaring64NavigableMap> shuffleBitmapToPartitionBitmap(
+      Roaring64NavigableMap shuffleBitmap, int startPartition, int endPartition) {
+    Map<Integer, Roaring64NavigableMap> result = Maps.newHashMap();
+    for (int partitionId = startPartition; partitionId <= endPartition; partitionId++) {

Review Comment:
   > We think we need `<=` here.
   
   Sorry, my mistake. We should `<` here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1229463191

   @zuston Could you help me solve the problem of  the flaky kerberos test?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
leixm commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1229156940

   I think it's a compatible feature, could you give me more detail? @jerqi 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
leixm commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1231310852

   > I think we should test this with some SQL query in integration test.
   
   Already added integration test, Can you help review? Thank you @jerqi 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
leixm commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1229456685

   org.apache.uniffle.common.security.HadoopSecurityContextTest test failed cause by Address already in use, It seems to be caused by running multiple tests at the same time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#discussion_r958326099


##########
integration-test/spark3/src/test/java/org/apache/uniffle/test/MockRssShuffleManager.java:
##########
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.uniffle.test;
+
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import com.google.common.collect.Maps;
+import org.apache.spark.SparkConf;
+import org.apache.spark.TaskContext;
+import org.apache.spark.shuffle.RssShuffleHandle;
+import org.apache.spark.shuffle.RssShuffleManager;
+import org.apache.spark.shuffle.ShuffleHandle;
+import org.apache.spark.shuffle.ShuffleReadMetricsReporter;
+import org.apache.spark.shuffle.ShuffleReader;
+import org.roaringbitmap.longlong.Roaring64NavigableMap;
+
+import org.apache.uniffle.common.ShuffleServerInfo;
+
+public class MockRssShuffleManager extends RssShuffleManager {

Review Comment:
   Why do we need this MockRssShuffleManager? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] zuston commented on a diff in pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
zuston commented on code in PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#discussion_r958714791


##########
integration-test/spark3/src/test/java/org/apache/uniffle/test/MockRssShuffleManager.java:
##########
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.uniffle.test;
+
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import com.google.common.collect.Maps;
+import org.apache.spark.SparkConf;
+import org.apache.spark.TaskContext;
+import org.apache.spark.shuffle.RssShuffleHandle;
+import org.apache.spark.shuffle.RssShuffleManager;
+import org.apache.spark.shuffle.ShuffleHandle;
+import org.apache.spark.shuffle.ShuffleReadMetricsReporter;
+import org.apache.spark.shuffle.ShuffleReader;
+import org.roaringbitmap.longlong.Roaring64NavigableMap;
+
+import org.apache.uniffle.common.ShuffleServerInfo;
+
+public class MockRssShuffleManager extends RssShuffleManager {

Review Comment:
   If it is only bound to dedicated test cases, the name should be concrete. Or put this class into its dedicated cases.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
leixm commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1229157998

   > A compatible feature means that old version server can use the new version server, the new version client can use the old version server. You change the protobuf field name, it will cause incompatibility in my thought.
   Thank you for your reminder, i will make it compatible. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1231539095

   > > If the new version client access the old version server, what will happen? Could we fallback to use old way to access the old version server when there are exceptions?
   > 
   > I think the new version of the client does not need to be compatible with the old version of the server.
   
   OK, I make a mistake. Maybe we just guarantee  that old client can access the new server.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#discussion_r958393406


##########
common/src/main/java/org/apache/uniffle/common/util/RssUtils.java:
##########
@@ -292,4 +297,38 @@ public static String getMetricNameForHostName(String hostName) {
     }
     return hostName.replaceAll("[\\.-]", "_");
   }
+
+  public static Map<Integer, Roaring64NavigableMap> shuffleBitmapToPartitionBitmap(

Review Comment:
   Why do we need `startPartition` and `endPartition`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1231614371

   Could you provide some data about benchmark or performance test?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#discussion_r958408573


##########
common/src/main/java/org/apache/uniffle/common/util/RssUtils.java:
##########
@@ -292,4 +297,38 @@ public static String getMetricNameForHostName(String hostName) {
     }
     return hostName.replaceAll("[\\.-]", "_");
   }
+
+  public static Map<Integer, Roaring64NavigableMap> shuffleBitmapToPartitionBitmap(
+      Roaring64NavigableMap shuffleBitmap, int startPartition, int endPartition) {
+    Map<Integer, Roaring64NavigableMap> result = Maps.newHashMap();
+    for (int partitionId = startPartition; partitionId <= endPartition; partitionId++) {

Review Comment:
   Why do we need `<=` instead of `<`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on a diff in pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
leixm commented on code in PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#discussion_r956734325


##########
proto/src/main/proto/Rss.proto:
##########
@@ -135,6 +136,13 @@ message GetShuffleResultRequest {
   int32 partitionId = 3;
 }
 
+message GetShuffleResultForMultiPartRequest {
+  string appId = 1;
+  int32 shuffleId = 2;
+  int32 startPartition = 3;

Review Comment:
   RssShuffleManager#getReader partition is continuous, GetShuffleResult partition is discontinuous, but it will not cause wrong results, right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
leixm commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1239337208

   ### Environment
   Shuffle Server Num : 5
   Shuffle Write: 48G
   Configuration: --conf spark.sql.shuffle.partitions=5000 --conf spark.sql.adaptive.enabled=true --conf spark.sql.adaptive.shuffle.targetPostShuffleInputSize=64MB
   
   We measure the performance of get_shuffle_result by the following metrics:
   - get_shuffle_result_times: The number of calls of the get_shuffle_result interface
   - get_shuffle_result_cost: Time consumption of get_shuffle_result interface
   - get_shuffle_result_for_multi_part_times:he number of calls of the get_shuffle_result_for_multi_part interface
   - get_shuffle_result_for_multi_part_cost: Time consumption of get_shuffle_result_for_multi_part interface
   ### Test Results
   Before issue_136
   
   | serverId | get_shuffle_result_times | get_shuffle_result_cost(ms) |
   | -------- | ------------------------ | --------------------------- |
   | Server1  | 1000                     | 157614                      |
   | Server2  | 1000                     | 426897                      |
   | Server3  | 1000                     | 269488                      |
   | Server4  | 1000                     | 906758                      |
   | Server5  | 1001                     | 123217                      |
   | sum      | 5001                     | 1883974                     |
   
   
   
   After issue_136
   
   | serverId | get_shuffle_result_for_multi_part_times | get_shuffle_result_for_multi_part_cost(ms) |
   | -------- | --------------------------------------- | ------------------------------------------ |
   | Server1  | 833                                     | 870720                                     |
   | Server2  | 833                                     | 260865                                     |
   | Server3  | 834                                     | 333202                                     |
   | Server4  | 833                                     | 90277                                      |
   | Server5  | 835                                     | 94113                                      |
   | sum      | 4168                                    | 1649177                                    |
   
   ### Summarize
   The number of interface requests is reduced by 16%, and the total time is reduced by 12.5%. If we assign consecutive partitions to a server, the improvement will be more obvious.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1291891764

   > ### Environment
   > Shuffle Server Num : 5 Shuffle Write: 48G Configuration: --conf spark.sql.shuffle.partitions=5000 --conf spark.sql.adaptive.enabled=true --conf spark.sql.adaptive.shuffle.targetPostShuffleInputSize=64MB
   > 
   > We measure the performance of get_shuffle_result by the following metrics:
   > 
   > * get_shuffle_result_times: The number of calls of the get_shuffle_result interface
   > * get_shuffle_result_cost: Time consumption of get_shuffle_result interface
   > * get_shuffle_result_for_multi_part_times:The number of calls of the get_shuffle_result_for_multi_part interface
   > * get_shuffle_result_for_multi_part_cost: Time consumption of get_shuffle_result_for_multi_part interface
   > 
   > ### Test Results
   > Before issue_136
   > 
   > serverId	get_shuffle_result_times	get_shuffle_result_cost(ms)
   > Server1	1000	157614
   > Server2	1000	426897
   > Server3	1000	269488
   > Server4	1000	906758
   > Server5	1001	123217
   > sum	5001	1883974
   > After issue_136
   > 
   > serverId	get_shuffle_result_for_multi_part_times	get_shuffle_result_for_multi_part_cost(ms)
   > Server1	833	870720
   > Server2	833	260865
   > Server3	834	333202
   > Server4	833	90277
   > Server5	835	94113
   > sum	4168	1649177
   > ### Summarize
   > The number of interface requests is reduced by 16%, and the total time is reduced by 12.5%. If we assign consecutive partitions to a server, the improvement will be more obvious.
   
   Could you raise a new issue that Coordinator support to assign consecutive partitions to a server?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] codecov-commenter commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1229124331

   # [Codecov](https://codecov.io/gh/apache/incubator-uniffle/pull/190?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#190](https://codecov.io/gh/apache/incubator-uniffle/pull/190?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c9caab2) into [master](https://codecov.io/gh/apache/incubator-uniffle/commit/e1b9b80e506eb5079c16cbb71096896f2f647b84?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (e1b9b80) will **decrease** coverage by `0.82%`.
   > The diff coverage is `85.29%`.
   
   ```diff
   @@             Coverage Diff              @@
   ##             master     #190      +/-   ##
   ============================================
   - Coverage     58.45%   57.63%   -0.83%     
   + Complexity     1272     1207      -65     
   ============================================
     Files           158      149       -9     
     Lines          8437     7959     -478     
     Branches        782      754      -28     
   ============================================
   - Hits           4932     4587     -345     
   + Misses         3254     3128     -126     
   + Partials        251      244       -7     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-uniffle/pull/190?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...pache/hadoop/mapreduce/task/reduce/RssShuffle.java](https://codecov.io/gh/apache/incubator-uniffle/pull/190/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LW1yL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9oYWRvb3AvbWFwcmVkdWNlL3Rhc2svcmVkdWNlL1Jzc1NodWZmbGUuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...he/uniffle/client/impl/ShuffleWriteClientImpl.java](https://codecov.io/gh/apache/incubator-uniffle/pull/190/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS91bmlmZmxlL2NsaWVudC9pbXBsL1NodWZmbGVXcml0ZUNsaWVudEltcGwuamF2YQ==) | `25.37% <ø> (ø)` | |
   | [...pache/uniffle/server/ShuffleServerGrpcService.java](https://codecov.io/gh/apache/incubator-uniffle/pull/190/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS91bmlmZmxlL3NlcnZlci9TaHVmZmxlU2VydmVyR3JwY1NlcnZpY2UuamF2YQ==) | `0.97% <0.00%> (-0.01%)` | :arrow_down: |
   | [.../java/org/apache/uniffle/common/util/RssUtils.java](https://codecov.io/gh/apache/incubator-uniffle/pull/190/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS91bmlmZmxlL2NvbW1vbi91dGlsL1Jzc1V0aWxzLmphdmE=) | `68.62% <100.00%> (+2.43%)` | :arrow_up: |
   | [.../org/apache/uniffle/server/ShuffleTaskManager.java](https://codecov.io/gh/apache/incubator-uniffle/pull/190/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS91bmlmZmxlL3NlcnZlci9TaHVmZmxlVGFza01hbmFnZXIuamF2YQ==) | `77.15% <100.00%> (+12.86%)` | :arrow_up: |
   | [...org/apache/uniffle/server/ShuffleFlushManager.java](https://codecov.io/gh/apache/incubator-uniffle/pull/190/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS91bmlmZmxlL3NlcnZlci9TaHVmZmxlRmx1c2hNYW5hZ2VyLmphdmE=) | `77.65% <0.00%> (-1.68%)` | :arrow_down: |
   | [...che/spark/shuffle/writer/BufferManagerOptions.java](https://codecov.io/gh/apache/incubator-uniffle/pull/190/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS93cml0ZXIvQnVmZmVyTWFuYWdlck9wdGlvbnMuamF2YQ==) | | |
   | [...k/shuffle/writer/WrappedByteArrayOutputStream.java](https://codecov.io/gh/apache/incubator-uniffle/pull/190/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS93cml0ZXIvV3JhcHBlZEJ5dGVBcnJheU91dHB1dFN0cmVhbS5qYXZh) | | |
   | [...pache/spark/shuffle/writer/WriteBufferManager.java](https://codecov.io/gh/apache/incubator-uniffle/pull/190/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS93cml0ZXIvV3JpdGVCdWZmZXJNYW5hZ2VyLmphdmE=) | | |
   | [.../org/apache/spark/shuffle/writer/WriterBuffer.java](https://codecov.io/gh/apache/incubator-uniffle/pull/190/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS93cml0ZXIvV3JpdGVyQnVmZmVyLmphdmE=) | | |
   | ... and [8 more](https://codecov.io/gh/apache/incubator-uniffle/pull/190/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1229157625

   A compatible feature means that old version server can use the new version server, the new version client can use the old version server. You change the protobuf field name, it will cause incompatibility in my thought.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on a diff in pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
leixm commented on code in PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#discussion_r958354811


##########
integration-test/spark3/src/test/java/org/apache/uniffle/test/MockRssShuffleManager.java:
##########
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.uniffle.test;
+
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import com.google.common.collect.Maps;
+import org.apache.spark.SparkConf;
+import org.apache.spark.TaskContext;
+import org.apache.spark.shuffle.RssShuffleHandle;
+import org.apache.spark.shuffle.RssShuffleManager;
+import org.apache.spark.shuffle.ShuffleHandle;
+import org.apache.spark.shuffle.ShuffleReadMetricsReporter;
+import org.apache.spark.shuffle.ShuffleReader;
+import org.roaringbitmap.longlong.Roaring64NavigableMap;
+
+import org.apache.uniffle.common.ShuffleServerInfo;
+
+public class MockRssShuffleManager extends RssShuffleManager {

Review Comment:
   Collect the number of times the partition block needs to be requested through MockRssShuffleManager#getReaderImpl, and then compare it with the appToPartitionRequest of MockedShuffleServerGrpcService to confirm that the number of requests is correct, and there are no redundant requests in the case of multiple replica, See `validateRequestCount(expectRequestNum * replicateRead)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1231498373

   If the new version client access the old version server, what will happen? Could we fallback to use old way to access the old version server when there are exceptions?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on a diff in pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
leixm commented on code in PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#discussion_r956736688


##########
proto/src/main/proto/Rss.proto:
##########
@@ -135,6 +136,13 @@ message GetShuffleResultRequest {
   int32 partitionId = 3;
 }
 
+message GetShuffleResultForMultiPartRequest {
+  string appId = 1;
+  int32 shuffleId = 2;
+  int32 startPartition = 3;

Review Comment:
   It seems that in the case of multiple replica, redundant data will be requested



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1239344023

   > ### Environment
   > Shuffle Server Num : 5 Shuffle Write: 48G Configuration: --conf spark.sql.shuffle.partitions=5000 --conf spark.sql.adaptive.enabled=true --conf spark.sql.adaptive.shuffle.targetPostShuffleInputSize=64MB
   > 
   > We measure the performance of get_shuffle_result by the following metrics:
   > 
   > * get_shuffle_result_times: The number of calls of the get_shuffle_result interface
   > * get_shuffle_result_cost: Time consumption of get_shuffle_result interface
   > * get_shuffle_result_for_multi_part_times:The number of calls of the get_shuffle_result_for_multi_part interface
   > * get_shuffle_result_for_multi_part_cost: Time consumption of get_shuffle_result_for_multi_part interface
   > 
   > ### Test Results
   > Before issue_136
   > 
   > serverId	get_shuffle_result_times	get_shuffle_result_cost(ms)
   > Server1	1000	157614
   > Server2	1000	426897
   > Server3	1000	269488
   > Server4	1000	906758
   > Server5	1001	123217
   > sum	5001	1883974
   > After issue_136
   > 
   > serverId	get_shuffle_result_for_multi_part_times	get_shuffle_result_for_multi_part_cost(ms)
   > Server1	833	870720
   > Server2	833	260865
   > Server3	834	333202
   > Server4	833	90277
   > Server5	835	94113
   > sum	4168	1649177
   > ### Summarize
   > The number of interface requests is reduced by 16%, and the total time is reduced by 12.5%. If we assign consecutive partitions to a server, the improvement will be more obvious.
   
   What's your test case? TPC-DS?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#discussion_r958429279


##########
common/src/main/java/org/apache/uniffle/common/util/RssUtils.java:
##########
@@ -292,4 +297,38 @@ public static String getMetricNameForHostName(String hostName) {
     }
     return hostName.replaceAll("[\\.-]", "_");
   }
+
+  public static Map<Integer, Roaring64NavigableMap> shuffleBitmapToPartitionBitmap(

Review Comment:
   Maybe we should have a better name.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
leixm commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1231723740

   > Could you provide some data about benchmark or performance test?
   
   Ok, I will do some performance tests to try to prove that the performance will improve in some extreme scenarios of AQE


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on a diff in pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
leixm commented on code in PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#discussion_r958347843


##########
integration-test/spark3/src/test/java/org/apache/uniffle/test/MockRssShuffleManager.java:
##########
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.uniffle.test;
+
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import com.google.common.collect.Maps;
+import org.apache.spark.SparkConf;
+import org.apache.spark.TaskContext;
+import org.apache.spark.shuffle.RssShuffleHandle;
+import org.apache.spark.shuffle.RssShuffleManager;
+import org.apache.spark.shuffle.ShuffleHandle;
+import org.apache.spark.shuffle.ShuffleReadMetricsReporter;
+import org.apache.spark.shuffle.ShuffleReader;
+import org.roaringbitmap.longlong.Roaring64NavigableMap;
+
+import org.apache.uniffle.common.ShuffleServerInfo;
+
+public class MockRssShuffleManager extends RssShuffleManager {

Review Comment:
   > If the new version client access the old version server, what will happen? Could we fallback to use old way to access the old version server when there are exceptions?
   
   I think the new version of the client does not need to be compatible with the old version of the server.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1231533193

   > > If the new version client access the old version server, what will happen? Could we fallback to use old way to access the old version server when there are exceptions?
   > 
   > I think the new version of the client does not need to be compatible with the old version of the server.
   
   It's difficult to upgrade the server if the client isn't compatible with the old version of the server.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on a diff in pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
leixm commented on code in PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#discussion_r958398603


##########
common/src/main/java/org/apache/uniffle/common/util/RssUtils.java:
##########
@@ -292,4 +297,38 @@ public static String getMetricNameForHostName(String hostName) {
     }
     return hostName.replaceAll("[\\.-]", "_");
   }
+
+  public static Map<Integer, Roaring64NavigableMap> shuffleBitmapToPartitionBitmap(

Review Comment:
   Because the partition may not have a block, but we need to fill it to avoid NPE when used, See 
   ```
   for (int partitionId = startPartition; partitionId <= endPartition; partitionId++) {
         result.putIfAbsent(partitionId, Roaring64NavigableMap.bitmapOf());
       }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#discussion_r956733157


##########
proto/src/main/proto/Rss.proto:
##########
@@ -135,6 +136,13 @@ message GetShuffleResultRequest {
   int32 partitionId = 3;
 }
 
+message GetShuffleResultForMultiPartRequest {
+  string appId = 1;
+  int32 shuffleId = 2;
+  int32 startPartition = 3;

Review Comment:
   Partitions may not be continuous.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#discussion_r965480039


##########
common/src/main/java/org/apache/uniffle/common/util/RssUtils.java:
##########
@@ -292,4 +297,38 @@ public static String getMetricNameForHostName(String hostName) {
     }
     return hostName.replaceAll("[\\.-]", "_");
   }
+
+  public static Map<Integer, Roaring64NavigableMap> shuffleBitmapToPartitionBitmap(
+      Roaring64NavigableMap shuffleBitmap, int startPartition, int endPartition) {
+    Map<Integer, Roaring64NavigableMap> result = Maps.newHashMap();
+    for (int partitionId = startPartition; partitionId <= endPartition; partitionId++) {

Review Comment:
   > We think we need `<=` here.
   
   Sorry, my mistake. We need `<` here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
leixm commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1229471929

   You're right , I will add some test in integration test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
leixm commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1229158093

   Thank you for your reminder, i will make it compatible.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1229735331

   Flaky test was fixed by https://github.com/apache/incubator-uniffle/pull/191 .Thanks @zuston 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on a diff in pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
leixm commented on code in PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#discussion_r958394604


##########
common/src/main/java/org/apache/uniffle/common/util/RssUtils.java:
##########
@@ -292,4 +297,38 @@ public static String getMetricNameForHostName(String hostName) {
     }
     return hostName.replaceAll("[\\.-]", "_");
   }
+
+  public static Map<Integer, Roaring64NavigableMap> shuffleBitmapToPartitionBitmap(
+      Roaring64NavigableMap shuffleBitmap, int startPartition, int endPartition) {
+    Map<Integer, Roaring64NavigableMap> result = Maps.newHashMap();
+    for (int partitionId = startPartition; partitionId <= endPartition; partitionId++) {
+      result.putIfAbsent(partitionId, Roaring64NavigableMap.bitmapOf());
+    }
+    Iterator<Long> it = shuffleBitmap.iterator();
+    long mask = (1L << Constants.PARTITION_ID_MAX_LENGTH) - 1;
+    while (it.hasNext()) {
+      Long blockId = it.next();
+      int partitionId = Math.toIntExact((blockId >> Constants.TASK_ATTEMPT_ID_MAX_LENGTH) & mask);
+      if (partitionId >= startPartition && partitionId <= endPartition) {
+        result.get(partitionId).add(blockId);
+      }
+    }
+    return result;
+  }
+
+  public static Map<ShuffleServerInfo, Set<Integer>> reversePartitionToServers(

Review Comment:
   do you have a good idea



##########
common/src/main/java/org/apache/uniffle/common/util/RssUtils.java:
##########
@@ -292,4 +297,38 @@ public static String getMetricNameForHostName(String hostName) {
     }
     return hostName.replaceAll("[\\.-]", "_");
   }
+
+  public static Map<Integer, Roaring64NavigableMap> shuffleBitmapToPartitionBitmap(
+      Roaring64NavigableMap shuffleBitmap, int startPartition, int endPartition) {
+    Map<Integer, Roaring64NavigableMap> result = Maps.newHashMap();
+    for (int partitionId = startPartition; partitionId <= endPartition; partitionId++) {
+      result.putIfAbsent(partitionId, Roaring64NavigableMap.bitmapOf());
+    }
+    Iterator<Long> it = shuffleBitmap.iterator();
+    long mask = (1L << Constants.PARTITION_ID_MAX_LENGTH) - 1;
+    while (it.hasNext()) {
+      Long blockId = it.next();
+      int partitionId = Math.toIntExact((blockId >> Constants.TASK_ATTEMPT_ID_MAX_LENGTH) & mask);
+      if (partitionId >= startPartition && partitionId <= endPartition) {
+        result.get(partitionId).add(blockId);
+      }
+    }
+    return result;
+  }
+
+  public static Map<ShuffleServerInfo, Set<Integer>> reversePartitionToServers(

Review Comment:
   Do you have a good idea?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#discussion_r958413736


##########
common/src/main/java/org/apache/uniffle/common/util/RssUtils.java:
##########
@@ -292,4 +297,38 @@ public static String getMetricNameForHostName(String hostName) {
     }
     return hostName.replaceAll("[\\.-]", "_");
   }
+
+  public static Map<Integer, Roaring64NavigableMap> shuffleBitmapToPartitionBitmap(
+      Roaring64NavigableMap shuffleBitmap, int startPartition, int endPartition) {
+    Map<Integer, Roaring64NavigableMap> result = Maps.newHashMap();
+    for (int partitionId = startPartition; partitionId <= endPartition; partitionId++) {
+      result.putIfAbsent(partitionId, Roaring64NavigableMap.bitmapOf());
+    }
+    Iterator<Long> it = shuffleBitmap.iterator();
+    long mask = (1L << Constants.PARTITION_ID_MAX_LENGTH) - 1;
+    while (it.hasNext()) {
+      Long blockId = it.next();
+      int partitionId = Math.toIntExact((blockId >> Constants.TASK_ATTEMPT_ID_MAX_LENGTH) & mask);
+      if (partitionId >= startPartition && partitionId <= endPartition) {
+        result.get(partitionId).add(blockId);
+      }
+    }
+    return result;
+  }
+
+  public static Map<ShuffleServerInfo, Set<Integer>> reversePartitionToServers(

Review Comment:
   How about `generateServerToPartitions`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#discussion_r958401878


##########
common/src/main/java/org/apache/uniffle/common/util/RssUtils.java:
##########
@@ -292,4 +297,38 @@ public static String getMetricNameForHostName(String hostName) {
     }
     return hostName.replaceAll("[\\.-]", "_");
   }
+
+  public static Map<Integer, Roaring64NavigableMap> shuffleBitmapToPartitionBitmap(

Review Comment:
   > Because the partition may not have a block, but we need to fill it to avoid NPE when used, See
   > 
   > ```
   > for (int partitionId = startPartition; partitionId <= endPartition; partitionId++) {
   >       result.putIfAbsent(partitionId, Roaring64NavigableMap.bitmapOf());
   >     }
   > ```
   
   A little weird. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi merged pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi merged PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#discussion_r965569281


##########
server/src/main/java/org/apache/uniffle/server/ShuffleTaskManager.java:
##########
@@ -263,48 +264,52 @@ public long requireBuffer(int requireSize) {
     return requireId;
   }
 
-  public byte[] getFinishedBlockIds(
-      String appId, Integer shuffleId, Integer partitionId) throws IOException {
+  public byte[] getFinishedBlockIds(String appId, Integer shuffleId, Set<Integer> partitions) throws IOException {
     refreshAppId(appId);
-    Storage storage = storageManager.selectStorage(new ShuffleDataReadEvent(appId, shuffleId, partitionId));
-    // update shuffle's timestamp that was recently read.
-    storage.updateReadMetrics(new StorageReadMetrics(appId, shuffleId));
-
+    for (int partitionId : partitions) {
+      Storage storage = storageManager.selectStorage(new ShuffleDataReadEvent(appId, shuffleId, partitionId));
+      // update shuffle's timestamp that was recently read.
+      storage.updateReadMetrics(new StorageReadMetrics(appId, shuffleId));
+    }
     Map<Integer, Roaring64NavigableMap[]> shuffleIdToPartitions = partitionsToBlockIds.get(appId);
     if (shuffleIdToPartitions == null) {
       return null;
     }
+
     Roaring64NavigableMap[] blockIds = shuffleIdToPartitions.get(shuffleId);
     if (blockIds == null) {
       return new byte[]{};
     }
-    Roaring64NavigableMap bitmap = blockIds[partitionId % blockIds.length];
-    if (bitmap == null) {
-      return new byte[]{};
+    Map<Integer, Set<Integer>> bitmapIndexToPartitions = Maps.newHashMap();
+    for (int partitionId : partitions) {
+      int bitmapIndex = partitionId % blockIds.length;
+      if (bitmapIndexToPartitions.containsKey(bitmapIndex)) {
+        bitmapIndexToPartitions.get(bitmapIndex).add(partitionId);
+      } else {
+        HashSet<Integer> newHashSet = Sets.newHashSet(partitionId);
+        bitmapIndexToPartitions.put(bitmapIndex, newHashSet);
+      }
     }
 
-    if (partitionId > Constants.MAX_PARTITION_ID) {
-      throw new RuntimeException("Get invalid partitionId[" + partitionId
-          + "] which greater than " + Constants.MAX_PARTITION_ID);
+    Roaring64NavigableMap res = Roaring64NavigableMap.bitmapOf();
+    for (Map.Entry<Integer, Set<Integer>> entry : bitmapIndexToPartitions.entrySet()) {
+      Set<Integer> requestPartitions = entry.getValue();
+      Roaring64NavigableMap bitmap = blockIds[entry.getKey()];
+      res.or(getBlockIdsByPartitionId(requestPartitions, bitmap));
     }
-
-    return RssUtils.serializeBitMap(getBlockIdsByPartitionId(partitionId, bitmap));
+    return RssUtils.serializeBitMap(res);
   }
 
   // partitionId is passed as long to calculate minValue/maxValue
-  protected Roaring64NavigableMap getBlockIdsByPartitionId(long partitionId, Roaring64NavigableMap bitmap) {
+  protected Roaring64NavigableMap getBlockIdsByPartitionId(Set<Integer> requestPartitions,

Review Comment:
   It seems not effective enough. Could we add the block id to the bitmap directly? Now we add the block id to a bitmap and then we use `or` operation to add them to the another bitmap.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#discussion_r965725572


##########
common/src/main/java/org/apache/uniffle/common/util/RssUtils.java:
##########
@@ -292,4 +297,38 @@ public static String getMetricNameForHostName(String hostName) {
     }
     return hostName.replaceAll("[\\.-]", "_");
   }
+
+  public static Map<Integer, Roaring64NavigableMap> shuffleBitmapToPartitionBitmap(

Review Comment:
   remind



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org