Posted to commits@uniffle.apache.org by ck...@apache.org on 2023/01/17 02:55:26 UTC

[incubator-uniffle] branch branch-0.6 updated (4b3d37db -> c4259be9)

This is an automated email from the ASF dual-hosted git repository.

ckj pushed a change to branch branch-0.6
in repository https://gitbox.apache.org/repos/asf/incubator-uniffle.git


    from 4b3d37db Bump project version to 0.6.2-SNAPSHOT
     new 790ea613 Fix incorrect metrics of event_queue_size and total_write_handler (#411)
     new d4532f45 [Deps] Bump slf4j to fix vulnerability in slf4j-log4j12 (#464)
     new 9f3d0743 Fix potential race condition when registering remote storage info (#481)
     new c4259be9 Improve README (#427)

The 4 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 README.md                                          | 48 +++++++++++-----------
 pom.xml                                            |  2 +-
 .../apache/uniffle/server/ShuffleFlushManager.java |  7 +++-
 .../uniffle/server/storage/HdfsStorageManager.java | 17 ++++----
 4 files changed, 38 insertions(+), 36 deletions(-)


[incubator-uniffle] 02/04: [Deps] Bump slf4j to fix vulnerability in slf4j-log4j12 (#464)

Posted by ck...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ckj pushed a commit to branch branch-0.6
in repository https://gitbox.apache.org/repos/asf/incubator-uniffle.git

commit d4532f458bcd2d25e7e0662cab59e3fd811ef62e
Author: Kaijie Chen <ck...@apache.org>
AuthorDate: Wed Jan 11 10:07:15 2023 +0800

    [Deps] Bump slf4j to fix vulnerability in slf4j-log4j12 (#464)
    
    ### What changes were proposed in this pull request?
    
    Bump slf4j to 1.7.36 to fix vulnerability in slf4j-log4j12.
    
    Note: slf4j:1.7.36 depends on reload4j:1.2.19 instead of log4j, so only the logging backend changes (see the API sketch after the diff below).
    
    ### Why are the changes needed?
    
    slf4j-log4j12:1.7.25 pulls in the vulnerable transitive dependency log4j:1.2.17, which is affected by:
    
    * CVE-2019-17571 9.8 Deserialization of Untrusted Data vulnerability pending CVSS allocation
    * CVE-2021-4104 7.5 Deserialization of Untrusted Data vulnerability with medium severity found
    * CVE-2022-23302 8.8 Deserialization of Untrusted Data vulnerability pending CVSS allocation
    * CVE-2022-23305 9.8 Improper Neutralization of Special Elements used in an SQL Command ('SQL Injection') vulnerability pending CVSS allocation
    * CVE-2022-23307 8.8 Deserialization of Untrusted Data vulnerability pending CVSS allocation
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Not needed, this only bumps a dependency version.
---
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pom.xml b/pom.xml
index e05da796..718af206 100644
--- a/pom.xml
+++ b/pom.xml
@@ -71,7 +71,7 @@
     <roaring.bitmap.version>0.9.15</roaring.bitmap.version>
     <rss.shade.packageName>org.apache.uniffle</rss.shade.packageName>
     <skipDeploy>false</skipDeploy>
-    <slf4j.version>1.7.25</slf4j.version>
+    <slf4j.version>1.7.36</slf4j.version>
     <spotbugs.version>4.7.0</spotbugs.version>
     <spotbugs-maven-plugin.version>4.7.0.0</spotbugs-maven-plugin.version>
     <system-rules.version>1.19.0</system-rules.version>
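
Since only the binding behind the slf4j facade changes, application code written
against the slf4j API needs no modification. A minimal sketch of that API usage
(the class name is illustrative, not taken from the Uniffle codebase):

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class LoggingExample {
        // The slf4j facade is unchanged across 1.7.25 -> 1.7.36; only the
        // backend artifact (log4j 1.2.17 -> reload4j 1.2.19) changes on
        // the classpath.
        private static final Logger LOG = LoggerFactory.getLogger(LoggingExample.class);

        public static void main(String[] args) {
            LOG.info("Logging through the slf4j facade, version {}", "1.7.36");
        }
    }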


[incubator-uniffle] 04/04: Improve README (#427)

Posted by ck...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ckj pushed a commit to branch branch-0.6
in repository https://gitbox.apache.org/repos/asf/incubator-uniffle.git

commit c4259be963a019559a4908bfbcf166cdb89699e7
Author: Kaijie Chen <ck...@apache.org>
AuthorDate: Thu Dec 15 20:23:21 2022 +0800

    Improve README (#427)
    
    ### What changes were proposed in this pull request?
    
    1. Update the introduction section in README:
        * "compute engines" is changed to "distributed compute engines".
        * "MapReduce" is changed to "Apache Hadoop MapReduce".
        * Remove the "Total lines" badge, see [OSSInsight][1] for better insight.
    2. Fix typos.
    
    [1]: https://ossinsight.io/analyze/apache/incubator-uniffle#lines-of-code-changed "apache/incubator-uniffle | OSSInsight"
    
    ### Why are the changes needed?
    
    Improve the README.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Preview: https://github.com/kaijchen/incubator-uniffle/tree/readme#apache-uniffle-incubating
---
 README.md | 48 +++++++++++++++++++++++++-----------------------
 1 file changed, 25 insertions(+), 23 deletions(-)

diff --git a/README.md b/README.md
index 84fd8099..c6f2a77b 100644
--- a/README.md
+++ b/README.md
@@ -17,23 +17,25 @@
 
 # Apache Uniffle (Incubating)
 
-Uniffle is a unified remote shuffle service for compute engines.
+Uniffle is a unified remote shuffle service for distributed compute engines.
 It provides the ability to aggregate and store shuffle data on remote servers,
 thus improving the performance and reliability of large jobs.
-Currently it supports [Apache Spark](https://spark.apache.org) and [MapReduce](https://hadoop.apache.org).
+Currently it supports [Apache Spark](https://spark.apache.org) and [Apache Hadoop MapReduce](https://hadoop.apache.org).
 
 [![Build](https://github.com/apache/incubator-uniffle/actions/workflows/build.yml/badge.svg?branch=master&event=push)](https://github.com/apache/incubator-uniffle/actions/workflows/build.yml)
 [![Codecov](https://codecov.io/gh/apache/incubator-uniffle/branch/master/graph/badge.svg)](https://codecov.io/gh/apache/incubator-uniffle)
+[![License](https://img.shields.io/github/license/apache/incubator-uniffle)](https://github.com/apache/incubator-uniffle/blob/master/LICENSE)
+[![Release](https://img.shields.io/github/v/release/apache/incubator-uniffle)](https://github.com/apache/incubator-uniffle/releases)
 
 ## Architecture
 ![Rss Architecture](docs/asset/rss_architecture.png)
-Uniffle contains coordinator cluster, shuffle server cluster and remote storage(eg, HDFS) if necessary.
+Uniffle cluster consists of three components, a coordinator cluster, a shuffle server cluster and an optional remote storage (e.g., HDFS).
 
-Coordinator will collect status of shuffle server and do the assignment for the job.
+Coordinator will collect the status of shuffle servers and assign jobs based on some strategy.
 
 Shuffle server will receive the shuffle data, merge them and write to storage.
 
-Depend on different situation, Uniffle supports Memory & Local, Memory & Remote Storage(eg, HDFS), Memory & Local & Remote Storage(recommendation for production environment).
+Depending on different situations, Uniffle supports Memory & Local, Memory & Remote Storage(e.g., HDFS), Memory & Local & Remote Storage(recommendation for production environment).
 
 ## Shuffle Process with Uniffle
 
@@ -50,20 +52,20 @@ Depend on different situation, Uniffle supports Memory & Local, Memory & Remote
    8. After write data, task report all blockId to shuffle server, this step is used for data validation later
    9. Store taskAttemptId in MapStatus to support Spark speculation
 
-* Depend on different storage type, spark task read shuffle data from shuffle server or remote storage or both of them.
+* Depending on different storage types, the spark task will read shuffle data from shuffle server or remote storage or both of them.
 
 ## Shuffle file format
-The shuffle data is stored with index file and data file. Data file has all blocks for specific partition and index file has metadata for every block.
+The shuffle data is stored with index file and data file. Data file has all blocks for a specific partition and the index file has metadata for every block.
 
 ![Rss Shuffle_Write](docs/asset/rss_data_format.png)
 
 ## Supported Spark Version
-Current support Spark 2.3.x, Spark 2.4.x, Spark3.0.x, Spark 3.1.x, Spark 3.2.x
+Currently supports Spark 2.3.x, Spark 2.4.x, Spark 3.0.x, Spark 3.1.x, Spark 3.2.x
 
 Note: To support dynamic allocation, the patch(which is included in client-spark/patch folder) should be applied to Spark
 
 ## Supported MapReduce Version
-Current support Hadoop 2.8.5's MapReduce framework.
+Currently supports the MapReduce framework of Hadoop 2.8.5
 
 ## Building Uniffle
 > note: currently Uniffle requires JDK 1.8 to build, adding later JDK support is on our roadmap.
@@ -73,11 +75,11 @@ To build it, run:
 
     mvn -DskipTests clean package
 
-Build against profile Spark2(2.4.6)
+Build against profile Spark 2 (2.4.6)
 
     mvn -DskipTests clean package -Pspark2
 
-Build against profile Spark3(3.1.2)
+Build against profile Spark 3 (3.1.2)
 
     mvn -DskipTests clean package -Pspark3
 
@@ -108,13 +110,13 @@ rss-xxx.tgz will be generated for deployment
 ### Deploy Coordinator
 
 1. unzip package to RSS_HOME
-2. update RSS_HOME/bin/rss-env.sh, eg,
+2. update RSS_HOME/bin/rss-env.sh, e.g.,
    ```
      JAVA_HOME=<java_home>
      HADOOP_HOME=<hadoop home>
      XMX_SIZE="16g"
    ```
-3. update RSS_HOME/conf/coordinator.conf, eg,
+3. update RSS_HOME/conf/coordinator.conf, e.g.,
    ```
      rss.rpc.server.port 19999
      rss.jetty.http.port 19998
@@ -128,9 +130,9 @@ rss-xxx.tgz will be generated for deployment
      # config the path of excluded shuffle server
      rss.coordinator.exclude.nodes.file.path <RSS_HOME>/conf/exclude_nodes
    ```
-4. update <RSS_HOME>/conf/dynamic_client.conf, rss client will get default conf from coordinator eg,
+4. update <RSS_HOME>/conf/dynamic_client.conf, rss client will get default conf from coordinator e.g.,
    ```
-    # MEMORY_LOCALFILE_HDFS is recommandation for production environment
+    # MEMORY_LOCALFILE_HDFS is recommended for production environment
     rss.storage.type MEMORY_LOCALFILE_HDFS
     # multiple remote storages are supported, and client will get assignment from coordinator
     rss.coordinator.remote.storage.path hdfs://cluster1/path,hdfs://cluster2/path
@@ -147,18 +149,18 @@ rss-xxx.tgz will be generated for deployment
 ### Deploy Shuffle Server
 
 1. unzip package to RSS_HOME
-2. update RSS_HOME/bin/rss-env.sh, eg,
+2. update RSS_HOME/bin/rss-env.sh, e.g.,
    ```
      JAVA_HOME=<java_home>
      HADOOP_HOME=<hadoop home>
      XMX_SIZE="80g"
    ```
-3. update RSS_HOME/conf/server.conf, eg,
+3. update RSS_HOME/conf/server.conf, e.g.,
    ```
      rss.rpc.server.port 19999
      rss.jetty.http.port 19998
      rss.rpc.executor.size 2000
-     # it should be configed the same as in coordinator
+     # it should be configured the same as in coordinator
      rss.storage.type MEMORY_LOCALFILE_HDFS
      rss.coordinator.quorum <coordinatorIp1>:19999,<coordinatorIp2>:19999
      # local storage path for shuffle server
@@ -176,7 +178,7 @@ rss-xxx.tgz will be generated for deployment
      rss.server.app.expired.withoutHeartbeat 120000
      # note: the default value of rss.server.flush.cold.storage.threshold.size is 64m
      # there will be no data written to DFS if set it as 100g even rss.storage.type=MEMORY_LOCALFILE_HDFS
-     # please set proper value if DFS is used, eg, 64m, 128m.
+     # please set a proper value if DFS is used, e.g., 64m, 128m.
      rss.server.flush.cold.storage.threshold.size 100g
    ```
 4. start Shuffle Server
@@ -185,13 +187,13 @@ rss-xxx.tgz will be generated for deployment
    ```
 
 ### Deploy Spark Client
-1. Add client jar to Spark classpath, eg, SPARK_HOME/jars/
+1. Add client jar to Spark classpath, e.g., SPARK_HOME/jars/
 
    The jar for Spark2 is located in <RSS_HOME>/jars/client/spark2/rss-client-XXXXX-shaded.jar
 
    The jar for Spark3 is located in <RSS_HOME>/jars/client/spark3/rss-client-XXXXX-shaded.jar
 
-2. Update Spark conf to enable Uniffle, eg,
+2. Update Spark conf to enable Uniffle, e.g.,
 
    ```
    spark.shuffle.manager org.apache.spark.shuffle.RssShuffleManager
@@ -216,7 +218,7 @@ After apply the patch and rebuild spark, add following configuration in spark co
 
 The jar for MapReduce is located in <RSS_HOME>/jars/client/mr/rss-client-mr-XXXXX-shaded.jar
 
-2. Update MapReduce conf to enable Uniffle, eg,
+2. Update MapReduce conf to enable Uniffle, e.g.,
 
    ```
    -Dmapreduce.rss.coordinator.quorum=<coordinatorIp1>:19999,<coordinatorIp2>:19999
@@ -230,7 +232,7 @@ and job recovery (i.e., `yarn.app.mapreduce.am.job.recovery.enable=false`)
 
 ## Configuration
 
-The important configuration is listed as following.
+The important configuration is listed as follows.
 
 ### Coordinator
 


[incubator-uniffle] 01/04: Fix incorrect metrics of event_queue_size and total_write_handler (#411)

Posted by ck...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ckj pushed a commit to branch branch-0.6
in repository https://gitbox.apache.org/repos/asf/incubator-uniffle.git

commit 790ea613284971503a38fbfd57d56f9c9768d838
Author: Junfan Zhang <zu...@apache.org>
AuthorDate: Tue Dec 13 17:45:54 2022 +0800

    Fix incorrect metrics of event_queue_size and total_write_handler (#411)
    
    ### What changes were proposed in this pull request?
    
    Fix incorrect metrics of event_queue_size and total_write_handler
    
    ### Why are the changes needed?
    In the current codebase, there are bugs in the above metrics:
    
    1. The total_write_handler metric is not decremented when an exception happens while flushing to file.
    2. The event_queue_size metric does not report the correct wait queue size. In the original logic, events are removed from flushQueue as soon as they are handed to the flush thread pool, so flushQueue.size() is always 0 and the metric also reads 0 even while events are still waiting to be processed. This is wrong. (A standalone sketch of the corrected pattern follows the diff below.)
    
    
    ### Does this PR introduce _any_ user-facing change?
    No
    
    
    ### How was this patch tested?
    Not needed.
---
 .../main/java/org/apache/uniffle/server/ShuffleFlushManager.java   | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/server/src/main/java/org/apache/uniffle/server/ShuffleFlushManager.java b/server/src/main/java/org/apache/uniffle/server/ShuffleFlushManager.java
index 96f89296..619fe9ca 100644
--- a/server/src/main/java/org/apache/uniffle/server/ShuffleFlushManager.java
+++ b/server/src/main/java/org/apache/uniffle/server/ShuffleFlushManager.java
@@ -95,12 +95,13 @@ public class ShuffleFlushManager {
           ShuffleDataFlushEvent event = flushQueue.take();
           threadPoolExecutor.execute(() -> {
             try {
-              ShuffleServerMetrics.gaugeEventQueueSize.set(flushQueue.size());
               ShuffleServerMetrics.gaugeWriteHandler.inc();
               flushToFile(event);
-              ShuffleServerMetrics.gaugeWriteHandler.dec();
             } catch (Exception e) {
               LOG.error("Exception happened when flush data for " + event, e);
+            } finally {
+              ShuffleServerMetrics.gaugeWriteHandler.dec();
+              ShuffleServerMetrics.gaugeEventQueueSize.dec();
             }
           });
         } catch (Exception e) {
@@ -137,6 +138,8 @@ public class ShuffleFlushManager {
   public void addToFlushQueue(ShuffleDataFlushEvent event) {
     if (!flushQueue.offer(event)) {
       LOG.warn("Flush queue is full, discard event: " + event);
+    } else {
+      ShuffleServerMetrics.gaugeEventQueueSize.inc();
     }
   }
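
The corrected accounting, as a standalone sketch (class and field names are
illustrative, not the project's actual ones): the queue gauge is incremented
only when offer() succeeds, and both gauges are decremented in a finally block
so a failed flush cannot leave them permanently inflated.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.atomic.AtomicLong;

    public class FlushGaugeSketch {
        private final BlockingQueue<Runnable> flushQueue = new LinkedBlockingQueue<>();
        private final ExecutorService pool = Executors.newFixedThreadPool(4);
        private final AtomicLong eventQueueSize = new AtomicLong();  // stand-in for the gauge
        private final AtomicLong writeHandlers = new AtomicLong();   // stand-in for the gauge

        public void addToFlushQueue(Runnable event) {
            // Count accepted events instead of sampling flushQueue.size(),
            // which reads 0 once events move into the pool's own queue.
            if (flushQueue.offer(event)) {
                eventQueueSize.incrementAndGet();
            }
        }

        public void eventLoop() throws InterruptedException {
            while (true) {
                Runnable event = flushQueue.take();
                pool.execute(() -> {
                    writeHandlers.incrementAndGet();
                    try {
                        event.run();  // flushToFile(event) in the real code; may throw
                    } finally {
                        // Runs even on exception, so neither gauge can leak.
                        writeHandlers.decrementAndGet();
                        eventQueueSize.decrementAndGet();
                    }
                });
            }
        }
    }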
 


[incubator-uniffle] 03/04: Fix potential race condition when registering remote storage info (#481)

Posted by ck...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

ckj pushed a commit to branch branch-0.6
in repository https://gitbox.apache.org/repos/asf/incubator-uniffle.git

commit 9f3d0743363cd2d2faf414ac3a389328c2a27168
Author: Junfan Zhang <zu...@apache.org>
AuthorDate: Mon Jan 16 13:56:37 2023 +0800

    Fix potential race condition when registering remote storage info (#481)
    
    
    ### What changes were proposed in this pull request?
    
    1. Use ConcurrentHashMap's computeIfAbsent to keep the registration thread-safe (see the sketch after the diff below)
    
    ### Why are the changes needed?
    1. To fix a potential race condition when registering remote storage info
    
    ### Does this PR introduce _any_ user-facing change?
    No
    
    ### How was this patch tested?
    Existing UTs
---
 .../uniffle/server/storage/HdfsStorageManager.java      | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/server/src/main/java/org/apache/uniffle/server/storage/HdfsStorageManager.java b/server/src/main/java/org/apache/uniffle/server/storage/HdfsStorageManager.java
index 7c0c0dd3..72e622ce 100644
--- a/server/src/main/java/org/apache/uniffle/server/storage/HdfsStorageManager.java
+++ b/server/src/main/java/org/apache/uniffle/server/storage/HdfsStorageManager.java
@@ -95,23 +95,20 @@ public class HdfsStorageManager extends SingleStorageManager {
   @Override
   public void registerRemoteStorage(String appId, RemoteStorageInfo remoteStorageInfo) {
     String remoteStorage = remoteStorageInfo.getPath();
-    Map<String, String> remoteStorageConf = remoteStorageInfo.getConfItems();
-    if (!pathToStorages.containsKey(remoteStorage)) {
+    pathToStorages.computeIfAbsent(remoteStorage, key -> {
+      Map<String, String> remoteStorageConf = remoteStorageInfo.getConfItems();
       Configuration remoteStorageHadoopConf = new Configuration(hadoopConf);
       if (remoteStorageConf != null && remoteStorageConf.size() > 0) {
         for (Map.Entry<String, String> entry : remoteStorageConf.entrySet()) {
           remoteStorageHadoopConf.setStrings(entry.getKey(), entry.getValue());
         }
       }
-      pathToStorages.putIfAbsent(remoteStorage, new HdfsStorage(remoteStorage, remoteStorageHadoopConf));
-      // registerRemoteStorage may be called in different threads,
-      // make sure metrics won't be created duplicated
-      // there shouldn't have performance issue because
-      // it will be called only few times according to the number of remote storage
-      String storageHost = pathToStorages.get(remoteStorage).getStorageHost();
+      HdfsStorage hdfsStorage = new HdfsStorage(remoteStorage, remoteStorageHadoopConf);
+      String storageHost = hdfsStorage.getStorageHost();
       ShuffleServerMetrics.addDynamicCounterForRemoteStorage(storageHost);
-    }
-    appIdToStorages.putIfAbsent(appId, pathToStorages.get(remoteStorage));
+      return hdfsStorage;
+    });
+    appIdToStorages.computeIfAbsent(appId, key -> pathToStorages.get(remoteStorage));
   }
 
   public HdfsStorage getStorageByAppId(String appId) {
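
The essence of the fix, as a standalone sketch (types and names are
illustrative): with check-then-act, two threads can both observe a key as
absent and both run the one-time side effect, while
ConcurrentHashMap.computeIfAbsent evaluates the mapping function atomically,
at most once per key.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class RegistrationSketch {
        private final Map<String, Object> pathToStorages = new ConcurrentHashMap<>();

        // Racy: containsKey and putIfAbsent are each atomic, but the
        // check-then-act sequence is not, so registerMetric(path) can
        // run twice for the same path.
        public void registerRacy(String path) {
            if (!pathToStorages.containsKey(path)) {
                pathToStorages.putIfAbsent(path, new Object());
                registerMetric(path);
            }
        }

        // Safe: the mapping function runs at most once per key.
        public void registerSafe(String path) {
            pathToStorages.computeIfAbsent(path, key -> {
                registerMetric(key);
                return new Object();
            });
        }

        private void registerMetric(String path) {
            // one-time side effect, e.g. creating a per-host counter
        }
    }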