You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@uniffle.apache.org by "jerqi (via GitHub)" <gi...@apache.org> on 2023/03/09 02:38:46 UTC

[GitHub] [incubator-uniffle] jerqi opened a new pull request, #699: [MINOR] improvement: Reduce the size of Spark patch

jerqi opened a new pull request, #699:
URL: https://github.com/apache/incubator-uniffle/pull/699

   ### What changes were proposed in this pull request?
   As https://github.com/apache/spark/pull/40307#issuecomment-1457619866, we could use `spark.shuffle.reduceLocality.enabled` to reduce the modification of the Apache Spark.
   
   ### Why are the changes needed?
   
   Reduce the spark patch size
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   No need
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] advancedxy commented on a diff in pull request #699: [MINOR] improvement: Reduce the size of Spark patch

Posted by "advancedxy (via GitHub)" <gi...@apache.org>.
advancedxy commented on code in PR #699:
URL: https://github.com/apache/incubator-uniffle/pull/699#discussion_r1130623385


##########
spark-patches/spark-2.4.6_dynamic_allocation_support.patch:
##########
@@ -69,24 +69,3 @@ index 459f575ba7..f563368fa5 100644
        for ((tid, info) <- taskInfos if info.executorId == execId) {
          val index = taskInfos(tid).index
          // We may have a running task whose partition has been marked as successful,
-diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala

Review Comment:
   I see, sounds reasonable.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #699: [MINOR] improvement: Reduce the size of Spark patch

Posted by "jerqi (via GitHub)" <gi...@apache.org>.
jerqi commented on code in PR #699:
URL: https://github.com/apache/incubator-uniffle/pull/699#discussion_r1130343659


##########
client-spark/spark2/src/main/java/org/apache/spark/shuffle/RssShuffleManager.java:
##########
@@ -188,6 +188,8 @@ public RssShuffleManager(SparkConf sparkConf, boolean isDriver) {
     // External shuffle service is not supported when using remote shuffle service
     sparkConf.set("spark.shuffle.service.enabled", "false");
     LOG.info("Disable external shuffle service in RssShuffleManager.");
+    sparkConf.set("spark.shuffle.reduceLocality.enabled", "false");

Review Comment:
   Because our data is in the RSS cluster or HDFS, locatity is useless for us. If rack aware need this feature, we can turn on this feature when we use rack aware.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] codecov-commenter commented on pull request #699: [MINOR] improvement: Reduce the size of Spark patch

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #699:
URL: https://github.com/apache/incubator-uniffle/pull/699#issuecomment-1461186652

   # [Codecov](https://codecov.io/gh/apache/incubator-uniffle/pull/699?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#699](https://codecov.io/gh/apache/incubator-uniffle/pull/699?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (bf81783) into [master](https://codecov.io/gh/apache/incubator-uniffle/commit/ffa50b979ea6ae11b423646cc0c2381af29e4cf8?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (ffa50b9) will **increase** coverage by `1.06%`.
   > The diff coverage is `n/a`.
   
   ```diff
   @@             Coverage Diff              @@
   ##             master     #699      +/-   ##
   ============================================
   + Coverage     60.80%   61.86%   +1.06%     
   + Complexity     1840     1744      -96     
   ============================================
     Files           221      198      -23     
     Lines         12648    10114    -2534     
     Branches       1062     1019      -43     
   ============================================
   - Hits           7690     6257    -1433     
   + Misses         4552     3525    -1027     
   + Partials        406      332      -74     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-uniffle/pull/699?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...storage/handler/impl/DataSkippableReadHandler.java](https://codecov.io/gh/apache/incubator-uniffle/pull/699?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3RvcmFnZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvdW5pZmZsZS9zdG9yYWdlL2hhbmRsZXIvaW1wbC9EYXRhU2tpcHBhYmxlUmVhZEhhbmRsZXIuamF2YQ==) | `83.78% <0.00%> (-2.71%)` | :arrow_down: |
   | [.../java/org/apache/uniffle/server/ShuffleServer.java](https://codecov.io/gh/apache/incubator-uniffle/pull/699?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2VydmVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS91bmlmZmxlL3NlcnZlci9TaHVmZmxlU2VydmVyLmphdmE=) | `59.60% <0.00%> (-2.47%)` | :arrow_down: |
   | [...le/storage/common/DefaultStorageMediaProvider.java](https://codecov.io/gh/apache/incubator-uniffle/pull/699?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3RvcmFnZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvdW5pZmZsZS9zdG9yYWdlL2NvbW1vbi9EZWZhdWx0U3RvcmFnZU1lZGlhUHJvdmlkZXIuamF2YQ==) | `60.60% <0.00%> (ø)` | |
   | [...rnetes/operator/pkg/webhook/inspector/inspector.go](https://codecov.io/gh/apache/incubator-uniffle/pull/699?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGVwbG95L2t1YmVybmV0ZXMvb3BlcmF0b3IvcGtnL3dlYmhvb2svaW5zcGVjdG9yL2luc3BlY3Rvci5nbw==) | | |
   | [...eploy/kubernetes/operator/pkg/utils/coordinator.go](https://codecov.io/gh/apache/incubator-uniffle/pull/699?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGVwbG95L2t1YmVybmV0ZXMvb3BlcmF0b3IvcGtnL3V0aWxzL2Nvb3JkaW5hdG9yLmdv) | | |
   | [deploy/kubernetes/operator/pkg/utils/config.go](https://codecov.io/gh/apache/incubator-uniffle/pull/699?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGVwbG95L2t1YmVybmV0ZXMvb3BlcmF0b3IvcGtnL3V0aWxzL2NvbmZpZy5nbw==) | | |
   | [deploy/kubernetes/operator/pkg/webhook/manager.go](https://codecov.io/gh/apache/incubator-uniffle/pull/699?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGVwbG95L2t1YmVybmV0ZXMvb3BlcmF0b3IvcGtnL3dlYmhvb2svbWFuYWdlci5nbw==) | | |
   | [deploy/kubernetes/operator/pkg/utils/util.go](https://codecov.io/gh/apache/incubator-uniffle/pull/699?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGVwbG95L2t1YmVybmV0ZXMvb3BlcmF0b3IvcGtnL3V0aWxzL3V0aWwuZ28=) | | |
   | [...oy/kubernetes/operator/pkg/controller/util/util.go](https://codecov.io/gh/apache/incubator-uniffle/pull/699?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGVwbG95L2t1YmVybmV0ZXMvb3BlcmF0b3IvcGtnL2NvbnRyb2xsZXIvdXRpbC91dGlsLmdv) | | |
   | [.../java/org/apache/spark/shuffle/RssSparkConfig.java](https://codecov.io/gh/apache/incubator-uniffle/pull/699?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS9Sc3NTcGFya0NvbmZpZy5qYXZh) | | |
   | ... and [16 more](https://codecov.io/gh/apache/incubator-uniffle/pull/699?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #699: [MINOR] improvement: Reduce the size of Spark patch

Posted by "jerqi (via GitHub)" <gi...@apache.org>.
jerqi commented on code in PR #699:
URL: https://github.com/apache/incubator-uniffle/pull/699#discussion_r1130670371


##########
client-spark/spark2/src/main/java/org/apache/spark/shuffle/RssShuffleManager.java:
##########
@@ -188,6 +188,8 @@ public RssShuffleManager(SparkConf sparkConf, boolean isDriver) {
     // External shuffle service is not supported when using remote shuffle service
     sparkConf.set("spark.shuffle.service.enabled", "false");
     LOG.info("Disable external shuffle service in RssShuffleManager.");
+    sparkConf.set("spark.shuffle.reduceLocality.enabled", "false");

Review Comment:
   Added.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #699: [MINOR] improvement: Reduce the size of Spark patch

Posted by "jerqi (via GitHub)" <gi...@apache.org>.
jerqi commented on code in PR #699:
URL: https://github.com/apache/incubator-uniffle/pull/699#discussion_r1130669705


##########
client-spark/spark3/src/main/java/org/apache/spark/shuffle/RssShuffleManager.java:
##########
@@ -205,6 +205,8 @@ public RssShuffleManager(SparkConf conf, boolean isDriver) {
     LOG.info("Disable external shuffle service in RssShuffleManager.");
     sparkConf.set("spark.sql.adaptive.localShuffleReader.enabled", "false");
     LOG.info("Disable local shuffle reader in RssShuffleManager.");
+    sparkConf.set("spark.shuffle.reduceLocality.enabled", "false");

Review Comment:
   Added.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] zuston commented on a diff in pull request #699: [MINOR] improvement: Reduce the size of Spark patch

Posted by "zuston (via GitHub)" <gi...@apache.org>.
zuston commented on code in PR #699:
URL: https://github.com/apache/incubator-uniffle/pull/699#discussion_r1130759062


##########
client-spark/spark2/src/main/java/org/apache/spark/shuffle/RssShuffleManager.java:
##########
@@ -188,6 +188,8 @@ public RssShuffleManager(SparkConf sparkConf, boolean isDriver) {
     // External shuffle service is not supported when using remote shuffle service
     sparkConf.set("spark.shuffle.service.enabled", "false");
     LOG.info("Disable external shuffle service in RssShuffleManager.");
+    sparkConf.set("spark.shuffle.reduceLocality.enabled", "false");

Review Comment:
   +1. The locality awareness could be supported in the future.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] advancedxy commented on a diff in pull request #699: [MINOR] improvement: Reduce the size of Spark patch

Posted by "advancedxy (via GitHub)" <gi...@apache.org>.
advancedxy commented on code in PR #699:
URL: https://github.com/apache/incubator-uniffle/pull/699#discussion_r1130625128


##########
client-spark/spark3/src/main/java/org/apache/spark/shuffle/RssShuffleManager.java:
##########
@@ -205,6 +205,8 @@ public RssShuffleManager(SparkConf conf, boolean isDriver) {
     LOG.info("Disable external shuffle service in RssShuffleManager.");
     sparkConf.set("spark.sql.adaptive.localShuffleReader.enabled", "false");
     LOG.info("Disable local shuffle reader in RssShuffleManager.");
+    sparkConf.set("spark.shuffle.reduceLocality.enabled", "false");

Review Comment:
   ditto



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #699: [MINOR] improvement: Reduce the size of Spark patch

Posted by "jerqi (via GitHub)" <gi...@apache.org>.
jerqi commented on code in PR #699:
URL: https://github.com/apache/incubator-uniffle/pull/699#discussion_r1130538089


##########
spark-patches/spark-2.4.6_dynamic_allocation_support.patch:
##########
@@ -69,24 +69,3 @@ index 459f575ba7..f563368fa5 100644
        for ((tid, info) <- taskInfos if info.executorId == execId) {
          val index = taskInfos(tid).index
          // We may have a running task whose partition has been marked as successful,
-diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala

Review Comment:
   Spark 3.1 and 3.2 still need this patch. Because this bug https://github.com/apache/spark/pull/40339 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi merged pull request #699: [MINOR] improvement: Reduce the size of Spark patch

Posted by "jerqi (via GitHub)" <gi...@apache.org>.
jerqi merged PR #699:
URL: https://github.com/apache/incubator-uniffle/pull/699


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] xianjingfeng commented on a diff in pull request #699: [MINOR] improvement: Reduce the size of Spark patch

Posted by "xianjingfeng (via GitHub)" <gi...@apache.org>.
xianjingfeng commented on code in PR #699:
URL: https://github.com/apache/incubator-uniffle/pull/699#discussion_r1130334851


##########
client-spark/spark2/src/main/java/org/apache/spark/shuffle/RssShuffleManager.java:
##########
@@ -188,6 +188,8 @@ public RssShuffleManager(SparkConf sparkConf, boolean isDriver) {
     // External shuffle service is not supported when using remote shuffle service
     sparkConf.set("spark.shuffle.service.enabled", "false");
     LOG.info("Disable external shuffle service in RssShuffleManager.");
+    sparkConf.set("spark.shuffle.reduceLocality.enabled", "false");

Review Comment:
   #341 need this feature.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] advancedxy commented on a diff in pull request #699: [MINOR] improvement: Reduce the size of Spark patch

Posted by "advancedxy (via GitHub)" <gi...@apache.org>.
advancedxy commented on code in PR #699:
URL: https://github.com/apache/incubator-uniffle/pull/699#discussion_r1130624918


##########
client-spark/spark2/src/main/java/org/apache/spark/shuffle/RssShuffleManager.java:
##########
@@ -188,6 +188,8 @@ public RssShuffleManager(SparkConf sparkConf, boolean isDriver) {
     // External shuffle service is not supported when using remote shuffle service
     sparkConf.set("spark.shuffle.service.enabled", "false");
     LOG.info("Disable external shuffle service in RssShuffleManager.");
+    sparkConf.set("spark.shuffle.reduceLocality.enabled", "false");

Review Comment:
   Would you mind add a comment here, like L188.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] advancedxy commented on a diff in pull request #699: [MINOR] improvement: Reduce the size of Spark patch

Posted by "advancedxy (via GitHub)" <gi...@apache.org>.
advancedxy commented on code in PR #699:
URL: https://github.com/apache/incubator-uniffle/pull/699#discussion_r1130533540


##########
spark-patches/spark-2.4.6_dynamic_allocation_support.patch:
##########
@@ -69,24 +69,3 @@ index 459f575ba7..f563368fa5 100644
        for ((tid, info) <- taskInfos if info.executorId == execId) {
          val index = taskInfos(tid).index
          // We may have a running task whose partition has been marked as successful,
-diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala

Review Comment:
   what about spark-3.x patches?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org