Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/11/05 09:53:55 UTC

[GitHub] [spark] lucasbru opened a new pull request #30262: [SPARK-33356] Remove eliminate mutual recursion in PartitionerAwareUnion

lucasbru opened a new pull request #30262:
URL: https://github.com/apache/spark/pull/30262


   The current implementation of the `DAGScheduler` exhibits exponential runtime in DAGs with many `PartitionerAwareUnions`. The reason seems to be a mutual recursion between
   `PartitionerAwareUnion.getPreferredLocations` and `DAGScheduler.getPreferredLocs`.
   
   A minimal example reproducing the issue:
   
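   ```scala
   import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

   object Example extends App {
     val partitioner = new HashPartitioner(2)
     val sc = new SparkContext(new SparkConf().setAppName("").setMaster("local[*]"))
     // An empty RDD with a known partitioner, so each union is partitioner-aware.
     val rdd1 = sc.emptyRDD[(Int, Int)].partitionBy(partitioner)
     // 30 references to the same RDD, unioned into one chain.
     val rdd2 = (1 to 30).map(_ => rdd1)
     val rdd3 = rdd2.reduce(_ union _)
     // No data is processed; the time is spent submitting the job.
     rdd3.collect()
   }
   ```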
   
   The whole app should take around one second to complete, as no actual work is done. However, submitting the job takes longer than I am willing to wait.
   
   The underlying cause appears to be the mutual recursion between `PartitionerAwareUnion.getPreferredLocations` and `DAGScheduler.getPreferredLocs`, which restarts the graph traversal at each `PartitionerAwareUnion` with no memoization. As a result, each node of the DAG is visited exponentially many times.
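   
   To illustrate why restarting the traversal is so costly, here is a minimal sketch in plain Scala with no Spark dependency. It is a toy model, not Spark's actual code: `Node`, `chain`, `countNaive`, and `countMemoized` are hypothetical names, and the two local functions are only stand-ins for the scheduler-side and union-side lookups, under the assumption that no node reports any preferred location.
   
   ```scala
   object MutualRecursionSketch {
     // Toy DAG node: a leaf has no parents, a union has two.
     final case class Node(id: Int, parents: List[Node])

     // A chain of `unions` union nodes, each combining the previous result with the same leaf.
     def chain(unions: Int): Node = {
       val leaf = Node(0, Nil)
       (1 to unions).foldLeft(leaf)((acc, i) => Node(i, List(acc, leaf)))
     }

     // Counts scheduler-side calls when every lookup restarts the traversal.
     def countNaive(root: Node): Long = {
       var calls = 0L
       // Stand-in for the scheduler-side lookup.
       def schedulerLocs(n: Node): Unit = {
         calls += 1
         unionLocs(n)                     // ask the node itself first
         n.parents.foreach(schedulerLocs) // nothing found, so walk the parents again
       }
       // Stand-in for the union-side lookup, which calls back into the scheduler.
       def unionLocs(n: Node): Unit = n.parents.foreach(schedulerLocs)
       schedulerLocs(root)
       calls
     }

     // Same DAG, but a shared visited set ensures each distinct node is handled once.
     def countMemoized(root: Node): Long = {
       val seen = scala.collection.mutable.Set.empty[Int]
       var calls = 0L
       def visit(n: Node): Unit =
         if (seen.add(n.id)) { calls += 1; n.parents.foreach(visit) }
       visit(root)
       calls
     }
   }
   ```
   
   In this toy model, the call count for a chain of `k` unions satisfies `S(k) = 2 * S(k - 1) + 3` with `S(0) = 1`, so `countNaive(chain(10))` is 4093, while `countMemoized(chain(10))` is 11.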
   
   In this pull request I propose to call `rdd.getPreferredLocations` instead of `DAGScheduler.getPreferredLocs` inside `PartitionerAwareUnion`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] commented on pull request #30262: [SPARK-33356] Remove eliminate mutual recursion in PartitionerAwareUnion

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #30262:
URL: https://github.com/apache/spark/pull/30262#issuecomment-778700677


   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30262: [SPARK-33356] Remove eliminate mutual recursion in PartitionerAwareUnion

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30262:
URL: https://github.com/apache/spark/pull/30262#issuecomment-722271763


   Can one of the admins verify this patch?




[GitHub] [spark] github-actions[bot] closed pull request #30262: [SPARK-33356] Remove eliminate mutual recursion in PartitionerAwareUnion

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #30262:
URL: https://github.com/apache/spark/pull/30262


   




[GitHub] [spark] AmplabJenkins commented on pull request #30262: [SPARK-33356] Remove eliminate mutual recursion in PartitionerAwareUnion

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30262:
URL: https://github.com/apache/spark/pull/30262#issuecomment-722271763


   Can one of the admins verify this patch?




[GitHub] [spark] AmplabJenkins commented on pull request #30262: [SPARK-33356] Remove eliminate mutual recursion in PartitionerAwareUnion

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30262:
URL: https://github.com/apache/spark/pull/30262#issuecomment-722272324


   Can one of the admins verify this patch?

