Posted to reviews@spark.apache.org by "faucct (via GitHub)" <gi...@apache.org> on 2023/10/25 08:28:59 UTC

[PR] ExternalSorter#mergeSort complexity should be linear if data is already sorted [spark]

faucct opened a new pull request, #43525:
URL: https://github.com/apache/spark/pull/43525

   Currently, when the data is already sorted, the merge reads from the partitions one after another, yet for every record the iterator still travels up and down the PriorityQueue.
   The ideal solution would be to call `scala.collection.mutable.PriorityQueue#fixDown` after `.next()` instead of doing an enqueue followed by a dequeue, but that method is not accessible.
   The behaviour should not change. I did not benchmark it, but I think this should improve performance on average.
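
For illustration, here is a minimal, self-contained sketch of one way to avoid the per-record enqueue and dequeue. It is not the diff from this PR and not Spark's actual `ExternalSorter` code: the object `MergeSortSketch` and the method `mergeSorted` are hypothetical names. It assumes that each input iterator is itself sorted. The iterator currently being drained is kept outside the queue, and the queue is touched only when the head of that iterator is larger than the head of the smallest queued iterator.

```scala
import scala.collection.mutable

object MergeSortSketch {
  // Hypothetical helper: merges iterators that are each already sorted by `ord`.
  def mergeSorted[T](iterators: Seq[Iterator[T]])(implicit ord: Ordering[T]): Iterator[T] = {
    // PriorityQueue dequeues its maximum, so order iterators by their head
    // under the reversed ordering to get the smallest head first.
    val heap = mutable.PriorityQueue.empty[BufferedIterator[T]](
      Ordering.by[BufferedIterator[T], T](_.head)(ord.reverse))
    iterators.map(_.buffered).filter(_.hasNext).foreach(it => heap.enqueue(it))

    new Iterator[T] {
      // The iterator being drained right now. It lives outside the heap, so
      // advancing it never invalidates the heap's internal order.
      private var current: BufferedIterator[T] = null

      override def hasNext: Boolean =
        (current != null && current.hasNext) || heap.nonEmpty

      override def next(): T = {
        if (!hasNext) throw new NoSuchElementException("next on empty iterator")
        if (current == null || !current.hasNext) {
          // The current iterator is exhausted: take the smallest queued one.
          current = heap.dequeue()
        } else if (heap.nonEmpty && ord.gt(current.head, heap.head.head)) {
          // Another iterator now has a smaller head: swap through the heap.
          heap.enqueue(current)
          current = heap.dequeue()
        }
        // Otherwise keep reading from `current` after a single comparison.
        current.next()
      }
    }
  }
}
```

With inputs whose ranges do not overlap, each record costs one comparison against `heap.head`, and the queue is modified only once per partition. With heavily interleaved inputs the sketch falls back to one enqueue and one dequeue per switch between iterators.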


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] ExternalSorter#mergeSort complexity should be linear if data is already sorted [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #43525:
URL: https://github.com/apache/spark/pull/43525#issuecomment-1924932139

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!




Re: [PR] ExternalSorter#mergeSort complexity should be linear if data is already sorted [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #43525: ExternalSorter#mergeSort complexity should be linear if data is already sorted
URL: https://github.com/apache/spark/pull/43525

