
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/03/09 10:15:53 UTC

[GitHub] [spark] SparksFyz edited a comment on pull request #32198: [SPARK-26164][SQL] Allow concurrent writers for writing dynamic partitions and bucket table

SparksFyz edited a comment on pull request #32198:
URL: https://github.com/apache/spark/pull/32198#issuecomment-1062764141


   > > One more thing, how much does this improve the write? Local sorts before the write are typically not too bad if you look at the cycles spent during the write. A much bigger target here would be to properly interleave I/O and CPU operations. You sort of achieve that by having multiple writers, but IMO it feels like quite a big hammer.
   > 
   > I will add a benchmark for this as a followup.
   > 
   > IMHO how much this can improve things really depends on the query shape (cardinality of dynamic partitions and buckets). In an environment where most queries write a low number of partitions and users set the number of buckets relatively small, this feature can help more. In an environment where queries tend to write a lot of partitions and users set the number of buckets quite large, this feature helps less. We do see a benefit for queries internally, and people have raised the request in Spark dev as well.
   
   @c21 Hi~ Any link for benchmark? Thanks~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


