Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/07/13 00:30:45 UTC

[GitHub] [spark] c21 commented on pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

c21 commented on pull request #29079:
URL: https://github.com/apache/spark/pull/29079#issuecomment-657300530

> We build hash map for each bucket on other side and it also sounds to OOM easily. This feature is disabled by a config by default, so it may be okay. But we should be careful not to enable it by default later.

@viirya, thanks for the comment. I agree this feature should be enabled selectively, but I don't see how OOM is related to it.

You are saying that OOM is a risk for shuffled hash join on bucketed tables, and I agree. However, this feature only coalesces buckets on the stream side and does not touch the build side at all, so I don't think it adds any OOM risk on the build side. Since [sort merge join is preferred over shuffled hash join by default](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L335), users who explicitly enable shuffled hash join through config should already be paying attention to the OOM problem.
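
To illustrate the point, here is a toy model in plain Scala collections. It is not Spark's actual code path: `bucketOf`, `bucketize`, and `coalesce` are hypothetical helpers, and it assumes bucket ids are merged modulo the smaller bucket count. The build side has 4 buckets and the stream side has 8; coalescing the stream side down to 4 changes how many stream rows each task reads, but the number of rows held in each build-side hash map stays the same.

```scala
// Toy model only: plain Scala collections stand in for bucketed table files.
object CoalesceStreamSideSketch {
  // Hypothetical bucket assignment: key modulo the bucket count (always non-negative).
  def bucketOf(key: Int, numBuckets: Int): Int = Math.floorMod(key, numBuckets)

  // Group the rows (represented by their join keys) of a table into buckets.
  def bucketize(keys: Seq[Int], numBuckets: Int): Map[Int, Seq[Int]] =
    keys.groupBy(bucketOf(_, numBuckets))

  // Coalesce buckets by merging bucket ids that are equal modulo the target count.
  // Assumes the original bucket count is a multiple of `target`.
  def coalesce(buckets: Map[Int, Seq[Int]], target: Int): Map[Int, Seq[Int]] =
    buckets.toSeq
      .groupBy { case (id, _) => id % target }
      .map { case (id, parts) => id -> parts.flatMap(_._2) }

  def main(args: Array[String]): Unit = {
    val build = bucketize(0 until 16, 4)               // build side: 4 buckets
    val stream = coalesce(bucketize(0 until 64, 8), 4) // stream side: 8 buckets coalesced to 4

    // Rows held in each build-side hash map depend only on the build side.
    val buildSizes = build.map { case (id, rows) => id -> rows.size }.toSeq.sorted
    val streamSizes = stream.map { case (id, rows) => id -> rows.size }.toSeq.sorted
    println(s"build-side rows per bucket: $buildSizes")
    println(s"stream-side rows per coalesced bucket: $streamSizes")
  }
}
```

In this sketch every key in coalesced stream bucket `i` still maps to build bucket `i`, so the join stays aligned without rebuilding or enlarging any build-side hash map.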

Am I missing anything? Thanks.
