Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/03/02 04:58:18 UTC

[GitHub] [spark] pan3793 commented on pull request #35696: [SPARK-38361][SQL] Factory method getConnection should take Partition as optional parameter.

pan3793 commented on pull request #35696:
URL: https://github.com/apache/spark/pull/35696#issuecomment-1056221700


   @srowen Let me give some background on how ClickHouse sharding works.
   
   A `distributed table` in ClickHouse is something like a "remote view": a logical union of the `local table`s on all cluster nodes. Generally, every node defines the same `distributed table`.
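To make the "remote view" idea concrete, here is a minimal Python sketch (not ClickHouse code) that models each node's local table as a list of rows and the distributed table as their union. The function name `distributed_select` and the host names `ck-1`, `ck-2`, `ck-3` are hypothetical, and counting shipped rows is a simplification of the real network cost.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str]


def distributed_select(
    local_tables: Dict[str, List[Row]], entry_node: str
) -> Tuple[List[Row], int]:
    """Simulate `select * from distributed_table` sent to `entry_node`.

    Returns the union of all local tables, plus the number of rows that
    had to be shipped from other nodes to the entry node (a toy stand-in
    for inter-node network traffic).
    """
    rows: List[Row] = []
    shipped = 0
    for node, table in sorted(local_tables.items()):
        rows.extend(table)
        if node != entry_node:
            # Rows stored on another shard must cross the network.
            shipped += len(table)
    return rows, shipped


cluster = {
    "ck-1": [(1, "a"), (2, "b")],
    "ck-2": [(3, "c")],
    "ck-3": [(4, "d"), (5, "e")],
}
rows, shipped = distributed_select(cluster, "ck-1")
```

Whichever node receives the query, the client sees all five rows, but three of them had to be pulled from other shards first.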
   
   When the SQL `select * from distribute_table` is submitted to one ClickHouse node, that node collects records from all nodes and sends them back to the JDBC client. If partition information is passed to the JDBC driver, the driver can use it to determine which node (shard) has the best data locality for that partition, which can significantly reduce network traffic.
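A minimal sketch of the idea, assuming a toy placement rule: the connection factory accepts an optional partition id, and when one is given, the (hypothetical) driver connects directly to the shard holding that partition instead of a default entry node. This is not Spark's or the ClickHouse JDBC driver's actual API. `make_connection_factory`, `Connection`, the `ck-*` hosts, and the `partition_id % N` placement are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Connection:
    host: str  # node the JDBC client actually talks to


def make_connection_factory(
    shard_hosts: List[str], default_host: str
) -> Callable[[Optional[int]], Connection]:
    """Return a factory whose partition argument is optional.

    Without a partition, connect to the default host (today's behavior).
    With one, route to the shard assumed to store that partition's data.
    """

    def get_connection(partition_id: Optional[int] = None) -> Connection:
        if partition_id is None:
            return Connection(default_host)
        # Assumed placement rule: partition i lives on shard i mod N.
        return Connection(shard_hosts[partition_id % len(shard_hosts)])

    return get_connection


factory = make_connection_factory(["ck-1", "ck-2", "ck-3"], "ck-lb")
```

Calling `factory()` with no argument keeps the existing behavior, which is why the partition parameter can stay optional for dialects that do not care about locality.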


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
For additional commands, e-mail: reviews-help@spark.apache.org