Posted to user@spark.apache.org by Sid <fl...@gmail.com> on 2022/08/11 18:20:41 UTC
Joins internally
Hi Team,
Assume we have a large dataset, and Spark applies sort-merge join, its
default join strategy, to it.
Now, I want to understand how joins work internally.
How does this join (or any join) work?
Assume that the data has already been shuffled and sorted by the join keys.
So let's say that Table A has two partitions (call them A1 and A2), where
rows are assigned to a partition by the hash value of their key and are
sorted within each partition.
So my question is: how does Spark know which partition of Table A has to be
joined with (or searched against) which partition of Table B?
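To make the setup concrete, here is a simplified, hypothetical model in Python (not Spark's actual implementation). It assumes both tables are hash-partitioned with the same hash function and the same number of partitions, and that each partition is sorted by key. Under that assumption a given key can only land in partition i on both sides, so partition i of Table A only needs to be merge-joined with partition i of Table B. The function names and the use of Python's built-in hash() on integer keys are assumptions for illustration; Spark SQL's own hash partitioning uses a different (Murmur3-based) hash.

```python
def hash_partition(rows, num_partitions):
    """Hypothetical helper: place (key, value) rows into partitions by
    hash(key) % num_partitions, then sort each partition by key."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in rows:
        parts[hash(key) % num_partitions].append((key, value))
    for part in parts:
        part.sort(key=lambda kv: kv[0])  # stable sort by key
    return parts


def merge_join(left, right):
    """Inner-join two key-sorted lists by advancing one cursor on each side."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == rk:
                j_end += 1
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out


def sort_merge_join(table_a, table_b, num_partitions=2):
    """Hypothetical sketch: co-partition both tables, then join partition i with partition i."""
    parts_a = hash_partition(table_a, num_partitions)
    parts_b = hash_partition(table_b, num_partitions)
    result = []
    # Same hash function and same partition count on both sides, so matching
    # keys always end up in partitions with the same index.
    for part_a, part_b in zip(parts_a, parts_b):
        result.extend(merge_join(part_a, part_b))
    return result


table_a = [(1, "a1"), (2, "a2"), (3, "a3"), (4, "a4")]
table_b = [(2, "b2"), (4, "b4"), (4, "b4x"), (5, "b5")]
print(sort_merge_join(table_a, table_b))
# [(2, 'a2', 'b2'), (4, 'a4', 'b4'), (4, 'a4', 'b4x')]
```

The pairing of partitions by index only works because both sides were partitioned identically; if the partition counts or hash functions differed, matching keys could end up in differently numbered partitions and this simple pairing would miss matches.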
TIA,
Sid