Posted to user@spark.apache.org by Sid <fl...@gmail.com> on 2022/08/11 18:20:41 UTC

Joins internally

Hi Team,

Assume we have a large dataset and that sort-merge join is the default join
Spark applies to this dataset.

Now, I want to understand the internal working of joins.

How does this join, or any join, work?

Assume that data is already shuffled and sorted on the basis of keys.

So let's say that Table A has two partitions, A & B, where rows are assigned
to a partition by the hash of the key and sorted by key within each partition.
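
To make the setup concrete, here is a minimal sketch in plain Python (not
Spark code) of what I mean by "hashed and sorted" partitions. Everything in it
is an assumption for illustration: the sample rows, the partition count of 2,
the helper names partition_id and shuffle_and_sort, and the use of a simple
modulo on integer keys as a stand-in for whatever hash function Spark really
uses.

```python
NUM_PARTITIONS = 2  # hypothetical: the two partitions from my example


def partition_id(key, num_partitions=NUM_PARTITIONS):
    # Stand-in for a hash partitioner. Small integer keys hash to
    # themselves in Python, so this is effectively key % num_partitions.
    return hash(key) % num_partitions


def shuffle_and_sort(rows, num_partitions=NUM_PARTITIONS):
    """Assign (key, value) rows to partitions by hash, then sort each by key."""
    partitions = {pid: [] for pid in range(num_partitions)}
    for key, value in rows:
        partitions[partition_id(key, num_partitions)].append((key, value))
    # Sort within each partition on the key only.
    return {pid: sorted(part, key=lambda kv: kv[0])
            for pid, part in partitions.items()}


# Made-up sample rows for Table A: (join key, payload).
table_a = [(4, "a4"), (1, "a1"), (3, "a3"), (2, "a2")]
table_a_partitions = shuffle_and_sort(table_a)

for pid, part in table_a_partitions.items():
    print(pid, part)
```

With these sample rows, keys 2 and 4 end up in partition 0 and keys 1 and 3 in
partition 1, each partition sorted by key.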

So my question is: how does Spark know which partition of Table A has to be
joined with (or searched against) which partition of Table B?

TIA,
Sid