Posted to dev@spark.apache.org by vector <79...@qq.com> on 2015/12/08 14:58:51 UTC
Filter the nulls before InnerJoin to solve the problem of data skew
When I join two tables, I find that one table has a data skew problem, and the skewed value of the join field is null. So I want to filter out the nulls before the InnerJoin, like this:
a.key is skewed, and the skewed value is null.
Change
"select * from a join b on a.key = b.key"
to
"select * from a join b on a.key = b.key and a.key is not null"
Is this idea feasible?
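
A quick way to check that the rewritten query returns the same rows is to run both versions on a toy dataset. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for Spark SQL, and the tables, columns (va, vb) and rows are made up. It relies on standard SQL semantics, where NULL = NULL is not true, so an inner join on a.key = b.key never matches null keys anyway. It says nothing about whether Spark actually applies the filter before the shuffle.

```python
import sqlite3

# Hypothetical toy tables; sqlite3 is an assumption standing in for Spark SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (key INTEGER, va TEXT)")
conn.execute("CREATE TABLE b (key INTEGER, vb TEXT)")

# Table a is "skewed" on NULL: most of its rows have a null key.
conn.executemany(
    "INSERT INTO a VALUES (?, ?)",
    [(1, "a1"), (2, "a2"), (None, "a3"), (None, "a4"), (None, "a5")],
)
conn.executemany(
    "INSERT INTO b VALUES (?, ?)",
    [(1, "b1"), (2, "b2"), (None, "b3")],
)

original = "SELECT a.key, a.va, b.vb FROM a JOIN b ON a.key = b.key"
filtered = original + " AND a.key IS NOT NULL"

rows_original = sorted(conn.execute(original).fetchall())
rows_filtered = sorted(conn.execute(filtered).fetchall())

# NULL = NULL is not true in SQL, so null keys never join in either query.
same_result = rows_original == rows_filtered
print(rows_original)
print(same_result)
```

If the two row sets match, the extra predicate does not change the answer, and the remaining question is only whether the engine evaluates it early enough to keep the null rows out of the join.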