Posted to dev@spark.apache.org by vector <79...@qq.com> on 2015/12/08 14:58:51 UTC

Filter nulls before an inner join to solve a data skew problem

When I join two tables, I find that one of them has a data skew problem, and the skewed value of the join field is null. So I want to filter out the nulls before the inner join, like this:


a.key is skewed, and the skewed value is null.


Change


"select * from a join b on a.key = b.key"


to


"select * from a join b on a.key = b.key and a.key is not null"


Is this idea feasible?
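
For what it is worth, here is a minimal sketch that checks only the result semantics of the two queries, not Spark's execution or shuffle behavior. It uses Python's built-in sqlite3 module rather than Spark, and the table contents and the "val" column are made up for illustration. In standard SQL, "NULL = NULL" does not evaluate to true, so rows with a null key never match in an inner join, and the extra "a.key is not null" condition should not change the result.

```python
import sqlite3

# In-memory database with hypothetical sample data; table names follow the post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (key INTEGER, val TEXT);
    CREATE TABLE b (key INTEGER, val TEXT);
    -- a.key is skewed toward NULL
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2'), (NULL, 'a3'), (NULL, 'a4'), (NULL, 'a5');
    INSERT INTO b VALUES (1, 'b1'), (2, 'b2'), (NULL, 'b3');
""")

original = "SELECT * FROM a JOIN b ON a.key = b.key"
filtered = "SELECT * FROM a JOIN b ON a.key = b.key AND a.key IS NOT NULL"

# Sort so the comparison does not depend on row order.
original_rows = sorted(conn.execute(original).fetchall())
filtered_rows = sorted(conn.execute(filtered).fetchall())

# Rows with a NULL key match nothing in either query.
print(original_rows == filtered_rows)
print(filtered_rows)
```

So the rewrite looks safe as far as results go. Whether it actually removes the skew in Spark depends on whether the null rows are dropped before the shuffle, which this sketch does not show.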