Posted to user@spark.apache.org by 崔苗 <cu...@danale.com> on 2018/04/18 06:08:56 UTC

"not in" SQL takes a lot of time

Hi,
When I execute SQL like this:
"select * from onlineDevice where deviceId not in (select deviceId from historyDevice)"
the task takes a very long time (over 40 minutes). I stopped the task, but I can't find the reason in the Spark history UI.
historyDevice and onlineDevice contain only about 3 million records.
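
For comparison, the same filter can also be written with NOT EXISTS. The two forms only agree when the subquery column has no NULLs: NOT IN has to return no rows once the subquery yields a NULL, and that null-aware check is one possible reason the NOT IN form is more expensive for an engine to plan as a join. Below is a minimal sketch of that difference. It uses Python's built-in sqlite3 instead of Spark so that it runs on its own, and the three sample rows are made up for illustration; it says nothing about the actual data or about Spark's physical plan.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the Spark tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE onlineDevice (deviceId TEXT);
    CREATE TABLE historyDevice (deviceId TEXT);
    INSERT INTO onlineDevice VALUES ('a'), ('b'), ('c');  -- made-up sample rows
    INSERT INTO historyDevice VALUES ('a');
""")

# The query shape from the post (projecting only deviceId for brevity).
NOT_IN = """
    SELECT deviceId FROM onlineDevice
    WHERE deviceId NOT IN (SELECT deviceId FROM historyDevice)
    ORDER BY deviceId
"""

# A NOT EXISTS rewrite; equivalent only if historyDevice.deviceId is never NULL.
NOT_EXISTS = """
    SELECT o.deviceId FROM onlineDevice o
    WHERE NOT EXISTS (
        SELECT 1 FROM historyDevice h WHERE h.deviceId = o.deviceId
    )
    ORDER BY o.deviceId
"""

def device_ids(query):
    """Run a query and return the deviceId column as a list."""
    return [row[0] for row in conn.execute(query)]

# Without NULLs in the subquery, both forms return the same rows.
without_null = (device_ids(NOT_IN), device_ids(NOT_EXISTS))

# With a NULL in the subquery, NOT IN returns nothing; NOT EXISTS is unchanged.
conn.execute("INSERT INTO historyDevice VALUES (NULL)")
with_null = (device_ids(NOT_IN), device_ids(NOT_EXISTS))

print(without_null)
print(with_null)
```

Whether the rewrite is safe therefore depends on whether deviceId can be NULL in historyDevice; if it can, the two queries do not return the same result.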

spark-submit options:
  --master yarn --deploy-mode client --driver-memory 8G --num-executors 2 --executor-memory 9G --executor-cores 6