Posted to dev@hive.apache.org by Adrian Popescu <ad...@epfl.ch> on 2013/11/19 14:25:18 UTC

skewed join problem

Hello All,

I am encountering a bug when executing TPC-H queries with the skewed join optimization enabled.
In particular, if the optimization is enabled but not triggered (i.e., the number of
rows with the same key is less than "hive.skewjoin.key"), all the subsequent jobs of the
query are mistakenly filtered out at runtime (for instance, only stages 6
and 22 of the attached plan are executed). The corresponding query
using only common joins executes correctly. I observe similar behaviour
for multiple TPC-H queries.
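
For context, the optimization is typically switched on with session settings
along these lines. The values are illustrative assumptions (100000 is the
documented default for "hive.skewjoin.key"), not necessarily the exact
configuration behind this report:

```
-- Enable the runtime skew join optimization (off by default).
set hive.optimize.skewjoin=true;
-- A join key is treated as skewed once more than this many rows share it;
-- below this threshold the optimization is enabled but not triggered.
set hive.skewjoin.key=100000;
```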

If anyone can comment on this issue or give me any pointers on what could be going wrong,
I would really appreciate it. I can also provide the queries and guidance on
reproducing the error if anyone from the development team is interested.

Thanks a lot!
Adrian