Posted to dev@hive.apache.org by "Justin (JIRA)" <ji...@apache.org> on 2013/10/12 06:40:42 UTC

[jira] [Commented] (HIVE-4693) If you set hive.optimize.skewjoin=true, and number of identical keys is < hive.skewjoin.key don't fail with FileNotFoundException

    [ https://issues.apache.org/jira/browse/HIVE-4693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793254#comment-13793254 ] 

Justin commented on HIVE-4693:
------------------------------

We have several skewed dimensions, and not being able to use hive.optimize.skewjoin causes severe performance degradation. I've tested by removing the skewed data.

Is there a workaround? I've tried setting hive.skewjoin.key to no avail.
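
For context, a minimal sketch of the session settings in question. The threshold value shown is Hive's documented default for hive.skewjoin.key and is only illustrative; it is not necessarily the value we ran with.

```sql
-- Enable the runtime skew join optimization discussed in this issue.
SET hive.optimize.skewjoin=true;

-- Number of rows with the same join key above which the key is treated
-- as skewed (100000 is the documented default; illustrative only).
SET hive.skewjoin.key=100000;

-- Fallback, assuming the skew path is what triggers the failure:
-- turning the optimization off avoids it, at the cost of the
-- performance problem described above.
-- SET hive.optimize.skewjoin=false;
```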


> If you set hive.optimize.skewjoin=true, and number of identical keys is < hive.skewjoin.key don't fail with FileNotFoundException
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-4693
>                 URL: https://issues.apache.org/jira/browse/HIVE-4693
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.10.0
>            Reporter: Robert Justice
>
> We would like to set hive.optimize.skewjoin to true to use it when skew is encountered, but if the number of identical keys is not met, it will crash due to not finding hive_skew_join_bigkeys_0.  Could we just bail out and go back to a standard join rather than failing?
> Ended Job = job_201306061640_0003
> java.io.FileNotFoundException: File hdfs://nameservice1/tmp/hive-rjustice/hive_2013-06-07_10-02-03_755_605133549375679913/-mr-10003/hive_skew_join_bigkeys_0 does not exist.
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:410)
> 	at org.apache.hadoop.hive.ql.plan.ConditionalResolverSkewJoin.getTasks(ConditionalResolverSkewJoin.java:96)
> 	at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
> 	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
> 	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
> 	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1374)
> 	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1160)
> 	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:973)
> 	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:893)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
> 	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
> 	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
> 	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> Ended Job = 672729995, job is filtered out (removed at runtime).
> MapReduce Jobs Launched: 
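
The "bail out and go back to a standard join" behaviour requested in the quoted description could be sketched roughly as below. This is not Hive's code and not a patch: it uses java.nio.file as a stand-in for the HDFS FileSystem API so that it runs on its own, and the class and method names (SkewJoinFallbackSketch, listBigKeyFiles, chooseJoin) are hypothetical. The only idea taken from the issue is that a missing hive_skew_join_bigkeys_0 directory should mean "no skewed keys were found" rather than an error.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SkewJoinFallbackSketch {

    // Hypothetical helper: list the big-key files under dir, treating a
    // missing directory as "no key crossed the skew threshold" instead of
    // letting the listing call throw.
    static List<Path> listBigKeyFiles(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return Collections.emptyList();
        }
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                files.add(p);
            }
        }
        return files;
    }

    // Hypothetical decision step: only take the skew-join branch when
    // there is something to process; otherwise fall back.
    static String chooseJoin(Path bigKeysDir) throws IOException {
        return listBigKeyFiles(bigKeysDir).isEmpty() ? "standard-join" : "skew-join";
    }

    public static void main(String[] args) throws IOException {
        // Local stand-in for the job's scratch directory.
        Path scratch = Files.createTempDirectory("hive-scratch");
        Path bigKeys = Paths.get(scratch.toString(), "hive_skew_join_bigkeys_0");
        System.out.println(chooseJoin(bigKeys));
    }
}
```

The design choice here is to check for the directory before listing it, so the absence of skewed keys becomes an ordinary empty result that the caller can branch on.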



--
This message was sent by Atlassian JIRA
(v6.1#6144)