Posted to user@hive.apache.org by Avrilia Floratou <av...@gmail.com> on 2013/11/22 19:31:10 UTC

TPC-H queries on Hive 0.12

Hello,

I'd like to run a few TPC-H queries on Hive 0.12. I found the TPC-H
scripts here:

https://issues.apache.org/jira/browse/HIVE-600

but noticed that these scripts were generated a long time ago. Since Hive
could not support the full SQL-92 specification at the time, some queries
were split into smaller sub-queries whose results were materialized. Have
there been any changes in HiveQL (as of Hive 0.12) that would affect the way
the TPC-H queries are written?

Thanks,
Avrilia

Re: TPC-H queries on Hive 0.12

Posted by Yin Huai <hu...@gmail.com>.
As I remember, those scripts use text files. With 0.12, I think ORC should
be used instead. Also, I think those sub-queries should be merged into a
single query. Within a single query, if a reduce-side join is converted to a
map join, that map join can be merged into its child job. But if the join is
evaluated as a separate query, Hive has to run a standalone map-only job for
it, because it does not know that this job only generates intermediate
results. For query 17 and query 18, once each is written as a single query,
the Correlation Optimizer should be able to optimize them (set
hive.optimize.correlation=true).
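
For illustration, here is a rough sketch of what that could look like for
query 17. It assumes the standard TPC-H table and column names; the
lineitem_orc and part_orc table names are hypothetical, and the two map-join
settings may already be the defaults on your build. As far as I know, 0.12
does not accept a sub-query in the WHERE clause, so the correlated sub-query
from the TPC-H specification is written here as a FROM-clause sub-query
joined on l_partkey. This is untested, so treat it as a starting point
rather than a tuned script.

```sql
-- Copy the text-format tables into ORC (hypothetical target table names).
CREATE TABLE lineitem_orc STORED AS ORC AS SELECT * FROM lineitem;
CREATE TABLE part_orc STORED AS ORC AS SELECT * FROM part;

-- Let Hive convert eligible joins to map joins and merge them into
-- the child job instead of running them as separate map-only jobs.
SET hive.auto.convert.join=true;
SET hive.auto.convert.join.noconditionaltask=true;

-- Enable the Correlation Optimizer, as suggested above.
SET hive.optimize.correlation=true;

-- Query 17 as one statement: the per-part average quantity is computed
-- in a FROM-clause sub-query rather than materialized in a temp table.
SELECT SUM(l.l_extendedprice) / 7.0 AS avg_yearly
FROM lineitem_orc l
JOIN part_orc p
  ON (p.p_partkey = l.l_partkey)
JOIN (
  SELECT l_partkey AS t_partkey,
         0.2 * AVG(l_quantity) AS t_avg_quantity
  FROM lineitem_orc
  GROUP BY l_partkey
) t
  ON (t.t_partkey = l.l_partkey)
WHERE p.p_brand = 'Brand#23'
  AND p.p_container = 'MED BOX'
  AND l.l_quantity < t.t_avg_quantity;
```

Both the aggregation and the join use l_partkey as the key over the same
table, which is the kind of correlation the optimizer is meant to exploit.
Running EXPLAIN on the statement with the setting on and off should show
whether the number of jobs actually changes.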

Thanks,

Yin

