You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Sukhendu Chakraborty <su...@gmail.com> on 2013/12/04 18:05:59 UTC

mapjoin on subquery result

Hi,

Is there anyway mapjoin works on the subquery(not the underlying table). I
have the following query:

select external_id,count(category_id) from
catalog_products_in_categories_orc pc inner join (select * from
catalog_products_orc where s_id=118) p on pc.product_id=p.id   group by
external_id;


Now, even though catalog_products_orc is a big table, after filtering
(s_id=118) it results in very few number of rows which can be easily
optimized to a mapjoin (with catalog_products_in_categories_orc as the big
table and the subquery result as the small table) . However, when I try to
specify /*+MAPJOIN(p)*/ to enforce this, it results in a mapjoin for the
table catalog_products_orc (and not on the subquery after filtering).

Any ideas to achieve mapjoin on a subquery (and not the underlying table)?


-Sukhendu

Re: mapjoin on subquery result

Posted by bharath vissapragada <bh...@gmail.com>.
Hi,

Did you set hive.auto.convert.join=true and try? It does a runtime check of
input table sizes and autoconverts join operators to mapjoin if possible.
I'm not sure if it does it for subqueries too, need to look into the code
base. If any of the devs can confirm, that would be great.

- Bharath


On Wed, Dec 4, 2013 at 10:35 PM, Sukhendu Chakraborty <
sukhendu.chakraborty@gmail.com> wrote:

> Hi,
>
> Is there anyway mapjoin works on the subquery(not the underlying table). I
> have the following query:
>
> select external_id,count(category_id) from
> catalog_products_in_categories_orc pc inner join (select * from
> catalog_products_orc where s_id=118) p on pc.product_id=p.id   group by
> external_id;
>
>
> Now, even though catalog_products_orc is a big table, after filtering
> (s_id=118) it results in very few number of rows which can be easily
> optimized to a mapjoin (with catalog_products_in_categories_orc as the big
> table and the subquery result as the small table) . However, when I try to
> specify /*+MAPJOIN(p)*/ to enforce this, it results in a mapjoin for the
> table catalog_products_orc (and not on the subquery after filtering).
>
> Any ideas to achieve mapjoin on a subquery (and not the underlying table)?
>
>
> -Sukhendu
>