You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@hive.apache.org by Cristi Cioriia <cr...@1and1.ro> on 2010/08/24 17:48:24 UTC

Multi Table Inserts produces multiple jobs

Hi guys,

I would like to use the Multi Insert feature of HIVE so that I could
have fewer map-reduce jobs than running separate queries.

I have some HIVE queries that use the Multi Insert feature as below:

FROM source_table
INSERT OVERWRITE TABLE tablename1 
SELECT field1, field2 ...fieldN 
GROUP BY field1, field2 
INSERT OVERWRITE TABLE tablename2
SELECT field1,  field3 ... fieldK
GROUP BY field1, field3

I was hoping that by using this feature only 1 Map-Reduce job will be
created, but what I found out when running the query is that 2  jobs are
created, just as if I would have ran 2 separate queries:

FROM source_table
INSERT OVERWRITE TABLE tablename1 
SELECT field1, field2 ...fieldN 
GROUP BY field1, field2

FROM source_table
INSERT OVERWRITE TABLE tablename1 
SELECT field1,  field3 ... fieldK
GROUP BY field1, field3

Is there any way that I can get only 1 MR job with the multi insert
syntax?

Thanks,
Cristi

Re: Multi Table Inserts produces multiple jobs

Posted by Thiruvel Thirumoolan <th...@yahoo-inc.com>.

Hi Cristi,

The source_table is scanned only once in a multi-insert scenario, whereas if u have 2 queries it will be scanned twice.

If you do an 'explain extended' on the query you would know the flow of data.

You could find related info @ http://www.slideshare.net/ragho/hive-user-meeting-august-2009-facebook - Slides 51-53.

-Thiruvel

On Aug 24, 2010, at 9:18 PM, Cristi Cioriia wrote:

> Hi guys,
> 
> I would like to use the Multi Insert feature of HIVE so that I could
> have fewer map-reduce jobs than running separate queries.
> 
> I have some HIVE queries that use the Multi Insert feature as below:
> 
> FROM source_table
> INSERT OVERWRITE TABLE tablename1 
> SELECT field1, field2 ...fieldN 
> GROUP BY field1, field2 
> INSERT OVERWRITE TABLE tablename2
> SELECT field1,  field3 ... fieldK
> GROUP BY field1, field3
> 
> I was hoping that by using this feature only 1 Map-Reduce job will be
> created, but what I found out when running the query is that 2  jobs are
> created, just as if I would have ran 2 separate queries:
> 
> FROM source_table
> INSERT OVERWRITE TABLE tablename1 
> SELECT field1, field2 ...fieldN 
> GROUP BY field1, field2
> 
> FROM source_table
> INSERT OVERWRITE TABLE tablename1 
> SELECT field1,  field3 ... fieldK
> GROUP BY field1, field3
> 
> Is there any way that I can get only 1 MR job with the multi insert
> syntax?
> 
> Thanks,
> Cristi
> 
> 
> 
> 
>

Re: Multi Table Inserts produces multiple jobs

Posted by Ning Zhang <nz...@facebook.com>.

Multi table insert sometimes can combine queries in one MR job but since you have a group by for each subquery you will need 2 MR jobs.

One thing you may optimize is to set hive.exec.parallel to true so that 2 MR jobs will be running in parallel.

Thanks,
Ning
------
Sent from my blackberry


----- Original Message -----
From: Cristi Cioriia [mailto:cristian-andrei.cioriia@1and1.ro]
Sent: Tuesday, August 24, 2010 08:48 AM
To: hive-user@hadoop.apache.org <hi...@hadoop.apache.org>
Subject: Multi Table Inserts produces multiple jobs

Hi guys,

I would like to use the Multi Insert feature of HIVE so that I could
have fewer map-reduce jobs than running separate queries.

I have some HIVE queries that use the Multi Insert feature as below:

FROM source_table
INSERT OVERWRITE TABLE tablename1 
SELECT field1, field2 ...fieldN 
GROUP BY field1, field2 
INSERT OVERWRITE TABLE tablename2
SELECT field1,  field3 ... fieldK
GROUP BY field1, field3

I was hoping that by using this feature only 1 Map-Reduce job will be
created, but what I found out when running the query is that 2  jobs are
created, just as if I would have ran 2 separate queries:

FROM source_table
INSERT OVERWRITE TABLE tablename1 
SELECT field1, field2 ...fieldN 
GROUP BY field1, field2

FROM source_table
INSERT OVERWRITE TABLE tablename1 
SELECT field1,  field3 ... fieldK
GROUP BY field1, field3

Is there any way that I can get only 1 MR job with the multi insert
syntax?

Thanks,
Cristi