You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Abhijit Pol <ap...@rocketfuelinc.com> on 2009/09/08 19:06:54 UTC

enforcing query with partition column

Is there a way in hive to enforce use of partition column before
accepting the query for execution?
Often user make mistake and don't specify them and then end-up waiting
a lot more than expected....

Thanks,
Abhi

Re: enforcing query with partition column

Posted by Abhijit Pol <ap...@rocketfuelinc.com>.
>> What does explain <query> show?
Attached is output of explain extended for both queries (1) with MAPJOIN
hint (2) without hint...

>>An easy work around for now is to push the partition predicate into a
subquery on the table.
Not sure if I get this. I already have partition predicate pushed inside
subquery.
The actions table is dimension table and has no partitions. Beacons and
conversions table are fact tables partitioned on data_date.

On Fri, Sep 11, 2009 at 12:42 AM, Ashish Thusoo <at...@facebook.com>wrote:

> What does explain <query> show? An easy work around for now is to push the
> partition predicate into a subquery on the table.
>
> Ashish
> ________________________________________
> From: Abhijit Pol [apol@rocketfuelinc.com]
> Sent: Thursday, September 10, 2009 10:40 PM
> To: hive-user@hadoop.apache.org
> Subject: Re: enforcing query with partition column
>
> Looks like MAPJOIN hint is causing a problem.
>
> Here is real the query which gives problem:
>
> INSERT OVERWRITE TABLE nikeusers1
> SELECT u.uid, u.type FROM
> (
>  SELECT /*+ MAPJOIN(a)*/ b.user_id AS uid, a.action_type AS type FROM
> actions a join beacons b on (b.conversion_action_version_id = a.version_id)
> WHERE a.campaign_id=443 and b.data_date >= '20090901' and (a.action_type=0
> or a.action_type=2)
> UNION ALL
> SELECT /*+ MAPJOIN(a)*/ c.user_id AS uid, a.action_type AS type FROM
> actions a join conversions c on (c.conversion_action_version_id =
> a.version_id) WHERE a.campaign_id=443 and c.data_date >= '20090901' and
> (a.action_type=0 or a.action_type=2)
> ) u
>
> Hive history file=/tmp/apol/hive_job_log_apol_200909110535_1230112822.txt
> FAILED: Error in semantic analysis: The query does not reference any valid
> partition. To run this query, set hive.mapred.mode=nonstrict
>
> Same query without MAPJOIN hint works fine....
>
>
>
> On Thu, Sep 10, 2009 at 10:30 PM, Abhijit Pol <apol@rocketfuelinc.com
> <ma...@rocketfuelinc.com>> wrote:
> I am using ~15 days old version from 0.4 branch......
>
>
>
>
> On Thu, Sep 10, 2009 at 5:55 PM, Namit Jain <njain@facebook.com<mailto:
> njain@facebook.com>> wrote:
>
> That should work – which revision of hive are you using ?
>
>
>
>
>
> From: Abhijit Pol [mailto:apol@rocketfuelinc.com<mailto:
> apol@rocketfuelinc.com>]
> Sent: Thursday, September 10, 2009 11:51 AM
>
> To: hive-user@hadoop.apache.org<ma...@hadoop.apache.org>
> Subject: Re: enforcing query with partition column
>
>
>
> Thanks Namit. Setting this parameter in general works. However following
> query is now not allowed:
>
> SELECT a.bar FROM
> (
> SELECT b.bar FROM table1 b WHERE b.partition_key = 1
> UNION ALL
> SELECT c.bar FROM table2 c WHERE c.partition_key = 1
> ) a
>
> Individual sub queries are allowed here...any suggestions?
>
>
> On Tue, Sep 8, 2009 at 10:22 AM, Namit Jain <njain@facebook.com<mailto:
> njain@facebook.com>> wrote:
>
> If you run in strict mode, that is enforced.
>
> set hive.mapred.mode=strict
>
>
>
> -namit
>
>
> -----Original Message-----
> From: Abhijit Pol [mailto:apol@rocketfuelinc.com<mailto:
> apol@rocketfuelinc.com>]
> Sent: Tuesday, September 08, 2009 10:07 AM
> To: hive-user@hadoop.apache.org<ma...@hadoop.apache.org>
> Subject: enforcing query with partition column
>
> Is there a way in hive to enforce use of partition column before
> accepting the query for execution?
> Often user make mistake and don't specify them and then end-up waiting
> a lot more than expected....
>
> Thanks,
> Abhi
>
>
>
>
>

RE: enforcing query with partition column

Posted by Ashish Thusoo <at...@facebook.com>.
What does explain <query> show? An easy work around for now is to push the partition predicate into a subquery on the table.

Ashish
________________________________________
From: Abhijit Pol [apol@rocketfuelinc.com]
Sent: Thursday, September 10, 2009 10:40 PM
To: hive-user@hadoop.apache.org
Subject: Re: enforcing query with partition column

Looks like MAPJOIN hint is causing a problem.

Here is real the query which gives problem:

INSERT OVERWRITE TABLE nikeusers1
SELECT u.uid, u.type FROM
(
 SELECT /*+ MAPJOIN(a)*/ b.user_id AS uid, a.action_type AS type FROM actions a join beacons b on (b.conversion_action_version_id = a.version_id) WHERE a.campaign_id=443 and b.data_date >= '20090901' and (a.action_type=0 or a.action_type=2)
UNION ALL
SELECT /*+ MAPJOIN(a)*/ c.user_id AS uid, a.action_type AS type FROM actions a join conversions c on (c.conversion_action_version_id = a.version_id) WHERE a.campaign_id=443 and c.data_date >= '20090901' and (a.action_type=0 or a.action_type=2)
) u

Hive history file=/tmp/apol/hive_job_log_apol_200909110535_1230112822.txt
FAILED: Error in semantic analysis: The query does not reference any valid partition. To run this query, set hive.mapred.mode=nonstrict

Same query without MAPJOIN hint works fine....



On Thu, Sep 10, 2009 at 10:30 PM, Abhijit Pol <ap...@rocketfuelinc.com>> wrote:
I am using ~15 days old version from 0.4 branch......




On Thu, Sep 10, 2009 at 5:55 PM, Namit Jain <nj...@facebook.com>> wrote:

That should work – which revision of hive are you using ?





From: Abhijit Pol [mailto:apol@rocketfuelinc.com<ma...@rocketfuelinc.com>]
Sent: Thursday, September 10, 2009 11:51 AM

To: hive-user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: enforcing query with partition column



Thanks Namit. Setting this parameter in general works. However following query is now not allowed:

SELECT a.bar FROM
(
SELECT b.bar FROM table1 b WHERE b.partition_key = 1
UNION ALL
SELECT c.bar FROM table2 c WHERE c.partition_key = 1
) a

Individual sub queries are allowed here...any suggestions?


On Tue, Sep 8, 2009 at 10:22 AM, Namit Jain <nj...@facebook.com>> wrote:

If you run in strict mode, that is enforced.

set hive.mapred.mode=strict



-namit


-----Original Message-----
From: Abhijit Pol [mailto:apol@rocketfuelinc.com<ma...@rocketfuelinc.com>]
Sent: Tuesday, September 08, 2009 10:07 AM
To: hive-user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: enforcing query with partition column

Is there a way in hive to enforce use of partition column before
accepting the query for execution?
Often user make mistake and don't specify them and then end-up waiting
a lot more than expected....

Thanks,
Abhi





Re: enforcing query with partition column

Posted by Abhijit Pol <ap...@rocketfuelinc.com>.
Looks like MAPJOIN hint is causing a problem.

Here is real the query which gives problem:

INSERT OVERWRITE TABLE nikeusers1
SELECT u.uid, u.type FROM
(
 SELECT /*+ MAPJOIN(a)*/ b.user_id AS uid, a.action_type AS type FROM
actions a join beacons b on (b.conversion_action_version_id = a.version_id)
WHERE a.campaign_id=443 and b.data_date >= '20090901' and (a.action_type=0
or a.action_type=2)
UNION ALL
SELECT /*+ MAPJOIN(a)*/ c.user_id AS uid, a.action_type AS type FROM actions
a join conversions c on (c.conversion_action_version_id = a.version_id)
WHERE a.campaign_id=443 and c.data_date >= '20090901' and (a.action_type=0
or a.action_type=2)
) u

Hive history file=/tmp/apol/hive_job_log_apol_200909110535_1230112822.txt
FAILED: Error in semantic analysis: The query does not reference any valid
partition. To run this query, set hive.mapred.mode=nonstrict

Same query without MAPJOIN hint works fine....



On Thu, Sep 10, 2009 at 10:30 PM, Abhijit Pol <ap...@rocketfuelinc.com>wrote:

> I am using ~15 days old version from 0.4 branch......
>
>
>
>
> On Thu, Sep 10, 2009 at 5:55 PM, Namit Jain <nj...@facebook.com> wrote:
>
>>  That should work – which revision of hive are you using ?
>>
>>
>>
>>
>>
>> *From:* Abhijit Pol [mailto:apol@rocketfuelinc.com]
>> *Sent:* Thursday, September 10, 2009 11:51 AM
>> *To:* hive-user@hadoop.apache.org
>> *Subject:* Re: enforcing query with partition column
>>
>>
>>
>> Thanks Namit. Setting this parameter in general works. However following
>> query is now not allowed:
>>
>> SELECT a.bar FROM
>> (
>> SELECT b.bar FROM table1 b WHERE b.partition_key = 1
>> UNION ALL
>> SELECT c.bar FROM table2 c WHERE c.partition_key = 1
>> ) a
>>
>> Individual sub queries are allowed here...any suggestions?
>>
>>  On Tue, Sep 8, 2009 at 10:22 AM, Namit Jain <nj...@facebook.com> wrote:
>>
>> If you run in strict mode, that is enforced.
>>
>> set hive.mapred.mode=strict
>>
>>
>>
>> -namit
>>
>>
>>
>> -----Original Message-----
>> From: Abhijit Pol [mailto:apol@rocketfuelinc.com]
>> Sent: Tuesday, September 08, 2009 10:07 AM
>> To: hive-user@hadoop.apache.org
>> Subject: enforcing query with partition column
>>
>> Is there a way in hive to enforce use of partition column before
>> accepting the query for execution?
>> Often user make mistake and don't specify them and then end-up waiting
>> a lot more than expected....
>>
>> Thanks,
>> Abhi
>>
>>
>>
>
>

Re: enforcing query with partition column

Posted by Abhijit Pol <ap...@rocketfuelinc.com>.
I am using ~15 days old version from 0.4 branch......



On Thu, Sep 10, 2009 at 5:55 PM, Namit Jain <nj...@facebook.com> wrote:

>  That should work – which revision of hive are you using ?
>
>
>
>
>
> *From:* Abhijit Pol [mailto:apol@rocketfuelinc.com]
> *Sent:* Thursday, September 10, 2009 11:51 AM
> *To:* hive-user@hadoop.apache.org
> *Subject:* Re: enforcing query with partition column
>
>
>
> Thanks Namit. Setting this parameter in general works. However following
> query is now not allowed:
>
> SELECT a.bar FROM
> (
> SELECT b.bar FROM table1 b WHERE b.partition_key = 1
> UNION ALL
> SELECT c.bar FROM table2 c WHERE c.partition_key = 1
> ) a
>
> Individual sub queries are allowed here...any suggestions?
>
>  On Tue, Sep 8, 2009 at 10:22 AM, Namit Jain <nj...@facebook.com> wrote:
>
> If you run in strict mode, that is enforced.
>
> set hive.mapred.mode=strict
>
>
>
> -namit
>
>
>
> -----Original Message-----
> From: Abhijit Pol [mailto:apol@rocketfuelinc.com]
> Sent: Tuesday, September 08, 2009 10:07 AM
> To: hive-user@hadoop.apache.org
> Subject: enforcing query with partition column
>
> Is there a way in hive to enforce use of partition column before
> accepting the query for execution?
> Often user make mistake and don't specify them and then end-up waiting
> a lot more than expected....
>
> Thanks,
> Abhi
>
>
>

RE: enforcing query with partition column

Posted by Namit Jain <nj...@facebook.com>.
That should work - which revision of hive are you using ?


From: Abhijit Pol [mailto:apol@rocketfuelinc.com]
Sent: Thursday, September 10, 2009 11:51 AM
To: hive-user@hadoop.apache.org
Subject: Re: enforcing query with partition column

Thanks Namit. Setting this parameter in general works. However following query is now not allowed:

SELECT a.bar FROM
(
SELECT b.bar FROM table1 b WHERE b.partition_key = 1
UNION ALL
SELECT c.bar FROM table2 c WHERE c.partition_key = 1
) a

Individual sub queries are allowed here...any suggestions?

On Tue, Sep 8, 2009 at 10:22 AM, Namit Jain <nj...@facebook.com>> wrote:
If you run in strict mode, that is enforced.

set hive.mapred.mode=strict



-namit


-----Original Message-----
From: Abhijit Pol [mailto:apol@rocketfuelinc.com<ma...@rocketfuelinc.com>]
Sent: Tuesday, September 08, 2009 10:07 AM
To: hive-user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: enforcing query with partition column

Is there a way in hive to enforce use of partition column before
accepting the query for execution?
Often user make mistake and don't specify them and then end-up waiting
a lot more than expected....

Thanks,
Abhi


Re: enforcing query with partition column

Posted by Abhijit Pol <ap...@rocketfuelinc.com>.
Thanks Namit. Setting this parameter in general works. However following
query is now not allowed:

SELECT a.bar FROM
(
SELECT b.bar FROM table1 b WHERE b.partition_key = 1
UNION ALL
SELECT c.bar FROM table2 c WHERE c.partition_key = 1
) a

Individual sub queries are allowed here...any suggestions?


On Tue, Sep 8, 2009 at 10:22 AM, Namit Jain <nj...@facebook.com> wrote:

> If you run in strict mode, that is enforced.
>
> set hive.mapred.mode=strict
>
>
>
> -namit
>
>
> -----Original Message-----
> From: Abhijit Pol [mailto:apol@rocketfuelinc.com]
> Sent: Tuesday, September 08, 2009 10:07 AM
> To: hive-user@hadoop.apache.org
> Subject: enforcing query with partition column
>
> Is there a way in hive to enforce use of partition column before
> accepting the query for execution?
> Often user make mistake and don't specify them and then end-up waiting
> a lot more than expected....
>
> Thanks,
> Abhi
>

RE: enforcing query with partition column

Posted by Namit Jain <nj...@facebook.com>.
If you run in strict mode, that is enforced.

set hive.mapred.mode=strict



-namit


-----Original Message-----
From: Abhijit Pol [mailto:apol@rocketfuelinc.com] 
Sent: Tuesday, September 08, 2009 10:07 AM
To: hive-user@hadoop.apache.org
Subject: enforcing query with partition column

Is there a way in hive to enforce use of partition column before
accepting the query for execution?
Often user make mistake and don't specify them and then end-up waiting
a lot more than expected....

Thanks,
Abhi