You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Viraj Bhat <vi...@yahoo-inc.com> on 2010/06/17 01:07:52 UTC

Creating partitions causes Error in semantic analysis

Hi all,

  I have a table known as "oldtable" which is partitioned by datestamp.

 

The schema of the "oldtable" is:

 

name string

age bigint

property string

region string

datestamp string

 

 

I now need to create a new table which is based of this old table and
partitioned by (datestamp, region, property)

 

The DDL for the new table looks like:

 

CREATE EXTERNAL TABLE newtable

(

newname string,

newage bigint,

)

 

PARTITIONED BY (newdatestamp STRING, myregion STRING, myproperty STRING)

 

STORED AS RCFILE

LOCATION '/user/viraj/rcfile;

 

 

When I try to populate this new table from my old table, I try to use
partitioning which uses values of old columns.

 

INSERT OVERWRITE TABLE newtable

PARTITION (newdatestamp='20100525', region, property)

SELECT

  name,

  age

from oldtable where datestamp='20100525';

 

The above statement causes an error and expects hardcoded values for
region and property.

 

FAILED: Error in semantic analysis: Partition column in the partition
specification does not exist.

 

How do I specify the partition information such that the new tables,
takes values from "property" and "region" from the old table and uses it
as partitions.

 

Is there a better way to achieve the above instead of hard coding values
for each and every partition?

 

=======================================================

Addendum: If the above is possible, how can I define some conditions
where I need to say, If region is not "us" or "asia", put it in another
partition known as misc?

=======================================================

 

 

Thanks Viraj


Re: Creating partitions causes Error in semantic analysis

Posted by Ning Zhang <nz...@facebook.com>.
The __HIVE_DEFAULT_PARTITION__ is created by default if the partitioning column value (newdatestamp etc.) is NULL or empty string. Below is the wiki page that describes the syntax and semantics of dynamic partitioning, including some best practices. 

http://wiki.apache.org/hadoop/Hive/Tutorial#Dynamic-partition_Insert

Ning

On Jun 16, 2010, at 11:11 PM, Viraj Bhat wrote:

> Hi Yongqiang,
> I am using the trunk code. I figured out what the problem was 
> INSERT OVERWRITE TABLE newtable
> PARTITION (newdatestamp, myregion, myproperty)
> SELECT
> name,
> age,
> datestamp as newdatestamp,
> region as myregion,
> property as myproperty,
> from oldtable where datestamp='20100525';
> 
> I need to specify the last 3 columns in the order of partitions, which I did not. 
> 
> Meanwhile the dynamic partitioning produced a partition which was named "__HIVE_DEFAULT_PARTITION__". Is this created by default? 
> 
> Thanks again for your help.
> Viraj
> 
> -----Original Message-----
> From: yongqiang he [mailto:heyongqiangict@gmail.com] 
> Sent: Wednesday, June 16, 2010 5:46 PM
> To: hive-user@hadoop.apache.org
> Subject: Re: Creating partitions causes Error in semantic analysis
> 
> Hive supports dynamic partition ( i think you need to use trunk code
> for this feature.?).
> 
> here is an example:
> 
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.dynamic.partition=true;
> 
> create table if not exists nzhang_part1 like srcpart;
> create table if not exists nzhang_part2 like srcpart;
> describe extended nzhang_part1;
> 
> from srcpart
> insert overwrite table nzhang_part1 partition (ds, hr) select key,
> value, ds, hr where ds <= '2008-04-08'
> insert overwrite table nzhang_part2 partition(ds='2008-12-31', hr)
> select key, value, hr where ds > '2008-04-08';
> 
> On Wed, Jun 16, 2010 at 4:07 PM, Viraj Bhat <vi...@yahoo-inc.com> wrote:
>> Hi all,
>> 
>>   I have a table known as "oldtable" which is partitioned by datestamp.
>> 
>> 
>> 
>> The schema of the "oldtable" is:
>> 
>> 
>> 
>> name string
>> 
>> age bigint
>> 
>> property string
>> 
>> region string
>> 
>> datestamp string
>> 
>> 
>> 
>> 
>> 
>> I now need to create a new table which is based of this old table and
>> partitioned by (datestamp, region, property)
>> 
>> 
>> 
>> The DDL for the new table looks like:
>> 
>> 
>> 
>> CREATE EXTERNAL TABLE newtable
>> 
>> (
>> 
>> newname string,
>> 
>> newage bigint,
>> 
>> )
>> 
>> 
>> 
>> PARTITIONED BY (newdatestamp STRING, myregion STRING, myproperty STRING)
>> 
>> 
>> 
>> STORED AS RCFILE
>> 
>> LOCATION '/user/viraj/rcfile;
>> 
>> 
>> 
>> 
>> 
>> When I try to populate this new table from my old table, I try to use
>> partitioning which uses values of old columns.
>> 
>> 
>> 
>> INSERT OVERWRITE TABLE newtable
>> 
>> PARTITION (newdatestamp='20100525', region, property)
>> 
>> SELECT
>> 
>>   name,
>> 
>>   age
>> 
>> from oldtable where datestamp='20100525';
>> 
>> 
>> 
>> The above statement causes an error and expects hardcoded values for region
>> and property.
>> 
>> 
>> 
>> FAILED: Error in semantic analysis: Partition column in the partition
>> specification does not exist.
>> 
>> 
>> 
>> How do I specify the partition information such that the new tables, takes
>> values from "property" and "region" from the old table and uses it as
>> partitions.
>> 
>> 
>> 
>> Is there a better way to achieve the above instead of hard coding values for
>> each and every partition?
>> 
>> 
>> 
>> =======================================================
>> 
>> Addendum: If the above is possible, how can I define some conditions where I
>> need to say, If region is not "us" or "asia", put it in another partition
>> known as misc?
>> 
>> =======================================================
>> 
>> 
>> 
>> 
>> 
>> Thanks Viraj


RE: Creating partitions causes Error in semantic analysis

Posted by Viraj Bhat <vi...@yahoo-inc.com>.
Hi Yongqiang,
 I am using the trunk code. I figured out what the problem was 
INSERT OVERWRITE TABLE newtable
PARTITION (newdatestamp, myregion, myproperty)
SELECT
name,
age,
datestamp as newdatestamp,
region as myregion,
property as myproperty,
from oldtable where datestamp='20100525';

I need to specify the last 3 columns in the order of partitions, which I did not. 

Meanwhile the dynamic partitioning produced a partition which was named "__HIVE_DEFAULT_PARTITION__". Is this created by default? 

Thanks again for your help.
Viraj

-----Original Message-----
From: yongqiang he [mailto:heyongqiangict@gmail.com] 
Sent: Wednesday, June 16, 2010 5:46 PM
To: hive-user@hadoop.apache.org
Subject: Re: Creating partitions causes Error in semantic analysis

Hive supports dynamic partition ( i think you need to use trunk code
for this feature.?).

here is an example:

set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;

create table if not exists nzhang_part1 like srcpart;
create table if not exists nzhang_part2 like srcpart;
describe extended nzhang_part1;

from srcpart
insert overwrite table nzhang_part1 partition (ds, hr) select key,
value, ds, hr where ds <= '2008-04-08'
insert overwrite table nzhang_part2 partition(ds='2008-12-31', hr)
select key, value, hr where ds > '2008-04-08';

On Wed, Jun 16, 2010 at 4:07 PM, Viraj Bhat <vi...@yahoo-inc.com> wrote:
> Hi all,
>
>   I have a table known as "oldtable" which is partitioned by datestamp.
>
>
>
> The schema of the "oldtable" is:
>
>
>
> name string
>
> age bigint
>
> property string
>
> region string
>
> datestamp string
>
>
>
>
>
> I now need to create a new table which is based of this old table and
> partitioned by (datestamp, region, property)
>
>
>
> The DDL for the new table looks like:
>
>
>
> CREATE EXTERNAL TABLE newtable
>
> (
>
> newname string,
>
> newage bigint,
>
> )
>
>
>
> PARTITIONED BY (newdatestamp STRING, myregion STRING, myproperty STRING)
>
>
>
> STORED AS RCFILE
>
> LOCATION '/user/viraj/rcfile;
>
>
>
>
>
> When I try to populate this new table from my old table, I try to use
> partitioning which uses values of old columns.
>
>
>
> INSERT OVERWRITE TABLE newtable
>
> PARTITION (newdatestamp='20100525', region, property)
>
> SELECT
>
>   name,
>
>   age
>
> from oldtable where datestamp='20100525';
>
>
>
> The above statement causes an error and expects hardcoded values for region
> and property.
>
>
>
> FAILED: Error in semantic analysis: Partition column in the partition
> specification does not exist.
>
>
>
> How do I specify the partition information such that the new tables, takes
> values from "property" and "region" from the old table and uses it as
> partitions.
>
>
>
> Is there a better way to achieve the above instead of hard coding values for
> each and every partition?
>
>
>
> =======================================================
>
> Addendum: If the above is possible, how can I define some conditions where I
> need to say, If region is not "us" or "asia", put it in another partition
> known as misc?
>
> =======================================================
>
>
>
>
>
> Thanks Viraj

Re: Creating partitions causes Error in semantic analysis

Posted by yongqiang he <he...@gmail.com>.
Hive supports dynamic partition ( i think you need to use trunk code
for this feature.?).

here is an example:

set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;

create table if not exists nzhang_part1 like srcpart;
create table if not exists nzhang_part2 like srcpart;
describe extended nzhang_part1;

from srcpart
insert overwrite table nzhang_part1 partition (ds, hr) select key,
value, ds, hr where ds <= '2008-04-08'
insert overwrite table nzhang_part2 partition(ds='2008-12-31', hr)
select key, value, hr where ds > '2008-04-08';

On Wed, Jun 16, 2010 at 4:07 PM, Viraj Bhat <vi...@yahoo-inc.com> wrote:
> Hi all,
>
>   I have a table known as “oldtable” which is partitioned by datestamp.
>
>
>
> The schema of the “oldtable” is:
>
>
>
> name string
>
> age bigint
>
> property string
>
> region string
>
> datestamp string
>
>
>
>
>
> I now need to create a new table which is based of this old table and
> partitioned by (datestamp, region, property)
>
>
>
> The DDL for the new table looks like:
>
>
>
> CREATE EXTERNAL TABLE newtable
>
> (
>
> newname string,
>
> newage bigint,
>
> )
>
>
>
> PARTITIONED BY (newdatestamp STRING, myregion STRING, myproperty STRING)
>
>
>
> STORED AS RCFILE
>
> LOCATION '/user/viraj/rcfile;
>
>
>
>
>
> When I try to populate this new table from my old table, I try to use
> partitioning which uses values of old columns.
>
>
>
> INSERT OVERWRITE TABLE newtable
>
> PARTITION (newdatestamp='20100525', region, property)
>
> SELECT
>
>   name,
>
>   age
>
> from oldtable where datestamp='20100525';
>
>
>
> The above statement causes an error and expects hardcoded values for region
> and property.
>
>
>
> FAILED: Error in semantic analysis: Partition column in the partition
> specification does not exist.
>
>
>
> How do I specify the partition information such that the new tables, takes
> values from “property” and “region” from the old table and uses it as
> partitions.
>
>
>
> Is there a better way to achieve the above instead of hard coding values for
> each and every partition?
>
>
>
> =======================================================
>
> Addendum: If the above is possible, how can I define some conditions where I
> need to say, If region is not “us” or “asia”, put it in another partition
> known as misc?
>
> =======================================================
>
>
>
>
>
> Thanks Viraj

RE: Creating partitions causes Error in semantic analysis

Posted by Viraj Bhat <vi...@yahoo-inc.com>.
Hi John,
 Yes the problem was that in the select clause I needed to define the
partition columns in exactly the similar order.
INSERT OVERWRITE TABLE newtable
PARTITION (newdatestamp, myregion, myproperty)
SELECT
  name,
  age,
  datestamp as newdatestamp,
  region as myregion,
  property as myproperty
from oldtable where datestamp='20100525';

I will put in the CASE statement to obtain the necessary partition
information.

Is __HIVE_DEFAULT_PARTITION__ created by default?

Thanks again for your help.
Viraj

-----Original Message-----
From: John Sichi [mailto:jsichi@facebook.com] 
Sent: Wednesday, June 16, 2010 8:03 PM
To: hive-user@hadoop.apache.org
Subject: RE: Creating partitions causes Error in semantic analysis

I think you should be using myregion and myproperty in the PARTITION
clause (not region and property).

Also, regarding your question in the addendum, you could use a CASE
expression in the source SELECT statement to transform the region.

JVS

________________________________________
From: Viraj Bhat [viraj@yahoo-inc.com]
Sent: Wednesday, June 16, 2010 4:07 PM
To: hive-user@hadoop.apache.org
Subject: Creating partitions causes Error in semantic analysis

Hi all,
  I have a table known as "oldtable" which is partitioned by datestamp.

The schema of the "oldtable" is:

name string
age bigint
property string
region string
datestamp string


I now need to create a new table which is based of this old table and
partitioned by (datestamp, region, property)

The DDL for the new table looks like:

CREATE EXTERNAL TABLE newtable
(
newname string,
newage bigint,
)

PARTITIONED BY (newdatestamp STRING, myregion STRING, myproperty STRING)

STORED AS RCFILE
LOCATION '/user/viraj/rcfile;


When I try to populate this new table from my old table, I try to use
partitioning which uses values of old columns.

INSERT OVERWRITE TABLE newtable
PARTITION (newdatestamp='20100525', region, property)
SELECT
  name,
  age
from oldtable where datestamp='20100525';

The above statement causes an error and expects hardcoded values for
region and property.

FAILED: Error in semantic analysis: Partition column in the partition
specification does not exist.

How do I specify the partition information such that the new tables,
takes values from "property" and "region" from the old table and uses it
as partitions.

Is there a better way to achieve the above instead of hard coding values
for each and every partition?

=======================================================
Addendum: If the above is possible, how can I define some conditions
where I need to say, If region is not "us" or "asia", put it in another
partition known as misc?
=======================================================


Thanks Viraj

RE: Creating partitions causes Error in semantic analysis

Posted by John Sichi <js...@facebook.com>.
I think you should be using myregion and myproperty in the PARTITION clause (not region and property).

Also, regarding your question in the addendum, you could use a CASE expression in the source SELECT statement to transform the region.

JVS

________________________________________
From: Viraj Bhat [viraj@yahoo-inc.com]
Sent: Wednesday, June 16, 2010 4:07 PM
To: hive-user@hadoop.apache.org
Subject: Creating partitions causes Error in semantic analysis

Hi all,
  I have a table known as “oldtable” which is partitioned by datestamp.

The schema of the “oldtable” is:

name string
age bigint
property string
region string
datestamp string


I now need to create a new table which is based of this old table and partitioned by (datestamp, region, property)

The DDL for the new table looks like:

CREATE EXTERNAL TABLE newtable
(
newname string,
newage bigint,
)

PARTITIONED BY (newdatestamp STRING, myregion STRING, myproperty STRING)

STORED AS RCFILE
LOCATION '/user/viraj/rcfile;


When I try to populate this new table from my old table, I try to use partitioning which uses values of old columns.

INSERT OVERWRITE TABLE newtable
PARTITION (newdatestamp='20100525', region, property)
SELECT
  name,
  age
from oldtable where datestamp='20100525';

The above statement causes an error and expects hardcoded values for region and property.

FAILED: Error in semantic analysis: Partition column in the partition specification does not exist.

How do I specify the partition information such that the new tables, takes values from “property” and “region” from the old table and uses it as partitions.

Is there a better way to achieve the above instead of hard coding values for each and every partition?

=======================================================
Addendum: If the above is possible, how can I define some conditions where I need to say, If region is not “us” or “asia”, put it in another partition known as misc?
=======================================================


Thanks Viraj