You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Ronak Bhatt <ro...@gmail.com> on 2010/08/21 03:41:31 UTC

question..partitioned table...

for the following create table, I'm getting the error : FAILED: Error in
semantic analysis: Column repeated in partitioning columns...

any suggestion?

thanks, ronak


create table callbacks_part
(
ID INT,
BOOKMARK_STEP_ID INT,
CLIENT_ID INT,
TXN_ID STRING,
SESSION_ID STRING,
IP_ADDRESS STRING,
METHOD STRING,
PAGE_ID INT,
RULE_NAME_ID INT,
CALLBACK_TYPE_ID INT,
PAGE_ELEMENT_ID INT,
CALLBACK_SENSE_ID INT,
CALLBACK_INFO STRING,
NUM_OCCURANCES INT,
UNIQUE_ID STRING,
EXEC_DATE STRING,
EXEC_DURATION FLOAT
)
PARTITIONED BY(EXEC_DATE STRING)
CLUSTERED BY(PAGE_ID) SORTED BY(PAGE_ID) INTO 32 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;





*thanks, ronak*
*
*
*408 504 4847*
*My Blog : http://ronakbaps.posterous.com*
*
*
*
*

Re: question..partitioned table...

Posted by Ronak Bhatt <ro...@gmail.com>.
Thanks Dilip...this is useful...it is something different for a normal RDBMS
experience...

Is there any way to update the documentation with such explanation and may
be an example?


*thanks, ronak*
*
*



On Sat, Aug 21, 2010 at 12:45 AM, Dilip Joseph <
dilip.antony.joseph@gmail.com> wrote:

> The partition column is automatically included as part of the table
> definition, and hence can be used like any other column.
>
> So you can issue queries like SELECT c1, c2, c3 from callback_parts WHERE
> c2='foo'; for table defined as follows:
>
> create table callbacks_part
> (
>  c1 INT,
>  c3 STRING
> )
> PARTITIONED BY(c2 STRING)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> STORED AS TEXTFILE;
>
> Dilip
>
>
> On Fri, Aug 20, 2010 at 7:58 PM, Ronak Bhatt <ro...@gmail.com> wrote:
>
>> I'm from oracle/sql background...the thinking there (in oracle) is -
>>
>> 1. one has a table with columns (c1, c2, c3)
>> 2. define the partition on one of the columns (or combination of more than
>> one) say c2
>> 3. When data is inserted, it will be stored in right partition based on
>> the value of c2.
>> 4. when table is queries, if the WHERE predicate has partition column (c2
>> in this case), optimizer will go to that partition without user taking any
>> specific action (e.g. without user specifying the partition)
>>
>> whereas,
>>
>> In the scenario below that I mentioned, I've a text file that has one of
>> the fields as exec_date ....after some more research, I also learned that in
>> HIVE partition cannot be on existing column of defined table e.g. I've to
>> have a column name that is not part of table..
>>
>> What I don't understand is that how do I write the queries so that I can
>> utilize the partition if the partition column is not part of the table
>> definition..
>>
>> Any inputs would be greatly appreciated
>>
>>
>>
>> *thanks, ronak*
>> *
>> *
>> *408 504 4847*
>> *My Blog : http://ronakbaps.posterous.com*
>> *
>> *
>> *
>> *
>>
>>
>>
>> On Fri, Aug 20, 2010 at 7:34 PM, Dilip Joseph <
>> dilip.antony.joseph@gmail.com> wrote:
>>
>>> You don't need to specify "EXEC_DATE STRING," in your column definition
>>> as it is already specified by "PARTITIONED BY(EXEC_DATE STRING)
>>>
>>> Dilip
>>>
>>>
>>> On Fri, Aug 20, 2010 at 6:41 PM, Ronak Bhatt <ro...@gmail.com>wrote:
>>>
>>>> for the following create table, I'm getting the error : FAILED: Error in
>>>> semantic analysis: Column repeated in partitioning columns...
>>>>
>>>> any suggestion?
>>>>
>>>> thanks, ronak
>>>>
>>>>
>>>> create table callbacks_part
>>>> (
>>>> ID INT,
>>>> BOOKMARK_STEP_ID INT,
>>>> CLIENT_ID INT,
>>>> TXN_ID STRING,
>>>> SESSION_ID STRING,
>>>> IP_ADDRESS STRING,
>>>> METHOD STRING,
>>>> PAGE_ID INT,
>>>> RULE_NAME_ID INT,
>>>> CALLBACK_TYPE_ID INT,
>>>> PAGE_ELEMENT_ID INT,
>>>> CALLBACK_SENSE_ID INT,
>>>> CALLBACK_INFO STRING,
>>>> NUM_OCCURANCES INT,
>>>> UNIQUE_ID STRING,
>>>> EXEC_DATE STRING,
>>>> EXEC_DURATION FLOAT
>>>> )
>>>> PARTITIONED BY(EXEC_DATE STRING)
>>>> CLUSTERED BY(PAGE_ID) SORTED BY(PAGE_ID) INTO 32 BUCKETS
>>>> ROW FORMAT DELIMITED
>>>> FIELDS TERMINATED BY ','
>>>> STORED AS TEXTFILE;
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *thanks, ronak*
>>>> *
>>>> *
>>>> *408 504 4847*
>>>> *My Blog : http://ronakbaps.posterous.com*
>>>> *
>>>> *
>>>> *
>>>> *
>>>>
>>>>
>>>
>>>
>>> --
>>> _________________________________________
>>> Dilip Antony Joseph
>>> http://csgrad.blogspot.com
>>> http://www.marydilip.info
>>>
>>
>>
>
>
> --
> _________________________________________
> Dilip Antony Joseph
> http://csgrad.blogspot.com
> http://www.marydilip.info
>

Re: question..partitioned table...

Posted by Dilip Joseph <di...@gmail.com>.
The partition column is automatically included as part of the table
definition, and hence can be used like any other column.

So you can issue queries like SELECT c1, c2, c3 from callback_parts WHERE
c2='foo'; for table defined as follows:

create table callbacks_part
(
 c1 INT,
 c3 STRING
)
PARTITIONED BY(c2 STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

Dilip

On Fri, Aug 20, 2010 at 7:58 PM, Ronak Bhatt <ro...@gmail.com> wrote:

> I'm from oracle/sql background...the thinking there (in oracle) is -
>
> 1. one has a table with columns (c1, c2, c3)
> 2. define the partition on one of the columns (or combination of more than
> one) say c2
> 3. When data is inserted, it will be stored in right partition based on the
> value of c2.
> 4. when table is queries, if the WHERE predicate has partition column (c2
> in this case), optimizer will go to that partition without user taking any
> specific action (e.g. without user specifying the partition)
>
> whereas,
>
> In the scenario below that I mentioned, I've a text file that has one of
> the fields as exec_date ....after some more research, I also learned that in
> HIVE partition cannot be on existing column of defined table e.g. I've to
> have a column name that is not part of table..
>
> What I don't understand is that how do I write the queries so that I can
> utilize the partition if the partition column is not part of the table
> definition..
>
> Any inputs would be greatly appreciated
>
>
>
> *thanks, ronak*
> *
> *
> *408 504 4847*
> *My Blog : http://ronakbaps.posterous.com*
> *
> *
> *
> *
>
>
>
> On Fri, Aug 20, 2010 at 7:34 PM, Dilip Joseph <
> dilip.antony.joseph@gmail.com> wrote:
>
>> You don't need to specify "EXEC_DATE STRING," in your column definition as
>> it is already specified by "PARTITIONED BY(EXEC_DATE STRING)
>>
>> Dilip
>>
>>
>> On Fri, Aug 20, 2010 at 6:41 PM, Ronak Bhatt <ro...@gmail.com> wrote:
>>
>>> for the following create table, I'm getting the error : FAILED: Error in
>>> semantic analysis: Column repeated in partitioning columns...
>>>
>>> any suggestion?
>>>
>>> thanks, ronak
>>>
>>>
>>> create table callbacks_part
>>> (
>>> ID INT,
>>> BOOKMARK_STEP_ID INT,
>>> CLIENT_ID INT,
>>> TXN_ID STRING,
>>> SESSION_ID STRING,
>>> IP_ADDRESS STRING,
>>> METHOD STRING,
>>> PAGE_ID INT,
>>> RULE_NAME_ID INT,
>>> CALLBACK_TYPE_ID INT,
>>> PAGE_ELEMENT_ID INT,
>>> CALLBACK_SENSE_ID INT,
>>> CALLBACK_INFO STRING,
>>> NUM_OCCURANCES INT,
>>> UNIQUE_ID STRING,
>>> EXEC_DATE STRING,
>>> EXEC_DURATION FLOAT
>>> )
>>> PARTITIONED BY(EXEC_DATE STRING)
>>> CLUSTERED BY(PAGE_ID) SORTED BY(PAGE_ID) INTO 32 BUCKETS
>>> ROW FORMAT DELIMITED
>>> FIELDS TERMINATED BY ','
>>> STORED AS TEXTFILE;
>>>
>>>
>>>
>>>
>>>
>>> *thanks, ronak*
>>> *
>>> *
>>> *408 504 4847*
>>> *My Blog : http://ronakbaps.posterous.com*
>>> *
>>> *
>>> *
>>> *
>>>
>>>
>>
>>
>> --
>> _________________________________________
>> Dilip Antony Joseph
>> http://csgrad.blogspot.com
>> http://www.marydilip.info
>>
>
>


-- 
_________________________________________
Dilip Antony Joseph
http://csgrad.blogspot.com
http://www.marydilip.info

Re: question..partitioned table...

Posted by Ronak Bhatt <ro...@gmail.com>.
I'm from oracle/sql background...the thinking there (in oracle) is -

1. one has a table with columns (c1, c2, c3)
2. define the partition on one of the columns (or combination of more than
one) say c2
3. When data is inserted, it will be stored in right partition based on the
value of c2.
4. when table is queries, if the WHERE predicate has partition column (c2 in
this case), optimizer will go to that partition without user taking any
specific action (e.g. without user specifying the partition)

whereas,

In the scenario below that I mentioned, I've a text file that has one of the
fields as exec_date ....after some more research, I also learned that in
HIVE partition cannot be on existing column of defined table e.g. I've to
have a column name that is not part of table..

What I don't understand is that how do I write the queries so that I can
utilize the partition if the partition column is not part of the table
definition..

Any inputs would be greatly appreciated



*thanks, ronak*
*
*
*408 504 4847*
*My Blog : http://ronakbaps.posterous.com*
*
*
*
*



On Fri, Aug 20, 2010 at 7:34 PM, Dilip Joseph <dilip.antony.joseph@gmail.com
> wrote:

> You don't need to specify "EXEC_DATE STRING," in your column definition as
> it is already specified by "PARTITIONED BY(EXEC_DATE STRING)
>
> Dilip
>
>
> On Fri, Aug 20, 2010 at 6:41 PM, Ronak Bhatt <ro...@gmail.com> wrote:
>
>> for the following create table, I'm getting the error : FAILED: Error in
>> semantic analysis: Column repeated in partitioning columns...
>>
>> any suggestion?
>>
>> thanks, ronak
>>
>>
>> create table callbacks_part
>> (
>> ID INT,
>> BOOKMARK_STEP_ID INT,
>> CLIENT_ID INT,
>> TXN_ID STRING,
>> SESSION_ID STRING,
>> IP_ADDRESS STRING,
>> METHOD STRING,
>> PAGE_ID INT,
>> RULE_NAME_ID INT,
>> CALLBACK_TYPE_ID INT,
>> PAGE_ELEMENT_ID INT,
>> CALLBACK_SENSE_ID INT,
>> CALLBACK_INFO STRING,
>> NUM_OCCURANCES INT,
>> UNIQUE_ID STRING,
>> EXEC_DATE STRING,
>> EXEC_DURATION FLOAT
>> )
>> PARTITIONED BY(EXEC_DATE STRING)
>> CLUSTERED BY(PAGE_ID) SORTED BY(PAGE_ID) INTO 32 BUCKETS
>> ROW FORMAT DELIMITED
>> FIELDS TERMINATED BY ','
>> STORED AS TEXTFILE;
>>
>>
>>
>>
>>
>> *thanks, ronak*
>> *
>> *
>> *408 504 4847*
>> *My Blog : http://ronakbaps.posterous.com*
>> *
>> *
>> *
>> *
>>
>>
>
>
> --
> _________________________________________
> Dilip Antony Joseph
> http://csgrad.blogspot.com
> http://www.marydilip.info
>

Re: question..partitioned table...

Posted by Dilip Joseph <di...@gmail.com>.
You don't need to specify "EXEC_DATE STRING," in your column definition as
it is already specified by "PARTITIONED BY(EXEC_DATE STRING)

Dilip

On Fri, Aug 20, 2010 at 6:41 PM, Ronak Bhatt <ro...@gmail.com> wrote:

> for the following create table, I'm getting the error : FAILED: Error in
> semantic analysis: Column repeated in partitioning columns...
>
> any suggestion?
>
> thanks, ronak
>
>
> create table callbacks_part
> (
> ID INT,
> BOOKMARK_STEP_ID INT,
> CLIENT_ID INT,
> TXN_ID STRING,
> SESSION_ID STRING,
> IP_ADDRESS STRING,
> METHOD STRING,
> PAGE_ID INT,
> RULE_NAME_ID INT,
> CALLBACK_TYPE_ID INT,
> PAGE_ELEMENT_ID INT,
> CALLBACK_SENSE_ID INT,
> CALLBACK_INFO STRING,
> NUM_OCCURANCES INT,
> UNIQUE_ID STRING,
> EXEC_DATE STRING,
> EXEC_DURATION FLOAT
> )
> PARTITIONED BY(EXEC_DATE STRING)
> CLUSTERED BY(PAGE_ID) SORTED BY(PAGE_ID) INTO 32 BUCKETS
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> STORED AS TEXTFILE;
>
>
>
>
>
> *thanks, ronak*
> *
> *
> *408 504 4847*
> *My Blog : http://ronakbaps.posterous.com*
> *
> *
> *
> *
>
>


-- 
_________________________________________
Dilip Antony Joseph
http://csgrad.blogspot.com
http://www.marydilip.info