You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@hive.apache.org by "Daniel,Wu" <ha...@163.com> on 2011/08/12 08:30:04 UTC

how to load data to partitioned table

  suppose the table is partitioned by period_key, and the csv file also has a column named as period_key. The csv file contains multiple days of data, how can we load it in the the table?

I think of an workaround by first load the data into a non-partition table, and then insert the data from non-partition table to the partition table.

hive> INSERT OVERWRITE TABLE sales SELECT * FROM sales_temp;
FAILED: Error in semantic analysis: need to specify partition columns because the destination table is partitioned.


However it doesn't work also. please help.

RE: how to load data to partitioned table

Posted by "Aggarwal, Vaibhav" <va...@amazon.com>.

If you want to insert data into a partitioned table without specifying the partition value, you need to enable dynamic partitioning.
You can use the following switches:

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

Thanks
Vaibhav

From: Daniel,Wu [mailto:hadoop_wu@163.com]
Sent: Thursday, August 11, 2011 11:30 PM
To: hive
Subject: how to load data to partitioned table

  suppose the table is partitioned by period_key, and the csv file also has a column named as period_key. The csv file contains multiple days of data, how can we load it in the the table?

I think of an workaround by first load the data into a non-partition table, and then insert the data from non-partition table to the partition table.

hive> INSERT OVERWRITE TABLE sales SELECT * FROM sales_temp;
FAILED: Error in semantic analysis: need to specify partition columns because the destination table is partitioned.


However it doesn't work also. please help.

Re: how to load data to partitioned table

Posted by wd <wd...@wdicc.com>.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML

2011/8/12 Daniel,Wu <ha...@163.com>

>   suppose the table is partitioned by period_key, and the csv file also has
> a column named as period_key. The csv file contains multiple days of data,
> how can we load it in the the table?
>
> I think of an workaround by first load the data into a non-partition table,
> and then insert the data from non-partition table to the partition table.
>
> hive> INSERT OVERWRITE TABLE sales SELECT * FROM sales_temp;
> FAILED: Error in semantic analysis: need to specify partition columns
> because the destination table is partitioned.
>
>
> However it doesn't work also. please help.
>
>
>

Re: how to load data to partitioned table

Posted by hadoopman <ha...@gmail.com>.

DISTRIBUTE BY and CLUSTER BY didn't resolve all the issues I've seen 
with very large data sets.  I mean I'm loading a couple terabytes in a 
dataset and running into some rather interesting problems.  I noticed 
however loading a couple months or two at a time (and making sure they 
were from the same time period) seem to resolve the problems I kept 
hitting over and over again.

I have to keep reminding myself that hive / hadoop isn't a database and 
not to treat it as such. :-)


On 08/14/2011 10:15 AM, bejoy_ks@yahoo.com wrote:
> Ya I very much agree with you on those lines. Using the basic stuff 
> would literally run into memory issues with large datasets. I had some 
> of those resolved by using the DISTRIBUTE BY clause and so. In short a 
> little work around over your hive queries could help you out in some 
> cases.
>
> Regards
> Bejoy K S
>
> ------------------------------------------------------------------------
> *From: * hadoopman <ha...@gmail.com>
> *Date: *Sun, 14 Aug 2011 08:57:12 -0600
> *To: *<us...@hive.apache.org>
> *ReplyTo: * user@hive.apache.org
> *Subject: *Re: how to load data to partitioned table
>
> Something else I've noticed is when loading LOTS of historical data, 
> if you can try to say load a month of data at a time, try to just load 
> THAT month of data and only that month.  I've been able to load 
> several years of data (depending on the data) at a single load however 
> there have been times when loading a large dataset that I would run 
> into memory issues during the reduce phase (usually during 
> shuffle/sort).  Things from out of memory to stack overflow messages 
> (I've compiled a list of the more fun ones).
>
> Then I noticed that only loading data from say a single month loaded 
> quickly and without the memory headaches during the reduce.
>
> Something to keep in mind and it works great!
>
>
>
> On 08/12/2011 07:58 AM, bejoy_ks@yahoo.com wrote:
>> Hi Daniel
>> Just having a look at your requirement , to load data into a 
>> partition based hive table from any input file the most hassle free 
>> approach would be.
>> 1. Load the data into a non partitioned table that shares similar 
>> structure as the target table.
>> 2. Populate the target table with the data from non partitioned one 
>> using hive dynamic partition
>> approach.
>> With Dynamic partitions you don't need to manually identify the data 
>> partitions and distribute data accordingly.
>>
>> A similar implementation is described in the blog post
>> www.kickstarthadoop.blogspot.com/2011/06/how-to-speed-up-your-hive-queries-in.html
>>
>> Hope it helps
>>
>> Regards
>> Bejoy K S
>>
>> ------------------------------------------------------------------------
>> *From: * Vikas Srivastava <vi...@one97.net>
>> *Date: *Fri, 12 Aug 2011 17:31:28 +0530
>> *To: *<us...@hive.apache.org>
>> *ReplyTo: * user@hive.apache.org
>> *Subject: *Re: how to load data to partitioned table
>>
>> Hey ,
>>
>> Simpley you have run query like this
>>
>> FROM sales_temp INSERT OVERWRITE TABLE sales partition(period_key) 
>> SELECT *
>>
>>
>> Regards
>> Vikas Srivastava
>>
>>
>> 2011/8/12 Daniel,Wu <hadoop_wu@163.com <ma...@163.com>>
>>
>>       suppose the table is partitioned by period_key, and the csv
>>     file also has a column named as period_key. The csv file contains
>>     multiple days of data, how can we load it in the the table?
>>
>>     I think of an workaround by first load the data into a
>>     non-partition table, and then insert the data from non-partition
>>     table to the partition table.
>>
>>     hive> INSERT OVERWRITE TABLE sales SELECT * FROM sales_temp;
>>     FAILED: Error in semantic analysis: need to specify partition
>>     columns because the destination table is partitioned.
>>
>>
>>     However it doesn't work also. please help.
>>
>>
>>
>>
>>
>> -- 
>> With Regards
>> Vikas Srivastava
>>
>> DWH & Analytics Team
>> Mob:+91 9560885900
>> One97 | Let's get talking !
>>
>

Re: how to load data to partitioned table

Posted by be...@yahoo.com.

Ya I very much agree with you on those lines. Using the basic stuff would literally run into memory issues  with large datasets. I had some of those resolved by using the DISTRIBUTE BY clause and so. In short a little work around over your hive queries could help you out in some cases.
Regards
Bejoy K S

-----Original Message-----
From: hadoopman <ha...@gmail.com>
Date: Sun, 14 Aug 2011 08:57:12 
To: <us...@hive.apache.org>
Reply-To: user@hive.apache.org
Subject: Re: how to load data to partitioned table

Something else I've noticed is when loading LOTS of historical data, if 
you can try to say load a month of data at a time, try to just load THAT 
month of data and only that month.  I've been able to load several years 
of data (depending on the data) at a single load however there have been 
times when loading a large dataset that I would run into memory issues 
during the reduce phase (usually during shuffle/sort).  Things from out 
of memory to stack overflow messages (I've compiled a list of the more 
fun ones).

Then I noticed that only loading data from say a single month loaded 
quickly and without the memory headaches during the reduce.

Something to keep in mind and it works great!

On 08/12/2011 07:58 AM, bejoy_ks@yahoo.com wrote:
> Hi Daniel
> Just having a look at your requirement , to load data into a partition 
> based hive table from any input file the most hassle free approach 
> would be.
> 1. Load the data into a non partitioned table that shares similar 
> structure as the target table.
> 2. Populate the target table with the data from non partitioned one 
> using hive dynamic partition
> approach.
> With Dynamic partitions you don't need to manually identify the data 
> partitions and distribute data accordingly.
>
> A similar implementation is described in the blog post
> www.kickstarthadoop.blogspot.com/2011/06/how-to-speed-up-your-hive-queries-in.html
>
> Hope it helps
>
> Regards
> Bejoy K S
>
> ------------------------------------------------------------------------
> *From: * Vikas Srivastava <vi...@one97.net>
> *Date: *Fri, 12 Aug 2011 17:31:28 +0530
> *To: *<us...@hive.apache.org>
> *ReplyTo: * user@hive.apache.org
> *Subject: *Re: how to load data to partitioned table
>
> Hey ,
>
> Simpley you have run query like this
>
> FROM sales_temp INSERT OVERWRITE TABLE sales partition(period_key) 
> SELECT *
>
>
> Regards
> Vikas Srivastava
>
>
> 2011/8/12 Daniel,Wu <hadoop_wu@163.com <ma...@163.com>>
>
>       suppose the table is partitioned by period_key, and the csv file
>     also has a column named as period_key. The csv file contains
>     multiple days of data, how can we load it in the the table?
>
>     I think of an workaround by first load the data into a
>     non-partition table, and then insert the data from non-partition
>     table to the partition table.
>
>     hive> INSERT OVERWRITE TABLE sales SELECT * FROM sales_temp;
>     FAILED: Error in semantic analysis: need to specify partition
>     columns because the destination table is partitioned.
>
>
>     However it doesn't work also. please help.
>
>
>
>
>
> -- 
> With Regards
> Vikas Srivastava
>
> DWH & Analytics Team
> Mob:+91 9560885900
> One97 | Let's get talking !
>

Re: how to load data to partitioned table

Posted by hadoopman <ha...@gmail.com>.

Something else I've noticed is when loading LOTS of historical data, if 
you can try to say load a month of data at a time, try to just load THAT 
month of data and only that month.  I've been able to load several years 
of data (depending on the data) at a single load however there have been 
times when loading a large dataset that I would run into memory issues 
during the reduce phase (usually during shuffle/sort).  Things from out 
of memory to stack overflow messages (I've compiled a list of the more 
fun ones).

Then I noticed that only loading data from say a single month loaded 
quickly and without the memory headaches during the reduce.

Something to keep in mind and it works great!



On 08/12/2011 07:58 AM, bejoy_ks@yahoo.com wrote:
> Hi Daniel
> Just having a look at your requirement , to load data into a partition 
> based hive table from any input file the most hassle free approach 
> would be.
> 1. Load the data into a non partitioned table that shares similar 
> structure as the target table.
> 2. Populate the target table with the data from non partitioned one 
> using hive dynamic partition
> approach.
> With Dynamic partitions you don't need to manually identify the data 
> partitions and distribute data accordingly.
>
> A similar implementation is described in the blog post
> www.kickstarthadoop.blogspot.com/2011/06/how-to-speed-up-your-hive-queries-in.html
>
> Hope it helps
>
> Regards
> Bejoy K S
>
> ------------------------------------------------------------------------
> *From: * Vikas Srivastava <vi...@one97.net>
> *Date: *Fri, 12 Aug 2011 17:31:28 +0530
> *To: *<us...@hive.apache.org>
> *ReplyTo: * user@hive.apache.org
> *Subject: *Re: how to load data to partitioned table
>
> Hey ,
>
> Simpley you have run query like this
>
> FROM sales_temp INSERT OVERWRITE TABLE sales partition(period_key) 
> SELECT *
>
>
> Regards
> Vikas Srivastava
>
>
> 2011/8/12 Daniel,Wu <hadoop_wu@163.com <ma...@163.com>>
>
>       suppose the table is partitioned by period_key, and the csv file
>     also has a column named as period_key. The csv file contains
>     multiple days of data, how can we load it in the the table?
>
>     I think of an workaround by first load the data into a
>     non-partition table, and then insert the data from non-partition
>     table to the partition table.
>
>     hive> INSERT OVERWRITE TABLE sales SELECT * FROM sales_temp;
>     FAILED: Error in semantic analysis: need to specify partition
>     columns because the destination table is partitioned.
>
>
>     However it doesn't work also. please help.
>
>
>
>
>
> -- 
> With Regards
> Vikas Srivastava
>
> DWH & Analytics Team
> Mob:+91 9560885900
> One97 | Let's get talking !
>

Re: how to load data to partitioned table

Posted by be...@yahoo.com.

Hi Daniel
      Just having a look at your requirement , to load data into a partition based hive table from any input file the most hassle free approach would be.
1.  Load the data into a non partitioned table that shares similar structure as the target table.
2. Populate the target table with the data from non partitioned one using hive dynamic partition
approach.
With Dynamic partitions you don't need to manually identify the data partitions and distribute data accordingly. 

A similar implementation is described in the blog post
www.kickstarthadoop.blogspot.com/2011/06/how-to-speed-up-your-hive-queries-in.html

Hope it helps

Regards
Bejoy K S

-----Original Message-----
From: Vikas Srivastava <vi...@one97.net>
Date: Fri, 12 Aug 2011 17:31:28 
To: <us...@hive.apache.org>
Reply-To: user@hive.apache.org
Subject: Re: how to load data to partitioned table

Hey ,

Simpley you have run query like this

FROM sales_temp INSERT OVERWRITE TABLE sales partition(period_key) SELECT *


Regards
Vikas Srivastava


2011/8/12 Daniel,Wu <ha...@163.com>

>   suppose the table is partitioned by period_key, and the csv file also has
> a column named as period_key. The csv file contains multiple days of data,
> how can we load it in the the table?
>
> I think of an workaround by first load the data into a non-partition table,
> and then insert the data from non-partition table to the partition table.
>
> hive> INSERT OVERWRITE TABLE sales SELECT * FROM sales_temp;
> FAILED: Error in semantic analysis: need to specify partition columns
> because the destination table is partitioned.
>
>
> However it doesn't work also. please help.
>
>
>


-- 
With Regards
Vikas Srivastava

DWH & Analytics Team
Mob:+91 9560885900
One97 | Let's get talking !

Re: how to load data to partitioned table

Posted by Vikas Srivastava <vi...@one97.net>.

Hey ,

Simpley you have run query like this

FROM sales_temp INSERT OVERWRITE TABLE sales partition(period_key) SELECT *


Regards
Vikas Srivastava


2011/8/12 Daniel,Wu <ha...@163.com>

>   suppose the table is partitioned by period_key, and the csv file also has
> a column named as period_key. The csv file contains multiple days of data,
> how can we load it in the the table?
>
> I think of an workaround by first load the data into a non-partition table,
> and then insert the data from non-partition table to the partition table.
>
> hive> INSERT OVERWRITE TABLE sales SELECT * FROM sales_temp;
> FAILED: Error in semantic analysis: need to specify partition columns
> because the destination table is partitioned.
>
>
> However it doesn't work also. please help.
>
>
>


-- 
With Regards
Vikas Srivastava

DWH & Analytics Team
Mob:+91 9560885900
One97 | Let's get talking !