You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@hive.apache.org by ya...@uwaterloo.ca on 2012/12/10 01:35:33 UTC

FROM INSERT after ADD COLUMN

Hello,

I couldn't find any example of how to populate columns that were added  
to a table. How would Hive tell which row to append by each value of  
the newly added columns? Does it do a column name matching?

Sincerely,
Younos

Re: FROM INSERT after ADD COLUMN

Posted by ya...@uwaterloo.ca.

Is there an index in the RC File to avoid a complete pass on the  
record "keys" for matching old and new records. Also, wouldn't the  
RCFile need to be rebuilt anyway, since the file actually stores  
blocks of n rows by m column achieving a certain block size? I haven't  
carefully read the RCFile paper, but that's what I understood by  
skimming through it.

-- Younos

Quoting Shreepadma Venugopalan <sh...@cloudera.com>:

> Sorry hit the send too soon :)
>
> While storing data in a column major format such as RCFile would help with
> adding new column data after executing an alter table...add columns
> statement, Hive doesn't provide a way to do it today. It is possible to do
> so outside of Hive today, but we would need to enhance Hive to add new
> column data when the data is stored in a column major format.
>
> Thanks.
> Shreepadma
>
>
> On Mon, Dec 10, 2012 at 10:32 AM, Shreepadma Venugopalan <
> shreepadma@cloudera.com> wrote:
>
>>
>>
>>
>> On Sun, Dec 9, 2012 at 10:32 PM, Bertrand Dechoux <de...@gmail.com>wrote:
>>
>>> I will reopen the subject a bit.
>>>
>>> I don't know the details of the RCFile implementation in Hive but if the
>>> data were stored that way it is theoretically possible to add the column
>>> data even without append and without rewriting the whole file. Does someone
>>> has more information on that matter?
>>>
>>> Regards
>>>
>>> Bertrand
>>>
>>>
>>> On Mon, Dec 10, 2012 at 2:02 AM, <ya...@uwaterloo.ca> wrote:
>>>
>>>> Hello Shreepadma,
>>>>
>>>> That's definitely very helpful. I doubted that this would be the case,
>>>> but I was thinking that maybe there's a way to do it using a merge task. I
>>>> will change my data structure to make it a bit like HBase, and I hope Hive
>>>> would still be the right choice for me.. it can be backed by HBase anyway
>>>> :). Thank you very much, your quick reply saved me a lot of time!
>>>>
>>>> Sincerely,
>>>> Younos
>>>>
>>>>
>>>> Quoting Shreepadma Venugopalan <sh...@cloudera.com>:
>>>>
>>>>  Hi Younos,
>>>>>
>>>>> Since HiveQL doesn't support an insert..value statement, you can't
>>>>> insert
>>>>> values into a specific column. Let's assume your table had the following
>>>>> structure before the alter table..add columns statement was executed,
>>>>>
>>>>> tab (a string, b bigint, c double)
>>>>>
>>>>> Furthermore, let's assume that it had 100 rows. Now, let's assume you
>>>>> did
>>>>> an alter table tab add columns (d binary). The new table structure will
>>>>> look like below,
>>>>>
>>>>> tab (a string, b bigint, c double, d binary)
>>>>>
>>>>> You can't insert binary data into the 100 rows that were present prior
>>>>> to
>>>>> the alter table statement by executing a HiveQL statement. HiveQL
>>>>> doesn't
>>>>> support an insert..values statement like most RDBMSs. However, you can
>>>>> delete the existing files and add new files that contain records
>>>>> corresponding to the new table structure. Alternatively, you can skip
>>>>> the
>>>>> deletion step and just add new files that correspond to the new table
>>>>> structure. When you execute a HiveQL query, null will be returned for
>>>>> those
>>>>> columns for which the data doesn't exist.
>>>>>
>>>>> Hope this helps.
>>>>>
>>>>> Thanks.
>>>>> Shreepadma
>>>>>
>>>>>
>>>>> On Sun, Dec 9, 2012 at 4:35 PM, <ya...@uwaterloo.ca> wrote:
>>>>>
>>>>>  Hello,
>>>>>>
>>>>>> I couldn't find any example of how to populate columns that were added
>>>>>> to
>>>>>> a table. How would Hive tell which row to append by each value of the
>>>>>> newly
>>>>>> added columns? Does it do a column name matching?
>>>>>>
>>>>>> Sincerely,
>>>>>> Younos
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> Best regards,
>>>> Younos Aboulnaga
>>>>
>>>> Masters candidate
>>>> David Cheriton school of computer science
>>>> University of Waterloo
>>>> http://cs.uwaterloo.ca
>>>>
>>>> E-Mail: younos.aboulnaga@uwaterloo.ca
>>>> Mobile: +1 (519) 497-5669
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Bertrand Dechoux
>>>
>>
>>
>



Best regards,
Younos Aboulnaga

Masters candidate
David Cheriton school of computer science
University of Waterloo
http://cs.uwaterloo.ca

E-Mail: younos.aboulnaga@uwaterloo.ca
Mobile: +1 (519) 497-5669

Re: FROM INSERT after ADD COLUMN

Posted by Shreepadma Venugopalan <sh...@cloudera.com>.

Sorry hit the send too soon :)

While storing data in a column major format such as RCFile would help with
adding new column data after executing an alter table...add columns
statement, Hive doesn't provide a way to do it today. It is possible to do
so outside of Hive today, but we would need to enhance Hive to add new
column data when the data is stored in a column major format.

Thanks.
Shreepadma


On Mon, Dec 10, 2012 at 10:32 AM, Shreepadma Venugopalan <
shreepadma@cloudera.com> wrote:

>
>
>
> On Sun, Dec 9, 2012 at 10:32 PM, Bertrand Dechoux <de...@gmail.com>wrote:
>
>> I will reopen the subject a bit.
>>
>> I don't know the details of the RCFile implementation in Hive but if the
>> data were stored that way it is theoretically possible to add the column
>> data even without append and without rewriting the whole file. Does someone
>> has more information on that matter?
>>
>> Regards
>>
>> Bertrand
>>
>>
>> On Mon, Dec 10, 2012 at 2:02 AM, <ya...@uwaterloo.ca> wrote:
>>
>>> Hello Shreepadma,
>>>
>>> That's definitely very helpful. I doubted that this would be the case,
>>> but I was thinking that maybe there's a way to do it using a merge task. I
>>> will change my data structure to make it a bit like HBase, and I hope Hive
>>> would still be the right choice for me.. it can be backed by HBase anyway
>>> :). Thank you very much, your quick reply saved me a lot of time!
>>>
>>> Sincerely,
>>> Younos
>>>
>>>
>>> Quoting Shreepadma Venugopalan <sh...@cloudera.com>:
>>>
>>>  Hi Younos,
>>>>
>>>> Since HiveQL doesn't support an insert..value statement, you can't
>>>> insert
>>>> values into a specific column. Let's assume your table had the following
>>>> structure before the alter table..add columns statement was executed,
>>>>
>>>> tab (a string, b bigint, c double)
>>>>
>>>> Furthermore, let's assume that it had 100 rows. Now, let's assume you
>>>> did
>>>> an alter table tab add columns (d binary). The new table structure will
>>>> look like below,
>>>>
>>>> tab (a string, b bigint, c double, d binary)
>>>>
>>>> You can't insert binary data into the 100 rows that were present prior
>>>> to
>>>> the alter table statement by executing a HiveQL statement. HiveQL
>>>> doesn't
>>>> support an insert..values statement like most RDBMSs. However, you can
>>>> delete the existing files and add new files that contain records
>>>> corresponding to the new table structure. Alternatively, you can skip
>>>> the
>>>> deletion step and just add new files that correspond to the new table
>>>> structure. When you execute a HiveQL query, null will be returned for
>>>> those
>>>> columns for which the data doesn't exist.
>>>>
>>>> Hope this helps.
>>>>
>>>> Thanks.
>>>> Shreepadma
>>>>
>>>>
>>>> On Sun, Dec 9, 2012 at 4:35 PM, <ya...@uwaterloo.ca> wrote:
>>>>
>>>>  Hello,
>>>>>
>>>>> I couldn't find any example of how to populate columns that were added
>>>>> to
>>>>> a table. How would Hive tell which row to append by each value of the
>>>>> newly
>>>>> added columns? Does it do a column name matching?
>>>>>
>>>>> Sincerely,
>>>>> Younos
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> Best regards,
>>> Younos Aboulnaga
>>>
>>> Masters candidate
>>> David Cheriton school of computer science
>>> University of Waterloo
>>> http://cs.uwaterloo.ca
>>>
>>> E-Mail: younos.aboulnaga@uwaterloo.ca
>>> Mobile: +1 (519) 497-5669
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Bertrand Dechoux
>>
>
>

Re: FROM INSERT after ADD COLUMN

Posted by Shreepadma Venugopalan <sh...@cloudera.com>.

On Sun, Dec 9, 2012 at 10:32 PM, Bertrand Dechoux <de...@gmail.com>wrote:

> I will reopen the subject a bit.
>
> I don't know the details of the RCFile implementation in Hive but if the
> data were stored that way it is theoretically possible to add the column
> data even without append and without rewriting the whole file. Does someone
> has more information on that matter?
>
> Regards
>
> Bertrand
>
>
> On Mon, Dec 10, 2012 at 2:02 AM, <ya...@uwaterloo.ca> wrote:
>
>> Hello Shreepadma,
>>
>> That's definitely very helpful. I doubted that this would be the case,
>> but I was thinking that maybe there's a way to do it using a merge task. I
>> will change my data structure to make it a bit like HBase, and I hope Hive
>> would still be the right choice for me.. it can be backed by HBase anyway
>> :). Thank you very much, your quick reply saved me a lot of time!
>>
>> Sincerely,
>> Younos
>>
>>
>> Quoting Shreepadma Venugopalan <sh...@cloudera.com>:
>>
>>  Hi Younos,
>>>
>>> Since HiveQL doesn't support an insert..value statement, you can't insert
>>> values into a specific column. Let's assume your table had the following
>>> structure before the alter table..add columns statement was executed,
>>>
>>> tab (a string, b bigint, c double)
>>>
>>> Furthermore, let's assume that it had 100 rows. Now, let's assume you did
>>> an alter table tab add columns (d binary). The new table structure will
>>> look like below,
>>>
>>> tab (a string, b bigint, c double, d binary)
>>>
>>> You can't insert binary data into the 100 rows that were present prior to
>>> the alter table statement by executing a HiveQL statement. HiveQL doesn't
>>> support an insert..values statement like most RDBMSs. However, you can
>>> delete the existing files and add new files that contain records
>>> corresponding to the new table structure. Alternatively, you can skip the
>>> deletion step and just add new files that correspond to the new table
>>> structure. When you execute a HiveQL query, null will be returned for
>>> those
>>> columns for which the data doesn't exist.
>>>
>>> Hope this helps.
>>>
>>> Thanks.
>>> Shreepadma
>>>
>>>
>>> On Sun, Dec 9, 2012 at 4:35 PM, <ya...@uwaterloo.ca> wrote:
>>>
>>>  Hello,
>>>>
>>>> I couldn't find any example of how to populate columns that were added
>>>> to
>>>> a table. How would Hive tell which row to append by each value of the
>>>> newly
>>>> added columns? Does it do a column name matching?
>>>>
>>>> Sincerely,
>>>> Younos
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>> Best regards,
>> Younos Aboulnaga
>>
>> Masters candidate
>> David Cheriton school of computer science
>> University of Waterloo
>> http://cs.uwaterloo.ca
>>
>> E-Mail: younos.aboulnaga@uwaterloo.ca
>> Mobile: +1 (519) 497-5669
>>
>>
>>
>>
>
>
> --
> Bertrand Dechoux
>

Re: FROM INSERT after ADD COLUMN

Posted by Bertrand Dechoux <de...@gmail.com>.

I will reopen the subject a bit.

I don't know the details of the RCFile implementation in Hive but if the
data were stored that way it is theoretically possible to add the column
data even without append and without rewriting the whole file. Does someone
has more information on that matter?

Regards

Bertrand

On Mon, Dec 10, 2012 at 2:02 AM, <ya...@uwaterloo.ca> wrote:

> Hello Shreepadma,
>
> That's definitely very helpful. I doubted that this would be the case, but
> I was thinking that maybe there's a way to do it using a merge task. I will
> change my data structure to make it a bit like HBase, and I hope Hive would
> still be the right choice for me.. it can be backed by HBase anyway :).
> Thank you very much, your quick reply saved me a lot of time!
>
> Sincerely,
> Younos
>
>
> Quoting Shreepadma Venugopalan <sh...@cloudera.com>:
>
>  Hi Younos,
>>
>> Since HiveQL doesn't support an insert..value statement, you can't insert
>> values into a specific column. Let's assume your table had the following
>> structure before the alter table..add columns statement was executed,
>>
>> tab (a string, b bigint, c double)
>>
>> Furthermore, let's assume that it had 100 rows. Now, let's assume you did
>> an alter table tab add columns (d binary). The new table structure will
>> look like below,
>>
>> tab (a string, b bigint, c double, d binary)
>>
>> You can't insert binary data into the 100 rows that were present prior to
>> the alter table statement by executing a HiveQL statement. HiveQL doesn't
>> support an insert..values statement like most RDBMSs. However, you can
>> delete the existing files and add new files that contain records
>> corresponding to the new table structure. Alternatively, you can skip the
>> deletion step and just add new files that correspond to the new table
>> structure. When you execute a HiveQL query, null will be returned for
>> those
>> columns for which the data doesn't exist.
>>
>> Hope this helps.
>>
>> Thanks.
>> Shreepadma
>>
>>
>> On Sun, Dec 9, 2012 at 4:35 PM, <ya...@uwaterloo.ca> wrote:
>>
>>  Hello,
>>>
>>> I couldn't find any example of how to populate columns that were added to
>>> a table. How would Hive tell which row to append by each value of the
>>> newly
>>> added columns? Does it do a column name matching?
>>>
>>> Sincerely,
>>> Younos
>>>
>>>
>>>
>>>
>>>
>>
>
>
> Best regards,
> Younos Aboulnaga
>
> Masters candidate
> David Cheriton school of computer science
> University of Waterloo
> http://cs.uwaterloo.ca
>
> E-Mail: younos.aboulnaga@uwaterloo.ca
> Mobile: +1 (519) 497-5669
>
>
>
>


-- 
Bertrand Dechoux

Re: FROM INSERT after ADD COLUMN

Posted by ya...@uwaterloo.ca.

Hello Shreepadma,

That's definitely very helpful. I doubted that this would be the case,  
but I was thinking that maybe there's a way to do it using a merge  
task. I will change my data structure to make it a bit like HBase, and  
I hope Hive would still be the right choice for me.. it can be backed  
by HBase anyway :). Thank you very much, your quick reply saved me a  
lot of time!

Sincerely,
Younos

Quoting Shreepadma Venugopalan <sh...@cloudera.com>:

> Hi Younos,
>
> Since HiveQL doesn't support an insert..value statement, you can't insert
> values into a specific column. Let's assume your table had the following
> structure before the alter table..add columns statement was executed,
>
> tab (a string, b bigint, c double)
>
> Furthermore, let's assume that it had 100 rows. Now, let's assume you did
> an alter table tab add columns (d binary). The new table structure will
> look like below,
>
> tab (a string, b bigint, c double, d binary)
>
> You can't insert binary data into the 100 rows that were present prior to
> the alter table statement by executing a HiveQL statement. HiveQL doesn't
> support an insert..values statement like most RDBMSs. However, you can
> delete the existing files and add new files that contain records
> corresponding to the new table structure. Alternatively, you can skip the
> deletion step and just add new files that correspond to the new table
> structure. When you execute a HiveQL query, null will be returned for those
> columns for which the data doesn't exist.
>
> Hope this helps.
>
> Thanks.
> Shreepadma
>
>
> On Sun, Dec 9, 2012 at 4:35 PM, <ya...@uwaterloo.ca> wrote:
>
>> Hello,
>>
>> I couldn't find any example of how to populate columns that were added to
>> a table. How would Hive tell which row to append by each value of the newly
>> added columns? Does it do a column name matching?
>>
>> Sincerely,
>> Younos
>>
>>
>>
>>
>



Best regards,
Younos Aboulnaga

Masters candidate
David Cheriton school of computer science
University of Waterloo
http://cs.uwaterloo.ca

E-Mail: younos.aboulnaga@uwaterloo.ca
Mobile: +1 (519) 497-5669

Re: FROM INSERT after ADD COLUMN

Posted by Shreepadma Venugopalan <sh...@cloudera.com>.

Hi Younos,

Since HiveQL doesn't support an insert..value statement, you can't insert
values into a specific column. Let's assume your table had the following
structure before the alter table..add columns statement was executed,

tab (a string, b bigint, c double)

Furthermore, let's assume that it had 100 rows. Now, let's assume you did
an alter table tab add columns (d binary). The new table structure will
look like below,

tab (a string, b bigint, c double, d binary)

You can't insert binary data into the 100 rows that were present prior to
the alter table statement by executing a HiveQL statement. HiveQL doesn't
support an insert..values statement like most RDBMSs. However, you can
delete the existing files and add new files that contain records
corresponding to the new table structure. Alternatively, you can skip the
deletion step and just add new files that correspond to the new table
structure. When you execute a HiveQL query, null will be returned for those
columns for which the data doesn't exist.

Hope this helps.

Thanks.
Shreepadma

On Sun, Dec 9, 2012 at 4:35 PM, <ya...@uwaterloo.ca> wrote:

> Hello,
>
> I couldn't find any example of how to populate columns that were added to
> a table. How would Hive tell which row to append by each value of the newly
> added columns? Does it do a column name matching?
>
> Sincerely,
> Younos
>
>
>
>

RE: FROM INSERT after ADD COLUMN

Posted by "Connell, Chuck" <Ch...@nuance.com>.

I don't think you can do this. Populating new columns is the same as "row level updates" which Hive does not do. AFAIK, your only option is to write a new table, by reading the old table, selecting all of it, appending new values to each row, then writing the longer rows to a new table. 

Chuck

________________________________________
From: yaboulna@uwaterloo.ca [yaboulna@uwaterloo.ca]
Sent: Sunday, December 09, 2012 7:35 PM
To: user@hive.apache.org
Subject: FROM INSERT after ADD COLUMN

Hello,

I couldn't find any example of how to populate columns that were added
to a table. How would Hive tell which row to append by each value of
the newly added columns? Does it do a column name matching?

Sincerely,
Younos