Posted to user@hive.apache.org by Mahender Sarangam <ma...@outlook.com> on 2016/05/14 22:29:02 UTC

Query Failing while querying on ORC Format

Hi,
We are dumping our data into an ORC partitioned, bucketed table. We have loaded almost 6 months of data, with month as the partition column. We then modified the table schema, adding 2 more columns to the ORC table. Now, whenever we run a select statement against an older month that does not have these columns (even though the columns are not in the select clause / projection list), it throws an exception.
 
A JIRA bug for this kind of requirement has already been raised:
https://issues.apache.org/jira/browse/HIVE-11981
 
Can anyone please tell me an alternative workaround for reading the old columns of an ORC partitioned table?
 
Thanks

Re: Query Failing while querying on ORC Format

Posted by Mich Talebzadeh <mi...@gmail.com>.
Try this. Sorry about the cut and paste.

alter table add columns alternatives or hive refresh

Maurin Lenglart
2016-04-10 04:06:44 GMT

Hi,

I am trying to add columns to a table that I created with the "saveAsTable" API.

I update the columns using sqlContext.sql("alter table myTable add columns (mycol string)").

The next time I create a DataFrame and save it into the same table with the new columns, I get:

"ParquetRelation requires that the query in the SELECT clause of the INSERT INTO/OVERWRITE statement generates the same number of columns as its schema."

Also, these two commands don't return the same columns:

1. sqlContext.table("myTable").schema.fields    <-- wrong result

2. sqlContext.sql("show columns in mytable")    <-- correct result



It seems to be a known bug:
https://issues.apache.org/jira/browse/SPARK-9764 (see related bugs)

But I am wondering: how else can I update the columns, or make sure that Spark picks up the new columns?

I already tried refreshTable and restarting Spark.



thanks



Re: alter table add columns alternatives or hive refresh

Mich Talebzadeh
2016-04-10 10:41:59 GMT

I have not tried it on Spark, but a column added in Hive to an existing table cannot be updated for existing rows. In other words, the new column is set to NULL, which does not require a change to the existing file length.

So basically, as I understand it, when a column is added to an existing table:

1. The metadata for the underlying table will be updated.

2. The new column will have NULL as its default value.

3. Existing rows cannot have the new column updated to a non-NULL value.

4. New rows can have non-NULL values set for the new column.

5. No SQL operation can populate that column for existing rows; for example, SELECT * FROM <TABLE> WHERE new_column IS NOT NULL will only return the new rows.

6. The easiest option is to create a new table with the new column and do an INSERT/SELECT from the existing table, with values set for the new column.


HTH
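Option 6 could look like the following HiveQL sketch (the table names, column names, and default value here are illustrative, not from the thread):

```sql
-- Create the replacement table with the extra column, same storage format.
CREATE TABLE new_table (col1 INT, col2 STRING, new_column STRING)
STORED AS ORC;

-- Copy everything across, supplying a value for the new column.
INSERT INTO TABLE new_table
SELECT col1, col2, 'default_value' FROM old_table;

-- Swap the tables; keep the old one around until the copy is verified.
ALTER TABLE old_table RENAME TO old_table_keep;
ALTER TABLE new_table RENAME TO old_table;
```

Note this rewrites the whole table, which is exactly the cost Maurin objects to below for multi-TB tables.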





Dr Mich Talebzadeh



LinkedIn
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com/





On 10 April 2016 at 05:06, Maurin Lenglart wrote:

> I am trying to add columns to a table that I created with the "saveAsTable" API.

Re: alter table add columns alternatives or hive refresh

Maurin Lenglart
2016-04-10 18:34:08 GMT

Hi,

So basically you are telling me that I need to recreate the table and re-insert everything every time I update a column?

I understand the constraints, but that solution doesn't look good to me. I am updating the schema every day, and the table is a couple of TB of data.



Do you see any other option that would allow me not to move TBs of data every day?

Thanks for your answer



From: Mich Talebzadeh
Date: Sunday, April 10, 2016 at 3:41 AM
To: Maurin Lenglart
Cc: user@spark.apache.org
Subject: Re: alter table add columns alternatives or hive refresh

Re: alter table add columns alternatives or hive refresh

Mich Talebzadeh
2016-04-10 19:25:04 GMT

Hi,



I am confining myself to Hive tables. As I stated before, I have not tried it in Spark, so I stand corrected.



Let us try this simple test in Hive





-- Create table

hive> create table testme(col1 int);

OK

--insert a row

hive> insert into testme values(1);



Loading data to table test.testme

OK

-- Add a new column to testme

hive> alter table testme add columns (new_col varchar(30));

OK

Time taken: 0.055 seconds

-- Expect one row here

hive> select * from testme;

OK

1       NULL

-- Add a new row including values for new_col. This should work

hive> insert into testme values(1,'London');

Loading data to table test.testme

OK

hive> select * from testme;

OK

1       NULL

1       London

Time taken: 0.074 seconds, Fetched: 2 row(s)

-- Now try to update the new column
hive> update testme set new_col = 'NY';
FAILED: SemanticException [Error 10297]: Attempt to do update or delete on
table test.testme that does not use an AcidOutputFormat or is not bucketed

So this is Hive. You can add new rows including values for the new
column but cannot update the null values. Will this work for you?
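The UPDATE above fails because the table is not transactional. If updating the new column in place were essential, Hive (0.14 and later) does support UPDATE on ACID tables. A sketch, under the assumption that the cluster has the Hive transaction manager configured (table and column names are illustrative):

```sql
-- Session settings required for ACID; usually configured cluster-wide.
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- ACID tables must be bucketed, stored as ORC, and marked transactional.
CREATE TABLE testme_acid (col1 INT, new_col VARCHAR(30))
CLUSTERED BY (col1) INTO 2 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

INSERT INTO testme_acid VALUES (1, NULL);

-- This UPDATE succeeds here, where it failed on the plain table above.
UPDATE testme_acid SET new_col = 'NY' WHERE new_col IS NULL;
```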



HTH










On 10 April 2016 at 19:34, Maurin Lenglart wrote:

> So basically you are telling me that I need to recreate a table, and
> re-insert everything every time I update a column?
On 16 May 2016 at 20:02, Matthew McCline <mm...@hortonworks.com> wrote:

> What version of Hive are you on?

Re: Query Failing while querying on ORC Format

Posted by mahender bigdata <Ma...@outlook.com>.
Thanks Mich.


On 5/17/2016 1:31 PM, Mich Talebzadeh wrote:
> I am afraid AFAIK the old partitions cannot be modified as they are
> fixed in size. That is the existing partition file.


Re: Query Failing while querying on ORC Format

Posted by Mich Talebzadeh <mi...@gmail.com>.
I am afraid AFAIK the old partitions cannot be modified as they are fixed
in size. That is the existing partition file.

I agree this is very tedious. We should come up with a more flexible design
for ORC tables.

HTH




On 17 May 2016 at 21:20, mahender bigdata <Ma...@outlook.com> wrote:

> You mean to say try to add additional columns for the old partitions
> where column are missing ?

Re: Query Failing while querying on ORC Format

Posted by mahender bigdata <Ma...@outlook.com>.
Hi Jorn,

Do you mean we should try adding the additional columns to the old partitions where the columns are missing?



On 5/17/2016 7:25 AM, Jörn Franke wrote:
> I do not remember exactly, but I think it worked simply by adding a
> new partition to the old table with the additional columns.

Re: Query Failing while querying on ORC Format

Posted by Jörn Franke <jo...@gmail.com>.
I do not remember exactly, but I think it worked simply by adding a new partition to the old table with the additional columns.
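A related approach, which is an assumption on my part rather than something confirmed in this thread: Hive 1.1+ supports a CASCADE clause on ALTER TABLE ... ADD COLUMNS that pushes the new columns into the metadata of all existing partitions, so old ORC partitions read the missing columns as NULL instead of failing. A sketch (table and column names are illustrative):

```sql
-- CASCADE propagates the schema change to every existing partition's
-- metadata; without it, old partitions keep their original column list.
ALTER TABLE orc_part_table
ADD COLUMNS (new_col1 STRING, new_col2 INT)
CASCADE;
```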

> On 17 May 2016, at 15:00, Mich Talebzadeh <mi...@gmail.com> wrote:
>
> Hi Mahendar,
>
> That version 1.2 is reasonable.

Re: Query Failing while querying on ORC Format

Posted by mahender bigdata <Ma...@outlook.com>.
Thanks Mich. Reloading will always be an option. My only worry is that we would need to move 10 years of data, so I am looking for a workaround without reloading.

/Mahender


On 5/17/2016 6:00 AM, Mich Talebzadeh wrote:
> Hi Mahendar,
>
> That version 1.2 is reasonable.


Re: Query Failing while querying on ORC Format

Posted by Mich Talebzadeh <mi...@gmail.com>.
Hi Mahendar,

That version 1.2 is reasonable.

One alternative is to create a new table (new_table) in Hive with columns
from old_table plus the added column new_column as ORC etc

Do an INSERT/SELECT from old_table to new_table:

INSERT INTO new_table
SELECT *, <value_for_new_column> FROM old_table;
ALTER TABLE old_table RENAME TO old_table_keep;
ALTER TABLE new_table RENAME TO old_table;

That should work. Check the syntax.

HTH




On 16 May 2016 at 23:53, mahender bigdata <Ma...@outlook.com> wrote:

> I'm on Hive 1.2

Re: Query Failing while querying on ORC Format

Posted by mahender bigdata <Ma...@outlook.com>.
I'm on Hive 1.2


On 5/16/2016 12:02 PM, Matthew McCline wrote:
> What version of Hive are you on?

Re: Query Failing while querying on ORC Format

Posted by Matthew McCline <mm...@hortonworks.com>.

What version of Hive are you on?


________________________________
From: Mahender Sarangam <ma...@outlook.com>
Sent: Saturday, May 14, 2016 3:29 PM
To: user@hive.apache.org
Subject: Query Failing while querying on ORC Format


Re: Query Failing while querying on ORC Format

Posted by mahender bigdata <Ma...@outlook.com>.
Hi Mich,

Sorry, the link is not pointing to the right location.



On 5/15/2016 1:25 PM, Mich Talebzadeh wrote:
> Hi Mahender,
>
> Please check this thread
>
> https://mail.google.com/mail/#search/alter+table+add+columns+aternatives+or+hive+refresh/153fe59e7c2970b2
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn 
> /https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw/
>
> http://talebzadehmich.wordpress.com <http://talebzadehmich.wordpress.com/>
>
>
> On 15 May 2016 at 20:38, mahender bigdata 
> <Mahender.BigData@outlook.com <ma...@outlook.com>> 
> wrote:
>
>     Hi Mich,
>
>     Is there any link missing ?.
>
>     We have already added column. Some how the old partition data with
>     new column is not failing to retrieving.
>
>     /mahens
>
>     On 5/14/2016 4:15 PM, Mich Talebzadeh wrote:
>
>         that might help
>
>
>


Re: Query Failing while querying on ORC Format

Posted by Mich Talebzadeh <mi...@gmail.com>.
Hi Mahender,

Please check this thread

https://mail.google.com/mail/#search/alter+table+add+columns+aternatives+or+hive+refresh/153fe59e7c2970b2

HTH

Dr Mich Talebzadeh



LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com



On 15 May 2016 at 20:38, mahender bigdata <Ma...@outlook.com>
wrote:

> Hi Mich,
>
> Is there a link missing?
>
> We have already added the columns. Somehow, retrieving the old partition
> data with the new columns is failing.
>
> /mahens
>
> On 5/14/2016 4:15 PM, Mich Talebzadeh wrote:
>
>> that might help
>>
>
>

Re: Query Failing while querying on ORC Format

Posted by mahender bigdata <Ma...@outlook.com>.
Hi Mich,

Is there a link missing?

We have already added the columns. Somehow, retrieving the old partition
data with the new columns is failing.

/mahens

On 5/14/2016 4:15 PM, Mich Talebzadeh wrote:
> that might help


Re: Query Failing while querying on ORC Format

Posted by mahender bigdata <Ma...@outlook.com>.
As a temporary workaround, I'm disabling vectorization on the ORC table; then it works.
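
For reference, disabling vectorized execution for the session would look something like the sketch below. The property name is the standard Hive one; the table and partition names are hypothetical:

```sql
-- Temporary workaround: turn off vectorized execution for this session
-- before querying the old partitions. This trades query performance for
-- being able to read the partitions until the schema mismatch is fixed.
SET hive.vectorized.execution.enabled = false;

-- Querying an older partition (hypothetical table 'events') should now
-- go through the non-vectorized ORC reader.
SELECT id, name FROM events WHERE month = '2016-01';
```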



On 5/15/2016 3:38 PM, mahender bigdata wrote:
> here is the error message
>> https://issues.apache.org/jira/browse/HIVE-10598
>>
>> Error: java.lang.RuntimeException: Error creating a batch at 
>> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:114) 
>> at 
>> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:52) 
>> at 
>> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.createValue(CombineHiveRecordReader.java:84) 
>> at 
>> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.createValue(CombineHiveRecordReader.java:42) 
>> at 
>> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.createValue(HadoopShimsSecure.java:155) 
>> at 
>> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.createValue(MapTask.java:180) 
>> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at 
>> org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) at 
>> org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at 
>> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) at 
>> java.security.AccessController.doPrivileged(Native Method) at 
>> javax.security.auth.Subject.doAs(Subject.java:415) at 
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) 
>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused 
>> by:
>>
>> org.apache.hadoop.hive.ql.metadata.HiveException: No type entry found 
>> for column 27 in map {28=String} at 
>> org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.addScratchColumnsToBatch(VectorizedRowBatchCtx.java:632) 
>>
>>
>> at 
>> org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.createVectorizedRowBatch(VectorizedRowBatchCtx.java:343) 
>> at 
>> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:112) 
>> ... 13 more
>>
> here is the URL again: 
> https://issues.apache.org/jira/browse/HIVE-10598
>>
>> On 5/15/2016 12:38 PM, mahender bigdata wrote:
>>> Hi Mich,
>>>
>>> Is there a link missing?
>>>
>>> We have already added the columns. Somehow, retrieving the old
>>> partition data with the new columns is failing.
>>>
>>> /mahens
>>>
>>> On 5/14/2016 4:15 PM, Mich Talebzadeh wrote:
>>>> that might help
>>>
>>
>


Re: Query Failing while querying on ORC Format

Posted by mahender bigdata <Ma...@outlook.com>.
here is the error message
> https://issues.apache.org/jira/browse/HIVE-10598
>
> Error: java.lang.RuntimeException: Error creating a batch at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:114) 
> at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:52) 
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.createValue(CombineHiveRecordReader.java:84) 
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.createValue(CombineHiveRecordReader.java:42) 
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.createValue(HadoopShimsSecure.java:155) 
> at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.createValue(MapTask.java:180) 
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at 
> org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) at 
> org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at 
> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:415) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) 
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by:
>
> org.apache.hadoop.hive.ql.metadata.HiveException: No type entry found 
> for column 27 in map {28=String} at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.addScratchColumnsToBatch(VectorizedRowBatchCtx.java:632) 
>
>
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.createVectorizedRowBatch(VectorizedRowBatchCtx.java:343) 
> at 
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:112) 
> ... 13 more
>
here is the URL again: https://issues.apache.org/jira/browse/HIVE-10598
>
> On 5/15/2016 12:38 PM, mahender bigdata wrote:
>> Hi Mich,
>>
>> Is there a link missing?
>>
>> We have already added the columns. Somehow, retrieving the old partition
>> data with the new columns is failing.
>>
>> /mahens
>>
>> On 5/14/2016 4:15 PM, Mich Talebzadeh wrote:
>>> that might help
>>
>


Re: Query Failing while querying on ORC Format

Posted by mahender bigdata <Ma...@outlook.com>.
here is the error message


Error: java.lang.RuntimeException: Error creating a batch at 
org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:114) 
at 
org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:52) 
at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.createValue(CombineHiveRecordReader.java:84) 
at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.createValue(CombineHiveRecordReader.java:42) 
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.createValue(HadoopShimsSecure.java:155) 
at 
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.createValue(MapTask.java:180) 
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at 
org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) at 
org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at 
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) at 
java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:415) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) 
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by:

org.apache.hadoop.hive.ql.metadata.HiveException: No type entry found 
for column 27 in map {28=String} at 
org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.addScratchColumnsToBatch(VectorizedRowBatchCtx.java:632) 


at 
org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.createVectorizedRowBatch(VectorizedRowBatchCtx.java:343) 
at 
org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:112) 
... 13 more
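
One possible alternative to disabling vectorization, worth trying if you are on a Hive version that supports partition-level schema changes, is to bring the old partitions' metadata in line with the new table schema. This is a sketch, not a verified fix for this exact error; the table, column, and partition names are hypothetical:

```sql
-- Apply the new columns to each old partition's metadata individually,
-- so the partition schema matches the table-level schema that the
-- vectorized reader builds its batch from.
ALTER TABLE events PARTITION (month = '2016-01')
  ADD COLUMNS (col_a STRING, col_b STRING);

-- Repeat for each older partition (or script over the output of
-- SHOW PARTITIONS events).
```

On newer Hive releases, `ALTER TABLE ... ADD COLUMNS (...) CASCADE` can propagate the change to all partitions in one statement, which avoids altering each partition by hand.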


On 5/15/2016 12:38 PM, mahender bigdata wrote:
> Hi Mich,
>
> Is there a link missing?
>
> We have already added the columns. Somehow, retrieving the old partition
> data with the new columns is failing.
>
> /mahens
>
> On 5/14/2016 4:15 PM, Mich Talebzadeh wrote:
>> that might help
>


Re: Query Failing while querying on ORC Format

Posted by Mich Talebzadeh <mi...@gmail.com>.
check this thread.

alter table add columns aternatives or hive refresh

that might help

HTH

Dr Mich Talebzadeh



LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com



On 14 May 2016 at 23:29, Mahender Sarangam <ma...@outlook.com>
wrote:

> Hi,
> We are loading our data into an ORC partitioned, bucketed table. We have
> loaded almost 6 months of data, with month as the partition column. We
> recently modified the table schema and added 2 more columns to the ORC
> table. Now, whenever we run a select statement against an older month
> whose data files lack these columns (even though the columns are not in
> the select clause / projection list), it throws an exception.
>
> A JIRA issue has already been raised for this kind of requirement:
> https://issues.apache.org/jira/browse/HIVE-11981
>
> Can anyone please suggest a workaround for reading the older partitions
> of the ORC partitioned table?
>
> Thanks
>