You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@drill.apache.org by "FuWH (傅万红/苏州)" <fu...@paxsz.com> on 2019/09/03 07:57:08 UTC

Drill query result with NULL for new added column in transactional (ACID) Hive bucketed ORC table

Hi,

We are using Drill v1.16.0 to query transactional (ACID) Hive bucketed ORC table, everything is okay but when querying the columns added by HiveQL “alert table … add columns …”, Drill always returns the query result as NULL for such columns, HiveQL can query with expected results, can anyone help to give an adivse on how to solve this issue?

Appreciated & Thanks
Jerry

Re: Drill query result with NULL for new added column in transactional (ACID) Hive bucketed ORC table

Posted by WanHong Fu <wa...@gmail.com>.

Hi Igor,

Thanks for reply.
Your workaround need to touch Hive metastore but it is impossible for us so
far.
May I know is there a plan to fix this issue from Drill side as we
are eager to get it work.

Thanks
Jerry

Igor Guzenko <ih...@gmail.com> 于2019年9月4日周三 下午4:45写道：

> Hello Jerry,
>
> Thank you very much for detailed steps. The issue is pretty complex, when
> Drill reads meta about partition, somehow linked
> storage descriptor doesn't have the newly added column. I've created
> DRILL-7365 [1]
> which at the end of description contains one possible workaround.
>
> [1] https://issues.apache.org/jira/browse/DRILL-7365
>
> Thanks,
> Igor
>
>

Re: Drill query result with NULL for new added column in transactional (ACID) Hive bucketed ORC table

Posted by Igor Guzenko <ih...@gmail.com>.

Hello Jerry,

Thank you very much for detailed steps. The issue is pretty complex, when
Drill reads meta about partition, somehow linked
storage descriptor doesn't have the newly added column. I've created
DRILL-7365 [1]
which at the end of description contains one possible workaround.

[1] https://issues.apache.org/jira/browse/DRILL-7365

Thanks,
Igor

Re: Drill query result with NULL for new added column in transactional (ACID) Hive bucketed ORC table

Posted by WanHong Fu <wa...@gmail.com>.

Sorry, resend in pure text for cases mentioned before:

My case is as below:
1) create table hive_bucketed2 (emp_id int, first_name string) PARTITIONED
BY (`col_year_month` string) clustered by (emp_id) into 4 buckets stored as
orc tblproperties ('transactional'='true');
2) insert into hive_bucketed2 PARTITION (col_year_month = '2019-09') values
(1, 'A'),(2, 'B');
3) alter table hive_bucketed2 add columns (age INT);
4) insert into hive_bucketed2 PARTITION (col_year_month = '2019-09') values
(11, '1A', 10),(12, '1B', 22);
5) select * from hive.hive_bucketed2;

WanHong Fu <wa...@gmail.com> 于2019年9月4日周三 上午9:52写道：

> Hi Igor,
>
> Thanks for your reply.
>
> For you case Drill works fine.
> But if the table created with Partition, will hit the issue.
>
> My case is as below:
> 1) *create* *table *hive_bucketed2 (emp_id *int*, first_name *string*)
> PARTITIONED *BY* (`col_year_month` *string*) clustered *by* (emp_id)
> *into* 4 buckets stored *as* orc tblproperties ('transactional'='true');
> 2) *insert* *into* hive_bucketed2 *PARTITION* (col_year_month = '2019-09')
> *values* (1, 'A'),(2, 'B');
> 3) *alter* *table *hive_bucketed2 *add* columns (age *INT*);
> 4) *insert* *into *hive_bucketed2 *PARTITION* (col_year_month = '2019-09')
> *values* (11, '1A', 10),(12, '1B', 22);
> 5) *select* * *from* hive.hive_bucketed2;
>
>
> The expected result for query 5):
> emp_id|first_name|age|col_year_month|
> ------|----------|---|--------------|
>     12|1B        | 22|2019-09       |
>      1|A         |   |2019-09       |
>      2|B         |   |2019-09       |
>     11|1A        | 10|2019-09       |
>
> The actual is:
> emp_id|first_name|age|col_year_month|
> ------|----------|---|--------------|
>     12|1B        |   |2019-09       |
>      1|A         |   |2019-09       |
>      2|B         |   |2019-09       |
>     11|1A        |   |2019-09       |
>
> Appreciated if you can advise us.
>
> Regards
> Jerry
>
> Igor Guzenko <ih...@gmail.com> 于2019年9月3日周二 下午8:56写道：
>
>> Hello Jerry,
>>
>> Could you please describe your use-case in details. I've tried following
>> steps and Drill reads data for new column just fine.
>>
>> 1) Create table
>> create table hive_bucketed(emp_id int, first_name string) clustered by
>> (emp_id) into 4 buckets stored as orc tblproperties
>> ('transactional'='true');
>>
>> 2) insert data
>> insert into hive_bucketed values (1, 'A'),(2, 'B');
>>
>> 3) queried table with Drill and got 2 columns
>> select * from hive.hive_bucketed
>>
>> 4) add new column
>> alter table hive_bucketed add columns (age INT);
>>
>> 5) set data for new column
>> update hive_bucketed SET age=100;
>>
>> 6) queried table with Drill and got 3 columns with data as expected
>>  select * from hive.hive_bucketed
>>
>> Please, provide small use-case to reproduce your issue. Also ddl and logs
>> are welcome:)
>>
>> Thanks,
>> Igor
>>
>>
>>
>>
>> On Tue, Sep 3, 2019 at 11:03 AM FuWH(傅万红/苏州) <fu...@paxsz.com> wrote:
>>
>> > Hi,
>> >
>> > We are using Drill v1.16.0 to query transactional (ACID) Hive bucketed
>> ORC
>> > table, everything is okay but when querying the columns added by HiveQL
>> > “alert table … add columns …”, Drill always returns the query result as
>> > NULL for such columns, HiveQL can query with expected results, can
>> anyone
>> > help to give an adivse on how to solve this issue?
>> >
>> > Appreciated & Thanks
>> > Jerry
>>
>

Re: Drill query result with NULL for new added column in transactional (ACID) Hive bucketed ORC table

Posted by WanHong Fu <wa...@gmail.com>.

Hi Igor,

Thanks for your reply.

For you case Drill works fine.
But if the table created with Partition, will hit the issue.

My case is as below:
1) *create* *table *hive_bucketed2 (emp_id *int*, first_name *string*)
PARTITIONED *BY* (`col_year_month` *string*) clustered *by* (emp_id) *into*
4 buckets stored *as* orc tblproperties ('transactional'='true');
2) *insert* *into* hive_bucketed2 *PARTITION* (col_year_month = '2019-09')
*values* (1, 'A'),(2, 'B');
3) *alter* *table *hive_bucketed2 *add* columns (age *INT*);
4) *insert* *into *hive_bucketed2 *PARTITION* (col_year_month = '2019-09')
*values* (11, '1A', 10),(12, '1B', 22);
5) *select* * *from* hive.hive_bucketed2;


The expected result for query 5):
emp_id|first_name|age|col_year_month|
------|----------|---|--------------|
    12|1B        | 22|2019-09       |
     1|A         |   |2019-09       |
     2|B         |   |2019-09       |
    11|1A        | 10|2019-09       |

The actual is:
emp_id|first_name|age|col_year_month|
------|----------|---|--------------|
    12|1B        |   |2019-09       |
     1|A         |   |2019-09       |
     2|B         |   |2019-09       |
    11|1A        |   |2019-09       |

Appreciated if you can advise us.

Regards
Jerry

Igor Guzenko <ih...@gmail.com> 于2019年9月3日周二 下午8:56写道：

> Hello Jerry,
>
> Could you please describe your use-case in details. I've tried following
> steps and Drill reads data for new column just fine.
>
> 1) Create table
> create table hive_bucketed(emp_id int, first_name string) clustered by
> (emp_id) into 4 buckets stored as orc tblproperties
> ('transactional'='true');
>
> 2) insert data
> insert into hive_bucketed values (1, 'A'),(2, 'B');
>
> 3) queried table with Drill and got 2 columns
> select * from hive.hive_bucketed
>
> 4) add new column
> alter table hive_bucketed add columns (age INT);
>
> 5) set data for new column
> update hive_bucketed SET age=100;
>
> 6) queried table with Drill and got 3 columns with data as expected
>  select * from hive.hive_bucketed
>
> Please, provide small use-case to reproduce your issue. Also ddl and logs
> are welcome:)
>
> Thanks,
> Igor
>
>
>
>
> On Tue, Sep 3, 2019 at 11:03 AM FuWH(傅万红/苏州) <fu...@paxsz.com> wrote:
>
> > Hi,
> >
> > We are using Drill v1.16.0 to query transactional (ACID) Hive bucketed
> ORC
> > table, everything is okay but when querying the columns added by HiveQL
> > “alert table … add columns …”, Drill always returns the query result as
> > NULL for such columns, HiveQL can query with expected results, can anyone
> > help to give an adivse on how to solve this issue?
> >
> > Appreciated & Thanks
> > Jerry
>

Re: Drill query result with NULL for new added column in transactional (ACID) Hive bucketed ORC table

Posted by Igor Guzenko <ih...@gmail.com>.

Hello Jerry,

Could you please describe your use-case in details. I've tried following
steps and Drill reads data for new column just fine.

1) Create table
create table hive_bucketed(emp_id int, first_name string) clustered by
(emp_id) into 4 buckets stored as orc tblproperties
('transactional'='true');

2) insert data
insert into hive_bucketed values (1, 'A'),(2, 'B');

3) queried table with Drill and got 2 columns
select * from hive.hive_bucketed

4) add new column
alter table hive_bucketed add columns (age INT);

5) set data for new column
update hive_bucketed SET age=100;

6) queried table with Drill and got 3 columns with data as expected
 select * from hive.hive_bucketed

Please, provide small use-case to reproduce your issue. Also ddl and logs
are welcome:)

Thanks,
Igor




On Tue, Sep 3, 2019 at 11:03 AM FuWH(傅万红/苏州) <fu...@paxsz.com> wrote:

> Hi,
>
> We are using Drill v1.16.0 to query transactional (ACID) Hive bucketed ORC
> table, everything is okay but when querying the columns added by HiveQL
> “alert table … add columns …”, Drill always returns the query result as
> NULL for such columns, HiveQL can query with expected results, can anyone
> help to give an adivse on how to solve this issue?
>
> Appreciated & Thanks
> Jerry