You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org> on 2021/04/22 14:09:52 UTC

[Impala-ASF-CR] IMPALA-10482, IMPALA-10493: Fix bugs in full ACID collection query rewrites

Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17038 )

Change subject: IMPALA-10482, IMPALA-10493: Fix bugs in full ACID collection query rewrites
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17038/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17038/3//COMMIT_MSG@34
PS3, Line 34: When AcidRewriter.splitCollectionRef() creates a new collection ref
            : it doesn't copy every information needed to correctly execute the
            : query. E.g. it dropped the ON clause, turning INNER joins to CROSS
            : joins.
> I think copying the join stuffs is not enough. We are rewritting "A join B"
Thank you for pointing this out. I think an inlineview works well in the above case, e.g. rewriting the query to:

with v as (
  select ('k4') as key
  union all
  values ('k1'), ('k2'), ('k3')
)
select * from v
left join
(select * from
 functional_orc_def.complextypestbl /*hiddenTableRef*/$a$1, $a$1.int_map) vv
using (key)

It produces correct results for me.

However, it might not work well if the table has multiple levels of nesting, e.g.:

 Hive shell>
 CREATE TABLE nested_vals (
   val int,
   arr ARRAY<STRUCT<
     val:INT,
     arr:ARRAY<STRUCT<
       val:INT,
       arr:ARRAY<STRUCT<
         val:INT>>>>>>)
 STORED AS ORC
 TBLPROPERTIES ('transactional'='true');

 INSERT INTO nested_vals select 1,
                                array(named_struct("val", 1, "arr",
                                  array(named_struct("val", 1, "arr",
                                    array(named_struct("val", 1))))));

Let's consider the following SELECT:

 SELECT a.val, aa.val, aaa.val
 FROM nested_vals.arr a, a.arr aa, aa.arr aaa;

Currently we rewrite it to:

 SELECT a.val, aa.val, aaa.val
 FROM nested_vals $a$1, $a$1.arr a, a.arr aa, aa.arr aaa;

The above query returns correct results, but not work well in the context of outer joins as you pointed out.

Using an inline view is not trivial in this case. Creating an inline view from the first table ref doesn't work:

 SELECT a.val, aa.val, aaa.val
 FROM (select arr.* from nested_vals $a$1, $a$1.arr) va, va.arr aa, aa.arr aaa;

In that case we get an AnalysisException because va.arr is not selected. In the inline view we cannot add 'arr.arr' to the select list because it is a complex type. And even if could select 'arr.arr' in the inline view, I'm not sure what this query would do.

Putting everything to an inline view is better but it also doesn't work out of the box:

 select a.val, aa.val, aaa.val from (select a.*, aa.*, aaa.* from nested_vals $a$1, $a$1.arr a, a.arr aa, aa.arr aaa) v;

We get AnalysisException: duplicated inline view column alias: 'val' in inline view 'v'

For this to work we'd need to rename the slot refs which might be complicated/risky in itself:

 select a_val, aa_val, aaa_val from (select a.val a_val, aa.val aa_val, aaa.val aaa_val from nested_vals $a$1, $a$1.arr a, a.arr aa, aa.arr aaa) v;

This returns correct results, but I'm not sure how this would work in the context of different JOINs at different levels.

I think where this approach fails is when we have nested structs at different levels. E.g. in the select query we might refer to nested fields:

 SELECT x.nested_struct.a.b ...

But the inline view cannot select structs unfortunately. Luckily IMPALA-9494 is in progress, so probably this could work in the second half of the year.

I'm curious about your opinion on the above as you have more experience with these kind of rewrites.

Anyway, this patch already solves a few issues while not introducing new ones, so how about pushing this to master and opening a new Jira to deal with the above problem as that seems to be the toughest one?



-- 
To view, visit http://gerrit.cloudera.org:8080/17038
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8fc758d3c1e75c7066936d590aec8bff8d2b00b0
Gerrit-Change-Number: 17038
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Apr 2021 14:09:52 +0000
Gerrit-HasComments: Yes