Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/05/05 05:41:00 UTC

[jira] [Commented] (IMPALA-10493) Using JOIN ON syntax to join two full ACID collections produces wrong results

    [ https://issues.apache.org/jira/browse/IMPALA-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17339428#comment-17339428 ] 

ASF subversion and git services commented on IMPALA-10493:
----------------------------------------------------------

Commit f0f083e45e2c77b1499fa6fa08ff8d9dc4a2785f in impala's branch refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f0f083e ]

IMPALA-10482, IMPALA-10493: Fix bugs in full ACID collection query rewrites

IMPALA-10482: SELECT * query on a non-relative collection column of a
transactional ORC table hits an IllegalStateException.

The AcidRewriter will rewrite queries like
"select item from my_complex_orc.int_array" to
"select item from my_complex_orc t, t.int_array"

This causes trouble in star expansion, because the original query
"select * from my_complex_orc.int_array" is analyzed as
"select item from my_complex_orc.int_array".

But the rewritten query "select * from my_complex_orc t, t.int_array" is
analyzed as "select id, item from my_complex_orc t, t.int_array".

Hidden table refs can also cause issues during regular column
resolution, for example when the table has top-level
'pos'/'item'/'key'/'value' columns.

The workaround is to keep track of the table refs that are added
automatically during the query rewrite, so that when we analyze the
rewritten query we can ignore these auxiliary table refs.
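A minimal sketch of that bookkeeping, assuming a simplified model of the analyzed FROM clause: TableRef, its hidden flag, expand_star() and resolve() are hypothetical names, not Impala's frontend classes. Refs added by the rewrite are flagged as hidden, and both star expansion and unqualified column resolution skip them.

```python
from dataclasses import dataclass

# Hypothetical, simplified model of a FROM-clause entry (not Impala's classes).
@dataclass
class TableRef:
    alias: str
    columns: list
    hidden: bool = False  # True if the ACID rewrite added this ref automatically

def expand_star(refs):
    """Expand SELECT * into qualified column names, skipping hidden refs."""
    return [f"{r.alias}.{c}" for r in refs if not r.hidden for c in r.columns]

def resolve(refs, name):
    """Resolve an unqualified column name against visible refs only."""
    matches = [f"{r.alias}.{name}" for r in refs if not r.hidden and name in r.columns]
    if not matches:
        raise ValueError(f"unknown column: {name}")
    if len(matches) > 1:
        raise ValueError(f"ambiguous column: {name}")
    return matches[0]

# "select * from my_complex_orc.int_array" rewritten to
# "select * from my_complex_orc t, t.int_array": t was added by the rewrite.
refs = [TableRef("t", ["id"], hidden=True), TableRef("int_array", ["item"])]
print(expand_star(refs))  # ['int_array.item']

# A parent table with its own top-level 'item' column no longer clashes.
refs = [TableRef("t", ["id", "item"], hidden=True), TableRef("int_array", ["item"])]
print(resolve(refs, "item"))  # int_array.item
```

With t marked hidden, expand_star() returns only the collection's item column, matching the original query; without the flag it would also return t.id, which mirrors the "select id, item" expansion described above.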

IMPALA-10493: Using JOIN ON syntax to join two full ACID collections
produces wrong results.

When AcidRewriter.splitCollectionRef() creates a new collection ref,
it doesn't copy all the information needed to correctly execute the
query. For example, it dropped the ON clause, turning INNER joins into
CROSS joins.
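The following sketch illustrates this class of bug, assuming a hypothetical immutable TableRef with join_op and on_clause fields (not Impala's TableRef API) and hypothetical split_buggy()/split_fixed() helpers. It is not how AcidRewriter.splitCollectionRef() is actually written; it only shows why the new ref must carry over the original's join metadata. Building the collection ref from scratch loses it, while deriving it from the original with dataclasses.replace() keeps every field except the overridden path.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical, simplified table ref (not Impala's TableRef API).
@dataclass(frozen=True)
class TableRef:
    path: str                        # e.g. "complextypestbl.int_array"
    alias: str
    join_op: str = "CROSS JOIN"      # how this ref joins the refs to its left
    on_clause: Optional[str] = None  # join predicate, if any

def split_buggy(ref, parent_alias):
    """Split 'tbl.coll' into a parent ref and a relative collection ref,
    rebuilding the collection ref from scratch (loses join_op/on_clause)."""
    table, coll = ref.path.rsplit(".", 1)
    parent = TableRef(table, parent_alias)
    child = TableRef(f"{parent_alias}.{coll}", ref.alias)
    return parent, child

def split_fixed(ref, parent_alias):
    """Same split, but derive the collection ref from the original so that
    every field except the path is carried over."""
    table, coll = ref.path.rsplit(".", 1)
    parent = TableRef(table, parent_alias)
    child = replace(ref, path=f"{parent_alias}.{coll}")
    return parent, child

a2 = TableRef("complextypestbl.int_array", "a2", "INNER JOIN", "a1.item = a2.item")
_, bad = split_buggy(a2, "t2")
_, good = split_fixed(a2, "t2")
print(bad.join_op, bad.on_clause)    # CROSS JOIN None
print(good.join_op, good.on_clause)  # INNER JOIN a1.item = a2.item
```

Copying the whole ref and overriding only what changes also protects against fields added to the ref class later, which a field-by-field constructor call would silently drop.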

Testing:
 * added e2e tests

Change-Id: I8fc758d3c1e75c7066936d590aec8bff8d2b00b0
Reviewed-on: http://gerrit.cloudera.org:8080/17038
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Using JOIN ON syntax to join two full ACID collections produces wrong results
> -----------------------------------------------------------------------------
>
>                 Key: IMPALA-10493
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10493
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: CorrectnessBug, impala-acid
>
> The following query produces wrong results:
> {noformat}
> use functional_orc_def; -- use full ACID tables
> select a1.item, a2.item
> from complextypestbl.int_array a1 join complextypestbl.int_array a2
> on a1.item=a2.item
> where a1.item<2;{noformat}
> It creates a CROSS JOIN without the predicate "a1.item = a2.item", generating too many rows. The expected plan node would be an INNER JOIN on "a1.item = a2.item".
> If we put the JOIN condition into the WHERE clause, we get the correct plan:
> {noformat}
> select a1.item, a2.item
> from complextypestbl.int_array a1 join complextypestbl.int_array a2
> where a1.item=a2.item and a1.item<2{noformat}
> We also get a correct plan if the right table is non-ACID:
> {noformat}
> select a1.item, a2.item
> from complextypestbl.int_array a1 join functional_parquet.complextypestbl.int_array a2
> on a1.item=a2.item
> where a1.item<2;{noformat}
> Or if the right-hand side is a full ACID table rather than a collection:
> {noformat}
> select c.id, a1.item
> from complextypestbl.int_array a1 join complextypestbl c
> on c.id=a1.item
> where c.id<2;{noformat}
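To make the symptom concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in engine (not Impala), with a hypothetical flat int_array table whose values are made up rather than taken from complextypestbl. The ON and WHERE forms of the inner join return the same rows, while dropping the ON predicate, which is what the faulty rewrite effectively did, yields a cross join with more rows.

```python
import sqlite3

# Stand-in engine: SQLite in memory. The table and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE int_array (item INTEGER)")
conn.executemany("INSERT INTO int_array VALUES (?)", [(1,), (2,), (3,), (1,)])

QUERIES = {
    # Join condition in the ON clause (the form that broke on full ACID tables).
    "on": """SELECT a1.item, a2.item FROM int_array a1 JOIN int_array a2
             ON a1.item = a2.item WHERE a1.item < 2""",
    # Same condition moved into the WHERE clause (the form that worked).
    "where": """SELECT a1.item, a2.item FROM int_array a1 JOIN int_array a2
                WHERE a1.item = a2.item AND a1.item < 2""",
    # What the faulty rewrite effectively executed: the ON predicate is gone.
    "cross": """SELECT a1.item, a2.item FROM int_array a1 CROSS JOIN int_array a2
                WHERE a1.item < 2""",
}

def rows(name):
    """Run one of the queries and return its result rows in a stable order."""
    return sorted(conn.execute(QUERIES[name]).fetchall())

for name in QUERIES:
    print(name, len(rows(name)))  # on 4, where 4, cross 8
```

With this data, the two a1 rows that pass "a1.item<2" each match two a2 rows under the join predicate (4 rows), but pair with all four a2 rows once the predicate is lost (8 rows).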



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
