Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2023/03/27 23:02:00 UTC

[jira] [Commented] (IMPALA-11744) Table mask view should preserve the original column order in Hive

    [ https://issues.apache.org/jira/browse/IMPALA-11744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17705713#comment-17705713 ] 

ASF subversion and git services commented on IMPALA-11744:
----------------------------------------------------------

Commit 9bf8607ce58a1a2573c8c2b0ebdf9179a1840429 in impala's branch refs/heads/branch-4.1.2 from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=9bf8607ce ]

IMPALA-11744: Table mask view should preserve the original column order in Hive

Ranger provides column masking and row filtering policies to mask
sensitive data for specific users/groups. When a table should be masked
in a query, Impala replaces it with a table mask view that exposes the
columns with masked expressions.

After IMPALA-9661, only selected columns are exposed in the table mask
view. However, the columns of the view are exposed in the order that
they are registered. If the registration order differs from the column
order in the table, STAR expansions list the columns in the wrong order.

To be specific, let's say table 'tbl' with 3 columns a, b, c should be
masked in the following query:
  select b, * from tbl;
Ideally Impala should replace the TableRef of 'tbl' with a table mask
view as:
  select b, * from (
    select mask(a) a, mask(b) b, mask(c) c from tbl
  ) t;

Currently, the rewritten query is
  select b, * from (
    select mask(b) b, mask(a) a, mask(c) c from tbl
  ) t;
This incorrectly expands the STAR as "b, a, c" in the re-analyze phase.
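
The mismatch can be reproduced with a small model of STAR expansion. This is
only an illustrative sketch in Python, not Impala's analyzer code: the
functions expand_star and select_list and the list-of-pairs representation of
a view are hypothetical names introduced for this example.

```python
# Hypothetical model: a view is an ordered list of (column name, expression).
def expand_star(view_columns):
    """Expand '*' to the view's column names, in the view's own order."""
    return [name for name, _expr in view_columns]


def select_list(explicit, view_columns):
    """Output columns for 'select <explicit>, * from <view>'."""
    return list(explicit) + expand_star(view_columns)


# The view as it should be built: columns in the table's order a, b, c.
expected_view = [("a", "mask(a)"), ("b", "mask(b)"), ("c", "mask(c)")]
# The view as currently built: 'b' was registered first.
actual_view = [("b", "mask(b)"), ("a", "mask(a)"), ("c", "mask(c)")]

expected_columns = select_list(["b"], expected_view)
actual_columns = select_list(["b"], actual_view)
```

In this model the only difference between the two results is the order in
which the view lists its columns, which is what the STAR expansion picks up.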

The cause is that column 'b' is registered earlier than all other
columns. This patch fixes it by sorting the selected columns based on
their original order in the table.
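
The idea behind the fix can be sketched as follows. This is a minimal Python
illustration under stated assumptions, not the actual change in Analyzer.java:
the function order_mask_view_columns and its arguments are hypothetical, and
columns are modeled as plain strings.

```python
def order_mask_view_columns(table_columns, registered_columns):
    """Sort the registered (selected) columns by their position in the table.

    table_columns: all column names, in the table's original order.
    registered_columns: selected column names, in registration order.
    """
    position = {name: index for index, name in enumerate(table_columns)}
    return sorted(registered_columns, key=lambda name: position[name])


table_columns = ["a", "b", "c"]

# 'b' was registered first because it appears first in the select list.
ordered = order_mask_view_columns(table_columns, ["b", "a", "c"])

# Only a subset of columns is selected; relative table order is still kept.
subset = order_mask_view_columns(table_columns, ["c", "a"])
```

Sorting by table position keeps the view limited to the selected columns
while making a later STAR expansion follow the table's column order.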

Tests:
 - Add tests for selecting STAR together with normal columns on tables and views.

Backport Note for 4.1.2:
Keep the import of Optional in Analyzer.java.
Remove some tests because the virtual column input__file__name is not supported in this branch.

Change-Id: Ic83d78312b19fa2c5ab88ac4f359bfabaeaabce6
Reviewed-on: http://gerrit.cloudera.org:8080/19279
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Table mask view should preserve the original column order in Hive
> -----------------------------------------------------------------
>
>                 Key: IMPALA-11744
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11744
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Security
>    Affects Versions: Impala 4.0.0, Impala 4.1.0, Impala 4.2.0, Impala 4.1.1
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Blocker
>             Fix For: Impala 4.3.0
>
>
> Ranger provides column masking and row filtering policies to mask sensitive data for specific users/groups. When a table should be masked in a query, Impala replaces it with a table mask view that exposes the columns with masked expressions.
> After IMPALA-9661, only selected columns are exposed in the table mask view. However, the columns are exposed in the order that they are registered, which can produce wrong results if the original statement contains STAR expressions.
> The following example shows the issue:
> {code:sql}
> create table mask_test_tbl (a string, b string, c string, d string);
> insert into mask_test_tbl values ("aaaa", "bbbb", "cccc", "dddd");
> -- Create a column masking policy on column c using Redact
> select * from mask_test_tbl;
> +------+------+------+------+
> | a    | b    | c    | d    |
> +------+------+------+------+
> | aaaa | bbbb | xxxx | dddd |
> +------+------+------+------+
> {code}
> The following query produces incorrect results:
> {code:sql}
> select b, * from mask_test_tbl;
> +------+------+------+------+------+
> | b    | a    | b    | c    | d    |
> +------+------+------+------+------+
> | bbbb | bbbb | aaaa | xxxx | dddd |
> +------+------+------+------+------+
> {code}
> Note that the values of the 2nd and 3rd columns are swapped.


