Posted to dev@submarine.apache.org by "Kent Yao (Jira)" <ji...@apache.org> on 2021/01/19 07:10:00 UTC

[jira] [Resolved] (SUBMARINE-638) Spark-security ranger plugin - Limit to be applied after masking projection

     [ https://issues.apache.org/jira/browse/SUBMARINE-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao resolved SUBMARINE-638.
--------------------------------
    Fix Version/s: 0.6.0
       Resolution: Fixed

Issue resolved by pull request 490
[https://github.com/apache/submarine/pull/490]

> Spark-security ranger plugin - Limit to be applied after masking projection
> ---------------------------------------------------------------------------
>
>                 Key: SUBMARINE-638
>                 URL: https://issues.apache.org/jira/browse/SUBMARINE-638
>             Project: Apache Submarine
>          Issue Type: Improvement
>          Components: Security
>            Reporter: Tenneti Venkata Sri Harsha
>            Priority: Major
>              Labels: pull-request-available, security
>             Fix For: 0.6.0
>
>
> Suppose there is a query with a limit, like the one below, where the column value has to be masked:
> {code:java}
> SELECT key, value from default.src limit 10{code}
> The resulting plans are:
> {code:java}
> == Parsed Logical Plan ==
> 'GlobalLimit 10
> +- 'LocalLimit 10
>    +- 'Project ['key, 'value]
>       +- 'UnresolvedRelation `default`.`src`
> == Optimized Logical Plan ==
> Project [key#36,HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFMaskShowLastN(value#37,4,x,x,x,-1,1) AS value#41]
> +- GlobalLimit 10
>    +- LocalLimit 10
>       +- SubmarineDataMasking
>          +- HiveTableRelation `default`.`src`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key#36, value#37]
> == Physical Plan ==
> Project [key#36, HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFMaskShowLastN(value#37,4,x,x,x,-1,1) AS value#41]
> +- *(2) GlobalLimit 10
>    +- Exchange SinglePartition
>       +- *(1) LocalLimit 10
>          +- *(1) HiveTableScan [key#36, value#37], HiveTableRelation `default`.`src`, org.apache.hadoop.hive.serde2.OpenCSVSerde, [key#36, value#37]
> {code}
> The plan above reads every file in the table, because the optimized logical plan places the masking Project above the limit. If the optimized logical plan instead applies the limit after (on top of) the masking projection, the physical plan becomes a CollectLimit, so the collect reads only one file:
> {code:java}
> == Parsed Logical Plan ==
> 'GlobalLimit 10
> +- 'LocalLimit 10
>    +- 'Project ['key, 'value]
>       +- 'UnresolvedRelation `default`.`src`
> == Optimized Logical Plan ==
> GlobalLimit 10
> +- LocalLimit 10
>    +- Project [key#36, HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFMaskShowLastN(value#37,4,x,x,x,-1,1) AS value#41]
>       +- SubmarineDataMasking
>          +- HiveTableRelation `default`.`src`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key#36, value#37]
> == Physical Plan ==
> CollectLimit 10
>    +- Project [key#36, HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFMaskShowLastN(value#37,4,x,x,x,-1,1) AS value#41]
>       +- *(1) HiveTableScan [key#36, value#37], HiveTableRelation `default`.`src`, org.apache.hadoop.hive.serde2.OpenCSVSerde, [key#36, value#37]{code}
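The fix requested above can be pictured as a single plan rewrite: a masking Project sitting directly over a Limit is moved underneath it, so the Limit ends up at the root. The Java sketch below uses a hypothetical toy plan model; the node names Relation, DataMasking, Project and Limit and the method pushLimitAboveProject are illustrative only, not Spark's Catalyst API and not the code merged in pull request 490. It collapses GlobalLimit/LocalLimit into one Limit node and assumes the masking expressions are deterministic and produce exactly one output row per input row, which is what makes swapping Project and Limit safe.

```java
import java.util.List;

/**
 * Toy logical-plan model (hypothetical, NOT Spark's Catalyst API) showing the
 * rewrite described in SUBMARINE-638: Project over Limit becomes Limit over Project.
 */
public class LimitAboveMasking {

    // Minimal plan node hierarchy; names are illustrative assumptions.
    interface Plan {}

    record Relation(String table) implements Plan {}

    record DataMasking(Plan child) implements Plan {}

    record Project(List<String> exprs, Plan child) implements Plan {}

    // Stands in for the GlobalLimit/LocalLimit pair in Spark's plans.
    record Limit(int n, Plan child) implements Plan {}

    /**
     * Project(exprs, Limit(n, child)) ==> Limit(n, Project(exprs, child)).
     * Assumes every projection expression is deterministic and row-wise.
     * Only the root is checked, for brevity; a real optimizer rule would
     * traverse the whole tree.
     */
    static Plan pushLimitAboveProject(Plan plan) {
        if (plan instanceof Project p && p.child() instanceof Limit l) {
            return new Limit(l.n(), new Project(p.exprs(), l.child()));
        }
        return plan; // pattern not matched: plan is left unchanged
    }

    /** Renders a plan as an indented tree, loosely mimicking Spark's explain output. */
    static String render(Plan plan) {
        StringBuilder sb = new StringBuilder();
        render(plan, 0, sb);
        return sb.toString();
    }

    private static void render(Plan plan, int depth, StringBuilder sb) {
        String prefix = depth == 0 ? "" : "   ".repeat(depth - 1) + "+- ";
        if (plan instanceof Limit l) {
            sb.append(prefix).append("Limit ").append(l.n()).append('\n');
            render(l.child(), depth + 1, sb);
        } else if (plan instanceof Project p) {
            sb.append(prefix).append("Project ").append(p.exprs()).append('\n');
            render(p.child(), depth + 1, sb);
        } else if (plan instanceof DataMasking m) {
            sb.append(prefix).append("DataMasking\n");
            render(m.child(), depth + 1, sb);
        } else if (plan instanceof Relation r) {
            sb.append(prefix).append("Relation ").append(r.table()).append('\n');
        }
    }

    public static void main(String[] args) {
        // Simplified shape of the "before" optimized plan from the issue.
        Plan before = new Project(List.of("key", "mask_show_last_n(value)"),
                new Limit(10, new DataMasking(new Relation("default.src"))));
        System.out.print(render(before));
        System.out.println("==>");
        System.out.print(render(pushLimitAboveProject(before)));
    }
}
```

After the rewrite the Limit is the root of the plan, which is the shape the issue's second plan shows being planned as a CollectLimit.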

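Why the position of the Limit matters for I/O can be illustrated with a simplified execution model. The Java sketch below does not call Spark; it treats each file as one partition and counts how many files are opened under the two physical plan shapes from the issue: a LocalLimit on every partition followed by a GlobalLimit after an exchange, versus a CollectLimit that pulls partitions one at a time until it has enough rows. The class and method names are hypothetical, and Spark's real incremental take may scan several partitions per round rather than exactly one.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of the two limit plans; it does not use Spark. */
public class LimitExecutionModel {

    record Outcome(List<String> rows, int filesOpened) {}

    /** GlobalLimit over Exchange over LocalLimit: every partition (file) runs its LocalLimit. */
    static Outcome localThenGlobalLimit(List<List<String>> files, int n) {
        List<String> gathered = new ArrayList<>();
        int opened = 0;
        for (List<String> file : files) {
            opened++; // each partition is scanned regardless of how many rows are needed
            gathered.addAll(file.subList(0, Math.min(n, file.size())));
        }
        // GlobalLimit keeps the first n rows of the gathered output.
        return new Outcome(List.copyOf(gathered.subList(0, Math.min(n, gathered.size()))), opened);
    }

    /** CollectLimit: scan partitions incrementally and stop once n rows are collected. */
    static Outcome collectLimit(List<List<String>> files, int n) {
        List<String> rows = new ArrayList<>();
        int opened = 0;
        for (List<String> file : files) {
            if (rows.size() >= n) {
                break; // enough rows: remaining files are never opened
            }
            opened++;
            int need = n - rows.size();
            rows.addAll(file.subList(0, Math.min(need, file.size())));
        }
        return new Outcome(List.copyOf(rows), opened);
    }

    public static void main(String[] args) {
        // Three files of 100 rows each, queried with LIMIT 10 as in the issue.
        List<List<String>> files = new ArrayList<>();
        for (int f = 0; f < 3; f++) {
            List<String> rows = new ArrayList<>();
            for (int r = 0; r < 100; r++) {
                rows.add("file" + f + "-row" + r);
            }
            files.add(rows);
        }
        System.out.println(localThenGlobalLimit(files, 10).filesOpened()); // 3
        System.out.println(collectLimit(files, 10).filesOpened());         // 1
    }
}
```

With three 100-row files and LIMIT 10, the first shape opens all 3 files while the second opens only 1, and both return the same 10 rows.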


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
