Posted to dev@submarine.apache.org by "Kent Yao (Jira)" <ji...@apache.org> on 2021/01/19 07:10:00 UTC
[jira] [Resolved] (SUBMARINE-638) Spark-security ranger plugin - Limit to be applied after masking projection
[ https://issues.apache.org/jira/browse/SUBMARINE-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kent Yao resolved SUBMARINE-638.
--------------------------------
Fix Version/s: 0.6.0
Resolution: Fixed
Issue resolved by pull request 490
[https://github.com/apache/submarine/pull/490]
> Spark-security ranger plugin - Limit to be applied after masking projection
> ---------------------------------------------------------------------------
>
> Key: SUBMARINE-638
> URL: https://issues.apache.org/jira/browse/SUBMARINE-638
> Project: Apache Submarine
> Issue Type: Improvement
> Components: Security
> Reporter: Tenneti Venkata Sri Harsha
> Priority: Major
> Labels: pull-request-available, security
> Fix For: 0.6.0
>
>
> Suppose a query applies a limit, as below, and the value column has to be masked:
> {code:java}
> SELECT key, value from default.src limit 10{code}
> The plans then look like this:
> {code:java}
> == Parsed Logical Plan ==
> 'GlobalLimit 10
> +- 'LocalLimit 10
>    +- 'Project ['key, 'value]
>       +- 'UnresolvedRelation `default`.`src`
> == Optimized Logical Plan ==
> Project [key#36, HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFMaskShowLastN(value#37,4,x,x,x,-1,1) AS value#41]
> +- GlobalLimit 10
>    +- LocalLimit 10
>       +- SubmarineDataMasking
>          +- HiveTableRelation `default`.`src`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key#36, value#37]
> == Physical Plan ==
> Project [key#36, HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFMaskShowLastN(value#37,4,x,x,x,-1,1) AS value#41]
> +- *(2) GlobalLimit 10
>    +- Exchange SinglePartition
>       +- *(1) LocalLimit 10
>          +- *(1) HiveTableScan [key#36, value#37], HiveTableRelation `default`.`src`, org.apache.hadoop.hive.serde2.OpenCSVSerde, [key#36, value#37]
> {code}
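> The plan above reads every file in the table. Because the optimized logical plan puts the masking Project above the limit, the physical plan runs a LocalLimit over every partition of the scan and then shuffles the results into a single partition for the GlobalLimit, so every file is opened. If the optimized logical plan instead applies the limit after the masking projection, as below, the physical plan uses CollectLimit, which fetches rows incrementally and stops once it has enough, so the collect reads only one file.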
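The file-count difference between the two physical plans can be illustrated with a toy model. The Python below is not Spark code and rests on several assumptions: each inner list of the hypothetical `FILES` table stands for one file (one scan partition); `mask_show_last_n` is a simplified stand-in for Hive's GenericUDFMaskShowLastN with the arguments shown in the plan (keep the last 4 characters, replace earlier letters and digits with `x`, leave other characters unchanged); and `run_project_over_limit` / `run_collect_limit` are hypothetical names that mimic the first physical plan and the CollectLimit plan respectively.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str]

# Hypothetical table: 5 files of 100 rows each; each inner list is one file/partition.
FILES: List[List[Row]] = [
    [(i, f"val_{i}") for i in range(f * 100, f * 100 + 100)] for f in range(5)
]


def mask_show_last_n(s: str, n: int = 4) -> str:
    """Simplified mask_show_last_n: keep the last n chars, mask letters/digits before them with 'x'."""
    cut = max(len(s) - n, 0)
    head = "".join("x" if c.isalnum() else c for c in s[:cut])  # other chars (-1) stay as-is
    return head + s[cut:]


def project(rows: Iterable[Row]) -> Iterator[Row]:
    """The masking projection: exactly one output row per input row."""
    return ((key, mask_show_last_n(value)) for key, value in rows)


def run_project_over_limit(files: List[List[Row]], limit: int) -> Tuple[List[Row], List[int]]:
    """Mimics Project <- GlobalLimit <- Exchange <- LocalLimit <- HiveTableScan."""
    opened: List[int] = []
    local: List[Row] = []
    for idx, rows in enumerate(files):  # one LocalLimit task per partition...
        opened.append(idx)               # ...so every file gets opened
        local.extend(islice(rows, limit))
    # Exchange SinglePartition + GlobalLimit, then the masking Project on top.
    return list(project(local[:limit])), opened


def run_collect_limit(files: List[List[Row]], limit: int) -> Tuple[List[Row], List[int]]:
    """Mimics CollectLimit <- Project <- HiveTableScan: pull rows lazily, stop at `limit`."""
    opened: List[int] = []

    def scan() -> Iterator[Row]:
        for idx, rows in enumerate(files):
            opened.append(idx)  # recorded only when the scan actually reaches this file
            yield from rows

    return list(islice(project(scan()), limit)), opened


if __name__ == "__main__":
    for run in (run_project_over_limit, run_collect_limit):
        rows, opened = run(FILES, 10)
        print(run.__name__, "files opened:", opened)
```

Both paths return the same 10 masked rows, but the first opens all five files while the CollectLimit-style path stops inside the first one. Spark's real CollectLimit scans one partition first and scales up the number of partitions it tries only if that is not enough, so on a table whose first file holds fewer rows than the limit it would read more than one file.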
> {code:java}
> == Parsed Logical Plan ==
> 'GlobalLimit 10
> +- 'LocalLimit 10
>    +- 'Project ['key, 'value]
>       +- 'UnresolvedRelation `default`.`src`
> == Optimized Logical Plan ==
> GlobalLimit 10
> +- LocalLimit 10
>    +- Project [key#36, HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFMaskShowLastN(value#37,4,x,x,x,-1,1) AS value#41]
>       +- SubmarineDataMasking
>          +- HiveTableRelation `default`.`src`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key#36, value#37]
> == Physical Plan ==
> CollectLimit 10
> +- Project [key#36, HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFMaskShowLastN(value#37,4,x,x,x,-1,1) AS value#41]
>    +- *(1) HiveTableScan [key#36, value#37], HiveTableRelation `default`.`src`, org.apache.hadoop.hive.serde2.OpenCSVSerde, [key#36, value#37]
> {code}
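The rewrite that turns the first optimized plan into the second can be sketched as a single tree transformation. The Python below is a toy model only: the dataclasses are hypothetical stand-ins for Catalyst's Project, GlobalLimit and LocalLimit nodes and the SubmarineDataMasking marker, and `move_limit_above_project` is a hypothetical name, not the code merged in pull request 490. The rewrite is safe because a projection maps each input row to exactly one output row, so it cannot change which rows the limit keeps.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Relation:
    name: str


@dataclass(frozen=True)
class DataMasking:
    """Hypothetical stand-in for the SubmarineDataMasking marker node."""
    child: "Plan"


@dataclass(frozen=True)
class Project:
    exprs: Tuple[str, ...]
    child: "Plan"


@dataclass(frozen=True)
class LocalLimit:
    n: int
    child: "Plan"


@dataclass(frozen=True)
class GlobalLimit:
    n: int
    child: "Plan"


Plan = Union[Relation, DataMasking, Project, LocalLimit, GlobalLimit]


def move_limit_above_project(plan: Plan) -> Plan:
    """Rewrite Project(GlobalLimit(LocalLimit(c))) into GlobalLimit(LocalLimit(Project(c)))."""
    if (
        isinstance(plan, Project)
        and isinstance(plan.child, GlobalLimit)
        and isinstance(plan.child.child, LocalLimit)
    ):
        global_limit = plan.child
        local_limit = global_limit.child
        # The projection keeps one output row per input row, so limiting after it
        # returns the same rows while leaving a Limit at the root of the plan.
        return GlobalLimit(
            global_limit.n,
            LocalLimit(local_limit.n, Project(plan.exprs, local_limit.child)),
        )
    return plan  # any other shape is left untouched


if __name__ == "__main__":
    masked = ("key#36", "mask_show_last_n(value#37) AS value#41")
    before = Project(masked, GlobalLimit(10, LocalLimit(10, DataMasking(Relation("default.src")))))
    print(move_limit_above_project(before))
```

This sketch only inspects the root node; an optimizer rule would instead apply the same pattern wherever it occurs in the tree, for example through Catalyst's `transform` on the logical plan.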
--
This message was sent by Atlassian Jira
(v8.3.4#803005)