Posted to dev@submarine.apache.org by "Tenneti Venkata Sri Harsha (Jira)" <ji...@apache.org> on 2020/09/27 08:44:00 UTC

[jira] [Created] (SUBMARINE-638) Spark-security ranger plugin - Limit to be applied after masking projection

Tenneti Venkata Sri Harsha created SUBMARINE-638:
----------------------------------------------------

             Summary: Spark-security ranger plugin - Limit to be applied after masking projection
                 Key: SUBMARINE-638
                 URL: https://issues.apache.org/jira/browse/SUBMARINE-638
             Project: Apache Submarine
          Issue Type: Improvement
          Components: Security
            Reporter: Tenneti Venkata Sri Harsha


Consider a query with a limit, such as the one below, where the value column has to be masked:
{code:java}
SELECT key, value from default.src limit 10{code}
The resulting plan currently looks like this:
{code:java}
== Parsed Logical Plan ==
'GlobalLimit 10
+- 'LocalLimit 10
   +- 'Project ['key, 'value]
      +- 'UnresolvedRelation `default`.`src`

== Optimized Logical Plan ==
Project [key#36, HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFMaskShowLastN(value#37,4,x,x,x,-1,1) AS value#41]
+- GlobalLimit 10
   +- LocalLimit 10
      +- SubmarineDataMasking
         +- HiveTableRelation `default`.`src`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key#36, value#37]

== Physical Plan ==
Project [key#36, HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFMaskShowLastN(value#37,4,x,x,x,-1,1) AS value#41]
+- *(2) GlobalLimit 10
   +- Exchange SinglePartition
      +- *(1) LocalLimit 10
         +- *(1) HiveTableScan [key#36, value#37], HiveTableRelation `default`.`src`, org.apache.hadoop.hive.serde2.OpenCSVSerde, [key#36, value#37]
{code}
The plan above reads all the files in the table, because the optimized logical plan places the masking Project above the limit. If the optimized logical plan instead applies the limit after the masking projection, the physical plan is converted to use CollectLimit, and the collect then reads only one file:
{code:java}
== Parsed Logical Plan ==
'GlobalLimit 10
+- 'LocalLimit 10
   +- 'Project ['key, 'value]
      +- 'UnresolvedRelation `default`.`src`

== Optimized Logical Plan ==
GlobalLimit 10
+- LocalLimit 10
   +- Project [key#36, HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFMaskShowLastN(value#37,4,x,x,x,-1,1) AS value#41]
      +- SubmarineDataMasking
         +- HiveTableRelation `default`.`src`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key#36, value#37]

== Physical Plan ==
CollectLimit 10
+- Project [key#36, HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFMaskShowLastN(value#37,4,x,x,x,-1,1) AS value#41]
   +- *(1) HiveTableScan [key#36, value#37], HiveTableRelation `default`.`src`, org.apache.hadoop.hive.serde2.OpenCSVSerde, [key#36, value#37]{code}
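
The reordering requested here can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark Catalyst code and not the plugin's implementation: the classes `Scan`, `Project` and `Limit`, the function `limit_after_projection` and the expression name `mask(value)` are all hypothetical and exist only for this example. It assumes the projection is row-wise (one output row per input row), which is why swapping it with the limit does not change the result.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Scan:
    """Leaf node: stands in for the table relation."""
    table: str


@dataclass(frozen=True)
class Project:
    """Row-wise projection, e.g. the masking expressions."""
    exprs: Tuple[str, ...]
    child: "Plan"


@dataclass(frozen=True)
class Limit:
    """Stands in for the GlobalLimit/LocalLimit pair."""
    n: int
    child: "Plan"


Plan = Union[Scan, Project, Limit]


def limit_after_projection(plan: Plan) -> Plan:
    """Rewrite Project(Limit(x)) into Limit(Project(x)), bottom-up.

    Valid in this toy model because a projection emits exactly one
    row per input row, so it commutes with a limit.
    """
    if isinstance(plan, Scan):
        return plan
    child = limit_after_projection(plan.child)
    if isinstance(plan, Project):
        if isinstance(child, Limit):
            # Hoist the limit so it becomes the root of this subtree.
            return Limit(child.n, Project(plan.exprs, child.child))
        return Project(plan.exprs, child)
    return Limit(plan.n, child)


def render(plan: Plan, depth: int = 0) -> str:
    """Print the tree in an explain-like layout."""
    prefix = "" if depth == 0 else "   " * (depth - 1) + "+- "
    if isinstance(plan, Scan):
        return prefix + "Scan " + plan.table
    if isinstance(plan, Project):
        head = prefix + "Project [" + ", ".join(plan.exprs) + "]"
    else:
        head = prefix + "Limit " + str(plan.n)
    return head + "\n" + render(plan.child, depth + 1)


# Shape of the current optimized plan: masking Project over the limit.
before = Project(("key", "mask(value)"), Limit(10, Scan("default.src")))
after = limit_after_projection(before)
print(render(after))
```

In this sketch the limit ends up as the root of the tree, which corresponds to the shape of the second optimized logical plan above. A real change would presumably be expressed as a rule over Spark's logical plan nodes in the plugin, which this example does not attempt to show.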


