Posted to notifications@kyuubi.apache.org by GitBox <gi...@apache.org> on 2022/04/16 04:44:19 UTC

[GitHub] [incubator-kyuubi] packyan opened a new issue, #2390: [Bug] Kyuubi Spark AuthZ Module DataMasking Feature Does Not Support Spark 2.4

packyan opened a new issue, #2390:
URL: https://github.com/apache/incubator-kyuubi/issues/2390

   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   
   
   ### Search before asking
   
   - [X] I have searched in the [issues](https://github.com/apache/incubator-kyuubi/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Describe the bug
   
   First of all, thanks to Apache Kyuubi for providing the Spark SQL AuthZ module.
   
   I am using Spark 2.4.x. I compiled the Kyuubi Spark SQL AuthZ module separately and ran it in the spark-sql CLI as a Spark SQL extension, but found that the logical plan generated by the data masking rules cannot be executed on Spark 2.4.
   
   For example, I have a Hive table created with `create table tab1 (id int, name string)`, and in Ranger a hash masking rule is applied to the column `name`. When I execute `select name from tab1`, it throws an error:
   
   ```shell
   Error in query: Resolved attribute(s) name#1 missing from id#0,name#8 in operator !Project [id#0, md5(cast(cast(name#1 as string) as binary)) AS name#2]. Attribute(s) with the same name appear in the operation: name. Please check if the right attribute(s) are used.;;
   Project [name#2]
   +- SubqueryAlias `default`.`tab1`
      +- !Project [id#0, md5(cast(cast(name#1 as string) as binary)) AS name#2]
         +- Project [id#0, md5(cast(cast(name#1 as string) as binary)) AS name#8]
            +- HiveTableRelation `default`.`tab1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#0, name#1]
   ```
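   It looks like the logical plan generated by the data masking rule cannot pass the `checkAnalysis(analyzed)` check in Spark 2.4. In the plan above, the upper `Project` still references `name#1`, but its child `Project` outputs only `id#0` and `name#8`.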
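   
   As a rough illustration of this failure mode, the following is a toy model in plain Scala, not Spark's Catalyst classes or the AuthZ module's code. All names in it (`Attr`, `Column`, `Plan`, `mask`, `missingInputs`) are hypothetical. It assumes, based on the two stacked `Project` nodes in the plan above, that the masking projection gets wrapped around the relation a second time with a fresh alias ID, so the outer `Project` still reads `name#1` while its child no longer outputs it.
   
   ```scala
   // Toy model only: these types are hypothetical stand-ins, not Spark's Catalyst classes.
   import java.util.concurrent.atomic.AtomicInteger

   object MaskingTwiceSketch {
     // An attribute is identified by its expression ID, not only by its name.
     final case class Attr(name: String, id: Int) {
       override def toString: String = s"$name#$id"
     }

     // A projected column: it reads `refs` from the child and outputs `out`.
     final case class Column(refs: Set[Attr], out: Attr)

     sealed trait Plan { def output: Seq[Attr] }
     final case class Relation(output: Seq[Attr]) extends Plan
     final case class Project(columns: Seq[Column], child: Plan) extends Plan {
       def output: Seq[Attr] = columns.map(_.out)
     }

     // Source of fresh expression IDs for masked aliases (starting value is arbitrary).
     private val nextId = new AtomicInteger(100)

     // Hypothetical masking rule: wrap every relation in a Project that passes other
     // columns through and replaces `name` with an alias carrying a fresh ID.
     def mask(plan: Plan): Plan = plan match {
       case r: Relation =>
         val cols = r.output.map { a =>
           if (a.name == "name") Column(Set(a), Attr(a.name, nextId.getAndIncrement()))
           else Column(Set(a), a)
         }
         Project(cols, r)
       case Project(cols, child) =>
         // Existing Projects are kept as-is; only the relation below is rewritten.
         Project(cols, mask(child))
     }

     // Simplified form of the failing check: every attribute a Project reads
     // must be produced by its child.
     def missingInputs(plan: Plan): Set[Attr] = plan match {
       case _: Relation => Set.empty[Attr]
       case Project(cols, child) =>
         val read: Set[Attr] = cols.flatMap(_.refs).toSet
         val produced: Set[Attr] = child.output.toSet
         read.diff(produced) ++ missingInputs(child)
     }

     def main(args: Array[String]): Unit = {
       val table = Relation(Seq(Attr("id", 0), Attr("name", 1)))
       val once = mask(table)  // one rewrite: Project over Relation
       val twice = mask(once)  // second rewrite: Project over Project over Relation
       println(s"missing after one pass:   ${missingInputs(once)}")
       println(s"missing after two passes: ${missingInputs(twice)}")
     }
   }
   ```
   
   In this sketch the first line prints an empty set and the second reports `name#1` as missing, which mirrors the shape of the error above. Whether the real rule is re-applied in exactly this way is an assumption.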
   
   
   
   ### Affects Version(s)
   
   master
   
   ### Kyuubi Server Log Output
   
   _No response_
   
   ### Kyuubi Engine Log Output
   
   _No response_
   
   ### Kyuubi Server Configurations
   
   _No response_
   
   ### Kyuubi Engine Configurations
   
   _No response_
   
   ### Additional context
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org


[GitHub] [incubator-kyuubi] yaooqinn commented on issue #2390: [Bug] Kyuubi Spark AuthZ Module DataMasking Feature Does Not Support Spark 2.4

Posted by GitBox <gi...@apache.org>.
yaooqinn commented on issue #2390:
URL: https://github.com/apache/incubator-kyuubi/issues/2390#issuecomment-1100574310

   Thanks. Could you also help test with higher Spark versions, to see whether we can narrow down the scenario where the bug lurks?




[GitHub] [incubator-kyuubi] github-actions[bot] commented on issue #2390: [Bug] Kyuubi Spark AuthZ Module DataMasking Feature Does Not Support Spark 2.4

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #2390:
URL: https://github.com/apache/incubator-kyuubi/issues/2390#issuecomment-1100568191

   Hello @packyan,
   Thanks for finding the time to report the issue!
   We really appreciate the community's efforts to improve Apache Kyuubi (Incubating).




[GitHub] [incubator-kyuubi] yaooqinn commented on issue #2390: [Bug] Kyuubi Spark AuthZ Module DataMasking Feature Does Not Support Spark SQL CLI

Posted by GitBox <gi...@apache.org>.
yaooqinn commented on issue #2390:
URL: https://github.com/apache/incubator-kyuubi/issues/2390#issuecomment-1100634979

   Thanks for the verification and the use case.




[GitHub] [incubator-kyuubi] ulysses-you closed issue #2390: [Bug] Kyuubi Spark AuthZ Module DataMasking Feature Does Not Support Spark SQL CLI

Posted by GitBox <gi...@apache.org>.
ulysses-you closed issue #2390: [Bug] Kyuubi Spark AuthZ Module DataMasking Feature Does Not Support Spark SQL CLI
URL: https://github.com/apache/incubator-kyuubi/issues/2390




[GitHub] [incubator-kyuubi] packyan commented on issue #2390: [Bug] Kyuubi Spark AuthZ Module DataMasking Feature Does Not Support Spark SQL CLI

Posted by GitBox <gi...@apache.org>.
packyan commented on issue #2390:
URL: https://github.com/apache/incubator-kyuubi/issues/2390#issuecomment-1100618321

   > Thanks. Could you also help test with higher Spark versions, to see whether we can narrow down the scenario where the bug lurks?
   
   Hi there, I edited the unit test `test("data masking")` in `RangerSparkExtensionSuite`. After the insert commands, I added some test code:
   ```scala
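   doAs(
     "bob", {
       // First command: collect the rows of the masked table and print them.
       sql(s"select * from $db.$table").collect().foreach(println)
       // Second command: the same query, displayed with show().
       sql(s"select * from $db.$table").show()
     })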
   ```
   The first command runs without any problem, but the second one always throws an exception in every version of Spark (2.4 - 3.2).
   
   ```shell
   Caused by: org.apache.spark.sql.AnalysisException: Resolved attribute(s) value5#11,value4#10,value2#8,value3#9,value1#7 missing from value2#109,key#6,value3#110,value1#108,value5#112,value4#111 in operator !Project [key#6, md5(cast(cast(value1#7 as string) as binary)) AS value1#86, regexp_replace(regexp_replace(regexp_replace(value2#8, [A-Z], X), [a-z], x), [0-9], n) AS value2#87, concat(substring(value3#9, 0, 4), regexp_replace(regexp_replace(regexp_replace(substring(value3#9, 5, 2147483647), [A-Z], X), [a-z], x), [0-9], n)) AS value3#88, date_trunc(YEAR, value4#10, Some(Asia/Shanghai)) AS value4#89, concat(regexp_replace(regexp_replace(regexp_replace(left(value5#11, (length(value5#11) - 4)), [A-Z], X), [a-z], x), [0-9], n), right(value5#11, 4)) AS value5#90]. Attribute(s) with the same name appear in the operation: value5,value4,value2,value3,value1. Please check if the right attribute(s) are used.;;
   Project [cast(key#6 as string) AS key#103, cast(value1#86 as string) AS value1#104, cast(value2#87 as string) AS value2#105, cast(value3#88 as string) AS value3#106, cast(value4#89 as string) AS value4#113, cast(value5#90 as string) AS value5#107]
   +- Project [key#6, value1#86, value2#87, value3#88, value4#89, value5#90]
      +- SubqueryAlias `default`.`src`
         +- !Project [key#6, md5(cast(cast(value1#7 as string) as binary)) AS value1#86, regexp_replace(regexp_replace(regexp_replace(value2#8, [A-Z], X), [a-z], x), [0-9], n) AS value2#87, concat(substring(value3#9, 0, 4), regexp_replace(regexp_replace(regexp_replace(substring(value3#9, 5, 2147483647), [A-Z], X), [a-z], x), [0-9], n)) AS value3#88, date_trunc(YEAR, value4#10, Some(Asia/Shanghai)) AS value4#89, concat(regexp_replace(regexp_replace(regexp_replace(left(value5#11, (length(value5#11) - 4)), [A-Z], X), [a-z], x), [0-9], n), right(value5#11, 4)) AS value5#90]
            +- Filter (key#6 < 20)
               +- Project [key#6, md5(cast(cast(value1#7 as string) as binary)) AS value1#108, regexp_replace(regexp_replace(regexp_replace(value2#8, [A-Z], X), [a-z], x), [0-9], n) AS value2#109, concat(substring(value3#9, 0, 4), regexp_replace(regexp_replace(regexp_replace(substring(value3#9, 5, 2147483647), [A-Z], X), [a-z], x), [0-9], n)) AS value3#110, date_trunc(YEAR, value4#10, Some(Asia/Shanghai)) AS value4#111, concat(regexp_replace(regexp_replace(regexp_replace(left(value5#11, (length(value5#11) - 4)), [A-Z], X), [a-z], x), [0-9], n), right(value5#11, 4)) AS value5#112]
                  +- Filter (key#6 < 20)
                     +- Relation[key#6,value1#7,value2#8,value3#9,value4#10,value5#11] parquet
   
   	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:43)
   	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:42)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:95)
   ```

