Posted to notifications@kyuubi.apache.org by "Jackhjf (via GitHub)" <gi...@apache.org> on 2023/02/16 08:28:10 UTC

[GitHub] [kyuubi] Jackhjf opened a new issue, #4341: [Bug] When using kyuubi to do data masking, an error is reported

Jackhjf opened a new issue, #4341:
URL: https://github.com/apache/kyuubi/issues/4341

   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   
   
   ### Search before asking
   
   - [X] I have searched in the [issues](https://github.com/apache/kyuubi/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Describe the bug
   
   After configuring a data masking policy in Ranger, executing SQL through Spark reports an error.
   
   
   <img width="1532" alt="image" src="https://user-images.githubusercontent.com/33471639/219309415-498c9451-cde7-4730-b6dd-41ecd97ff0c2.png">
   
   spark-sql> select name from iceberg_spark.test_copy;
   
   
   ### Affects Version(s)
   
   master
   
   ### Kyuubi Server Log Output
   
   ```logtalk
   Error in query: Resolved attribute(s) name#6 missing from id#5,name#7 in operator !Project [name#6]. Attribute(s) with the same name appear in the operation: name. Please check if the right attribute(s) are used.;
   !Project [name#6]
   +- SubqueryAlias spark_catalog.iceberg_spark.test_copy
      +- Project [id#5, md5(cast(name#6 as binary)) AS name#7]
         +- RowFilterAndDataMaskingMarker
            +- RelationV2[id#5, name#6] spark_catalog.iceberg_spark.test_copy
   ```
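The analyzer error above can be reproduced in miniature. The following is a toy Python model, not Spark or Kyuubi code: `Attr` and `missing_inputs` are hypothetical names, and the check only imitates the idea that an operator may reference just the attributes its child outputs, matched by expression ID rather than by column name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attr:
    """A column reference identified by name and expression ID (e.g. name#6)."""
    name: str
    expr_id: int


def missing_inputs(references, child_output):
    """Return the referenced attributes that the child does not output.

    Matching is done on expr_id only, so two attributes that share a
    name but carry different IDs are treated as different columns.
    """
    available = {attr.expr_id for attr in child_output}
    return [attr for attr in references if attr.expr_id not in available]


# Output of RelationV2[id#5, name#6]
relation_output = [Attr("id", 5), Attr("name", 6)]

# Output of Project [id#5, md5(cast(name#6 as binary)) AS name#7]
masked_output = [Attr("id", 5), Attr("name", 7)]

# The outer operator Project [name#6]
outer_refs = [Attr("name", 6)]

missing_before = missing_inputs(outer_refs, relation_output)
missing_after = missing_inputs(outer_refs, masked_output)
```

In this model the outer `Project [name#6]` resolves against the bare relation but not against the masking projection, which re-exposes the column as `name#7`. That matches the "Resolved attribute(s) name#6 missing from id#5,name#7" message in the log.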
   
   
   ### Kyuubi Engine Log Output
   
   _No response_
   
   ### Kyuubi Server Configurations
   
   _No response_
   
   ### Kyuubi Engine Configurations
   
   ```yaml
   spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension
   ```
   
   
   ### Additional context
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
   - [ ] No. I cannot submit a PR at this time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org


[GitHub] [kyuubi] bowenliang123 commented on issue #4341: [Bug] When using kyuubi to do data masking, an error is reported

Posted by "bowenliang123 (via GitHub)" <gi...@apache.org>.
bowenliang123 commented on issue #4341:
URL: https://github.com/apache/kyuubi/issues/4341#issuecomment-1445798280

   Fixed in https://github.com/apache/kyuubi/pull/4358. Please take a look.




[GitHub] [kyuubi] bowenliang123 commented on issue #4341: [Bug] When using kyuubi to do data masking, an error is reported

Posted by "bowenliang123 (via GitHub)" <gi...@apache.org>.
bowenliang123 commented on issue #4341:
URL: https://github.com/apache/kyuubi/issues/4341#issuecomment-1460338542

   > > thanks @Jackhjf for the validation
   > > Now I found a problem. If I execute this SQL:
   > > select id ,name from iceberg_spark.test_copy  a where a.name like "中%";
   > > I get an empty result.
   > 
   > <img alt="image" width="1172" src="https://user-images.githubusercontent.com/33471639/223753849-a01805f9-c1c9-4a51-9046-d2a39258a183.png">
   > 
   > <img alt="image" width="1039" src="https://user-images.githubusercontent.com/33471639/223753943-408a4612-6bb8-4f3c-8a50-c26f3bf87ac2.png">
   > 
   > @yaooqinn
   
   I think this is exactly the result you should expect with column masking. The masking is correctly pushed down to the leaf node that scans the source, so the `like` filter is evaluated against the masked values.




[GitHub] [kyuubi] Jackhjf commented on issue #4341: [Bug] When using kyuubi to do data masking, an error is reported

Posted by "Jackhjf (via GitHub)" <gi...@apache.org>.
Jackhjf commented on issue #4341:
URL: https://github.com/apache/kyuubi/issues/4341#issuecomment-1455697791

   > Fixed in #4358. Please take a look. @Jackhjf
   
   Thanks for your work. I get the correct results with the following test cases. @bowenliang123 @yaooqinn
   
   
   create table yhbi.test_copy(id string,name string) using iceberg;
   create table yhbi.test_copy1(id1 string,name1 string) using iceberg;
   create table yhbi.test1(id string,name string) using iceberg; (**name** is the data masking field)
   create table iceberg_spark.test_copy(id string,name string) using iceberg; (**name** is the data masking field)
   
   001
   
   insert into yhbi.test_copy select * from iceberg_spark.test_copy;
   
   002
   
   insert into yhbi.test_copy1 select * from iceberg_spark.test_copy;
   
   003
   
   insert overwrite yhbi.test_copy1 select id, name as name1 from iceberg_spark.test_copy;
   
   004
   
   insert overwrite yhbi.test_copy1 select id, name as name1 from iceberg_spark.test_copy a where a.id in ("111111","11111");
   
   005
   
   insert overwrite yhbi.test_copy1 select id, name as name1 from iceberg_spark.test_copy order by id, name;
   
   006
   
   select max(name) from iceberg_spark.test_copy;
   select min(name) from iceberg_spark.test_copy;
   
   007
   
   insert overwrite yhbi.test_copy1 select id, count(name) from iceberg_spark.test_copy group by id;
   insert overwrite yhbi.test_copy1 select count(id), name from iceberg_spark.test_copy group by name;
   
   008
   
   insert overwrite yhbi.test_copy1 select b.id, b.name from iceberg_spark.test_copy a join iceberg_spark.test1 b on a.id=b.id;
   
   009
   
   insert overwrite yhbi.test_copy1 select id, name as name1 from iceberg_spark.test_copy union select * from (select a.id, a.name from iceberg_spark.test1 a join iceberg_spark.test_copy b on a.id=b.id) c;
   




[GitHub] [kyuubi] Jackhjf commented on issue #4341: [Bug] When using kyuubi to do data masking, an error is reported

Posted by "Jackhjf (via GitHub)" <gi...@apache.org>.
Jackhjf commented on issue #4341:
URL: https://github.com/apache/kyuubi/issues/4341#issuecomment-1461174487

   > You have a mask rule on the column `name`, so it is masked during the scan operation, and the operations that follow only see the masked data.
   > 
   > I know it's a bit counterintuitive, but it is correct. Suppose we did not mask ahead of the filter and you changed your test case to `select id ,name from iceberg_spark.test_copy a where a.name ='Jackhjf'`. Users would get the result `md5('Jackhjf')`, but anyone could then tell that the data source contains records for `Jackhjf`. In that case, the data is not secured, right?
   
   From this point of view, the data would indeed not be safe. Thanks.




[GitHub] [kyuubi] bowenliang123 closed issue #4341: [Bug] When using kyuubi to do data masking, an error is reported

Posted by "bowenliang123 (via GitHub)" <gi...@apache.org>.
bowenliang123 closed issue #4341: [Bug] When using kyuubi to do data masking, an error is reported
URL: https://github.com/apache/kyuubi/issues/4341




[GitHub] [kyuubi] yaooqinn commented on issue #4341: [Bug] When using kyuubi to do data masking, an error is reported

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on issue #4341:
URL: https://github.com/apache/kyuubi/issues/4341#issuecomment-1461166559

   You have a mask rule on the column `name`, so it is masked during the scan operation, and the operations that follow only see the masked data.
   
   I know it's a bit counterintuitive, but it is correct. Suppose we did not mask ahead of the filter and you changed your test case to `select id ,name from iceberg_spark.test_copy a where a.name ='Jackhjf'`. Users would get the result `md5('Jackhjf')`, but anyone could then tell that the data source contains records for `Jackhjf`. In that case, the data is not secured, right?
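A minimal Python sketch of this argument, assuming an MD5 mask like the `md5(cast(name as binary))` in the reported plan. The rows and helper names are made up for illustration and none of this is Kyuubi code. It only contrasts masking before the filter with masking after it.

```python
import hashlib


def md5_mask(value: str) -> str:
    """Mask a value as the hex MD5 digest of its UTF-8 bytes."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


# Hypothetical table contents, invented for this example.
ROWS = [
    {"id": "1", "name": "Jackhjf"},
    {"id": "2", "name": "中文"},
]


def scan_with_mask(rows):
    """Apply the mask to the name column, as if at the scan."""
    return [{**row, "name": md5_mask(row["name"])} for row in rows]


def mask_then_filter(rows, predicate):
    """Mask first, so the predicate only ever sees masked values."""
    return [row for row in scan_with_mask(rows) if predicate(row["name"])]


def filter_then_mask(rows, predicate):
    """Filter on raw values, then mask the output (the leaky ordering)."""
    return scan_with_mask([row for row in rows if predicate(row["name"])])


def starts_with_zhong(name: str) -> bool:
    """Rough equivalent of: name like '中%'."""
    return name.startswith("中")


masked_first = mask_then_filter(ROWS, starts_with_zhong)
filtered_first = filter_then_mask(ROWS, starts_with_zhong)
```

With masking applied first, the predicate is tested against hexadecimal digests, so `masked_first` is empty. With the filter applied first, `filtered_first` still returns a row, which tells the user that a raw name starting with "中" exists even though the returned value is masked.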
   
   




[GitHub] [kyuubi] bowenliang123 commented on issue #4341: [Bug] When using kyuubi to do data masking, an error is reported

Posted by "bowenliang123 (via GitHub)" <gi...@apache.org>.
bowenliang123 commented on issue #4341:
URL: https://github.com/apache/kyuubi/issues/4341#issuecomment-1432703349

   This is a known issue with applying column masking on DataSourceV2Relation; see https://github.com/apache/kyuubi/issues/4202.
   A unit test and fix are proposed in https://github.com/apache/kyuubi/pull/4304, but it is still pending and needs more work.




[GitHub] [kyuubi] Jackhjf commented on issue #4341: [Bug] When using kyuubi to do data masking, an error is reported

Posted by "Jackhjf (via GitHub)" <gi...@apache.org>.
Jackhjf commented on issue #4341:
URL: https://github.com/apache/kyuubi/issues/4341#issuecomment-1460325848

   > thanks @Jackhjf for the validation
   Now I found a problem. If I execute this SQL:
   select id ,name as name1 from iceberg_spark.test_copy  a where a.name like "中%";
   I get an empty result.
   
   <img width="1172" alt="image" src="https://user-images.githubusercontent.com/33471639/223753849-a01805f9-c1c9-4a51-9046-d2a39258a183.png">
   
   <img width="1039" alt="image" src="https://user-images.githubusercontent.com/33471639/223753943-408a4612-6bb8-4f3c-8a50-c26f3bf87ac2.png">
   
   
   
   @yaooqinn 




[GitHub] [kyuubi] github-actions[bot] commented on issue #4341: [Bug] When using kyuubi to do data masking, an error is reported

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #4341:
URL: https://github.com/apache/kyuubi/issues/4341#issuecomment-1432700721

   Hello @Jackhjf,
   Thanks for finding the time to report the issue!
   We really appreciate the community's efforts to improve Apache Kyuubi.




[GitHub] [kyuubi] yaooqinn commented on issue #4341: [Bug] When using kyuubi to do data masking, an error is reported

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on issue #4341:
URL: https://github.com/apache/kyuubi/issues/4341#issuecomment-1455782490

   thanks @Jackhjf for the validation

