Posted to notifications@kyuubi.apache.org by "Jackhjf (via GitHub)" <gi...@apache.org> on 2023/02/16 08:28:10 UTC
[GitHub] [kyuubi] Jackhjf opened a new issue, #4341: [Bug] When using kyuubi to do data masking, an error is reported
Jackhjf opened a new issue, #4341:
URL: https://github.com/apache/kyuubi/issues/4341
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
### Search before asking
- [X] I have searched in the [issues](https://github.com/apache/kyuubi/issues?q=is%3Aissue) and found no similar issues.
### Describe the bug
After configuring a masking policy in Ranger, executing SQL through Spark reports an error:
<img width="1532" alt="image" src="https://user-images.githubusercontent.com/33471639/219309415-498c9451-cde7-4730-b6dd-41ecd97ff0c2.png">
spark-sql> select name from iceberg_spark.test_copy;
### Affects Version(s)
master
### Kyuubi Server Log Output
```logtalk
Error in query: Resolved attribute(s) name#6 missing from id#5,name#7 in operator !Project [name#6]. Attribute(s) with the same name appear in the operation: name. Please check if the right attribute(s) are used.;
!Project [name#6]
+- SubqueryAlias spark_catalog.iceberg_spark.test_copy
   +- Project [id#5, md5(cast(name#6 as binary)) AS name#7]
      +- RowFilterAndDataMaskingMarker
         +- RelationV2[id#5, name#6] spark_catalog.iceberg_spark.test_copy
```
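A minimal sketch of why the analyzer rejects the plan above, assuming a much simplified model of attribute resolution (this is not Spark's or Kyuubi's actual code; `Attr` and `missing_inputs` are hypothetical names used only for illustration). The masking projection aliases the masked value to a new attribute, `name#7`, while the parent `Project` still refers to `name#6`, which its child no longer outputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attr:
    """Toy stand-in for a resolved attribute: a column name plus an expression id."""
    name: str
    expr_id: int

    def __str__(self) -> str:
        return f"{self.name}#{self.expr_id}"


def missing_inputs(references, child_output):
    """Return the referenced attributes that the child operator does not output."""
    available = set(child_output)
    return [attr for attr in references if attr not in available]


# Output of the scan: RelationV2[id#5, name#6]
relation_output = [Attr("id", 5), Attr("name", 6)]

# Output of the masking projection:
#   Project [id#5, md5(cast(name#6 as binary)) AS name#7]
# The alias introduces a new attribute name#7, so name#6 is gone from the output.
masked_output = [Attr("id", 5), Attr("name", 7)]

# The parent operator still references the pre-masking attribute: !Project [name#6]
parent_refs = [Attr("name", 6)]

missing = missing_inputs(parent_refs, masked_output)
print("missing:", ", ".join(str(attr) for attr in missing))
```

In this toy model the same reference resolves without problems against the unmasked relation output, which matches the error text: an attribute with the same name exists, but under a different id.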
### Kyuubi Engine Log Output
_No response_
### Kyuubi Server Configurations
_No response_
### Kyuubi Engine Configurations
```yaml
spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension
```
### Additional context
_No response_
### Are you willing to submit PR?
- [X] Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
- [ ] No. I cannot submit a PR at this time.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org
[GitHub] [kyuubi] bowenliang123 commented on issue #4341: [Bug] When using kyuubi to do data masking, an error is reported
Posted by "bowenliang123 (via GitHub)" <gi...@apache.org>.
bowenliang123 commented on issue #4341:
URL: https://github.com/apache/kyuubi/issues/4341#issuecomment-1445798280
Fixed in https://github.com/apache/kyuubi/pull/4358. Please verify.
[GitHub] [kyuubi] bowenliang123 commented on issue #4341: [Bug] When using kyuubi to do data masking, an error is reported
Posted by "bowenliang123 (via GitHub)" <gi...@apache.org>.
bowenliang123 commented on issue #4341:
URL: https://github.com/apache/kyuubi/issues/4341#issuecomment-1460338542
> > thanks @Jackhjf for the validation
> > Now I found a problem. If I execute this SQL:
> > select id ,name from iceberg_spark.test_copy a where a.name like "中%";
> > I get an empty result.
>
> <img alt="image" width="1172" src="https://user-images.githubusercontent.com/33471639/223753849-a01805f9-c1c9-4a51-9046-d2a39258a183.png">
>
> <img alt="image" width="1039" src="https://user-images.githubusercontent.com/33471639/223753943-408a4612-6bb8-4f3c-8a50-c26f3bf87ac2.png">
>
> @yaooqinn
I think this is exactly the result you would expect with column masking. The masking is properly pushed down to the leaf node that scans the source.
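A small sketch of why the `like "中%"` filter returns nothing once masking happens at the scan, assuming the `md5` mask shown in the plan of the original report. The rows and the helper name `mask_md5` are made up for illustration. An MD5 hex digest contains only the characters `0-9` and `a-f`, so a prefix match on `中` cannot succeed against masked values.

```python
import hashlib


def mask_md5(value: str) -> str:
    """Mask a value as a lowercase hex MD5 digest of its UTF-8 bytes."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


# Hypothetical source rows (id, name); not taken from the reporter's table.
rows = [("1", "中文名"), ("2", "Jackhjf")]

# Masking is applied at the scan, before any filter sees the data.
masked_rows = [(row_id, mask_md5(name)) for row_id, name in rows]

# Equivalent of: where a.name like "中%"
result = [row for row in masked_rows if row[1].startswith("中")]

print(result)
```

Every masked name here is a 32-character string over `0123456789abcdef`, so the filter matches no rows regardless of what the unmasked names were.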
[GitHub] [kyuubi] Jackhjf commented on issue #4341: [Bug] When using kyuubi to do data masking, an error is reported
Posted by "Jackhjf (via GitHub)" <gi...@apache.org>.
Jackhjf commented on issue #4341:
URL: https://github.com/apache/kyuubi/issues/4341#issuecomment-1455697791
> Fixed in #4358. Please have a check. @Jackhjf
Thanks for your work. I get the correct results with the following use cases. @bowenliang123 @yaooqinn
create table yhbi.test_copy(id string,name string) using iceberg ;
create table yhbi.test_copy1(id1 string,name1 string) using iceberg ;
create table yhbi.test1(id string,name string) using iceberg; (**name** is the data masking field)
create table iceberg_spark.test_copy(id string,name string) using iceberg; (**name** is the data masking field)
001
insert into yhbi.test_copy select * from iceberg_spark.test_copy
002
insert into yhbi.test_copy1 select * from iceberg_spark.test_copy
003
insert overwrite yhbi.test_copy1 select id,name as name1 from iceberg_spark.test_copy
004
insert overwrite yhbi.test_copy1 select id ,name as name1 from iceberg_spark.test_copy a where a.id in ("111111","11111");
005
insert overwrite yhbi.test_copy1 select id ,name as name1 from iceberg_spark.test_copy order by id, name;
006
select max(name) from iceberg_spark.test_copy;
select min(name) from iceberg_spark.test_copy;
007
insert overwrite yhbi.test_copy1 select id ,count(name) from iceberg_spark.test_copy group by id;
insert overwrite yhbi.test_copy1 select count(id) ,name from iceberg_spark.test_copy group by name;
008
insert overwrite yhbi.test_copy1 select b.id,b.name from iceberg_spark.test_copy a join iceberg_spark.test1 b on a.id=b.id;
009
insert overwrite yhbi.test_copy1 select id,name as name1 from iceberg_spark.test_copy union select * from (select a.id ,a.name from iceberg_spark.test1 a join iceberg_spark.test_copy b on a.id=b.id) c;
[GitHub] [kyuubi] Jackhjf commented on issue #4341: [Bug] When using kyuubi to do data masking, an error is reported
Posted by "Jackhjf (via GitHub)" <gi...@apache.org>.
Jackhjf commented on issue #4341:
URL: https://github.com/apache/kyuubi/issues/4341#issuecomment-1461174487
> You have a masking rule on the column `name`, so the column is masked during the scan operation, and the operations that follow only see the masked data.
>
> I know it's a bit counterintuitive, but it is correct. Suppose we did not mask first and you changed your test case to `select id ,name from iceberg_spark.test_copy a where a.name ='Jackhjf'`. Users would get the result `md5('Jackhjf')`, but everyone would then know that the datasource contains records for `Jackhjf`. In that case the data is not secure, right?
From this point of view, the data would indeed not be safe. Thanks.
[GitHub] [kyuubi] bowenliang123 closed issue #4341: [Bug] When using kyuubi to do data masking, an error is reported
Posted by "bowenliang123 (via GitHub)" <gi...@apache.org>.
bowenliang123 closed issue #4341: [Bug] When using kyuubi to do data masking, an error is reported
URL: https://github.com/apache/kyuubi/issues/4341
[GitHub] [kyuubi] yaooqinn commented on issue #4341: [Bug] When using kyuubi to do data masking, an error is reported
Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on issue #4341:
URL: https://github.com/apache/kyuubi/issues/4341#issuecomment-1461166559
You have a masking rule on the column `name`, so the column is masked during the scan operation, and the operations that follow only see the masked data.
I know it's a bit counterintuitive, but it is correct. Suppose we did not mask first and you changed your test case to `select id ,name from iceberg_spark.test_copy a where a.name ='Jackhjf'`. Users would get the result `md5('Jackhjf')`, but everyone would then know that the datasource contains records for `Jackhjf`. In that case the data is not secure, right?
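The argument above can be sketched as a comparison of the two possible orderings, assuming an `md5` mask and made-up rows. The function names `mask_then_filter` and `filter_then_mask` are hypothetical and only label the two orderings; this is not how the plugin is implemented.

```python
import hashlib


def mask(value: str) -> str:
    """Mask a value as a hex MD5 digest."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


# Hypothetical source rows (id, name).
rows = [("1", "Jackhjf"), ("2", "someone")]


def mask_then_filter(source, guess):
    """Masking at the scan: the filter only ever compares against masked values."""
    masked = [(row_id, mask(name)) for row_id, name in source]
    return [row for row in masked if row[1] == guess]


def filter_then_mask(source, guess):
    """Masking after the filter: the filter compares against the raw values."""
    matched = [row for row in source if row[1] == guess]
    return [(row_id, mask(name)) for row_id, name in matched]


safe = mask_then_filter(rows, "Jackhjf")
leaky = filter_then_mask(rows, "Jackhjf")

print("mask then filter:", safe)
print("filter then mask:", leaky)
```

With masking first, the guess matches nothing. With filtering first, the result is non-empty, so the user learns that `Jackhjf` exists in the source even though the returned value is masked.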
[GitHub] [kyuubi] bowenliang123 commented on issue #4341: [Bug] When using kyuubi to do data masking, an error is reported
Posted by "bowenliang123 (via GitHub)" <gi...@apache.org>.
bowenliang123 commented on issue #4341:
URL: https://github.com/apache/kyuubi/issues/4341#issuecomment-1432703349
This is a known issue when applying column masking on DataSourceV2Relation, as in https://github.com/apache/kyuubi/issues/4202.
A unit test and fix are proposed in https://github.com/apache/kyuubi/pull/4304, but that PR is still pending. More work is required.
[GitHub] [kyuubi] Jackhjf commented on issue #4341: [Bug] When using kyuubi to do data masking, an error is reported
Posted by "Jackhjf (via GitHub)" <gi...@apache.org>.
Jackhjf commented on issue #4341:
URL: https://github.com/apache/kyuubi/issues/4341#issuecomment-1460325848
> thanks @Jackhjf for the validation
Now I found a problem. If I execute this SQL:
select id ,name as name1 from iceberg_spark.test_copy a where a.name like "中%";
I get an empty result.
<img width="1172" alt="image" src="https://user-images.githubusercontent.com/33471639/223753849-a01805f9-c1c9-4a51-9046-d2a39258a183.png">
<img width="1039" alt="image" src="https://user-images.githubusercontent.com/33471639/223753943-408a4612-6bb8-4f3c-8a50-c26f3bf87ac2.png">
@yaooqinn
[GitHub] [kyuubi] github-actions[bot] commented on issue #4341: [Bug] When using kyuubi to do data masking, an error is reported
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #4341:
URL: https://github.com/apache/kyuubi/issues/4341#issuecomment-1432700721
Hello @Jackhjf,
Thanks for finding the time to report the issue!
We really appreciate the community's efforts to improve Apache Kyuubi.
[GitHub] [kyuubi] yaooqinn commented on issue #4341: [Bug] When using kyuubi to do data masking, an error is reported
Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on issue #4341:
URL: https://github.com/apache/kyuubi/issues/4341#issuecomment-1455782490
Thanks @Jackhjf for the validation.