Posted to dev@cassandra.apache.org by Andrés de la Peña <ad...@apache.org> on 2022/09/07 11:49:52 UTC

Re: [DISCUSS] CEP-20: Dynamic Data Masking

If nobody has more concerns regarding the CEP I will start the vote
tomorrow.

On Wed, 31 Aug 2022 at 13:18, Andrés de la Peña <ad...@apache.org>
wrote:

>> Is there enough support here for VIEWS to be the implementation strategy
>> for displaying masking functions?
>
>
> I'm not sure that views should be "the" strategy for masking functions. We
> have multiple approaches here:
>
> 1) CQL functions only. Users can choose to use the masking functions of
> their own accord. I think most databases allow this usage pattern, which is
> quite straightforward. Obviously, it doesn't let admins enforce that users
> see only masked data. Nevertheless, it's still useful for trusted database
> users generating masked data that will be consumed by the end users of the
> application.
>
> 2) Masking functions attached to specific columns. This way the same
> queries will see different data (masked or not) depending on the
> permissions of the user running the query. It has the advantage of not
> requiring changes to the queries that users with different permissions run.
> The downside is that users would need to query the schema if they need to
> know whether a column is masked, unless we change the names of the returned
> columns. This is the approach offered by Azure/SQL Server, PostgreSQL, IBM
> Db2, Oracle, MariaDB/MaxScale and Snowflake. All these databases support
> applying the masking function to columns on the base table, and some of
> them also allow masking to be applied to views.
>
> 3) Masking functions as part of projected views. This way users might
> need to query the view appropriate for their permissions instead of the
> base table. This might mean changing the queries if the masking policy is
> changed by the admin. MySQL recommends this approach in a blog entry,
> although it's not part of its main documentation for data masking, and the
> implementation has security issues. Some of the other databases offering
> approach 2) as their main option also support masking on view columns.
>
> Each approach has its own advantages and limitations, and I don't think we
> necessarily have to choose. The CEP proposes implementing 1) and 2), but
> nothing prevents us from also having 3) if we get to have projected views.
> However, I think that projected views are a new general-purpose feature with
> their own complexities, so they would deserve their own CEP, if someone is
> willing to work on the implementation.
>
>
>
> On Wed, 31 Aug 2022 at 12:03, Claude Warren via dev <
> dev@cassandra.apache.org> wrote:
>
>> Is there enough support here for VIEWS to be the implementation strategy
>> for displaying masking functions?
>>
>> It seems to me the view would have to store the query and apply a where
>> clause to it, so the same PK would be in play.
>>
>> It has data leaking properties.
>>
>> It has more use cases as it can be used to
>>
>>    - construct views that filter out sensitive columns
>>    - apply transforms to convert units of measure
>>
>> Are there more thoughts along this line?
>>
>
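
As an illustration of the three approaches listed in the quoted message
above, here is a minimal Python sketch. It is not Cassandra code and not
the syntax the CEP proposes: the names mask_inner, Table, ProjectedView
and the "UNMASK" permission are all assumptions made up for this example,
and the masking rule is deliberately simplistic.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set

Row = Dict[str, Any]


def mask_inner(value: str, begin: int, end: int, pad: str = "*") -> str:
    """Hypothetical masking function: keep the first `begin` and last `end`
    characters and replace everything in between with `pad`."""
    hidden = len(value) - begin - end
    if hidden <= 0:
        return value  # nothing left to hide in this simplified sketch
    return value[:begin] + pad * hidden + value[len(value) - end:]


@dataclass
class Table:
    rows: List[Row]
    # Approach 2: masking functions attached to specific columns.
    masks: Dict[str, Callable[[Any], Any]] = field(default_factory=dict)

    def select(self, columns: List[str], permissions: Set[str]) -> List[Row]:
        # "UNMASK" is an assumed permission name for this illustration.
        unmasked = "UNMASK" in permissions
        return [
            {
                c: row[c] if unmasked or c not in self.masks else self.masks[c](row[c])
                for c in columns
            }
            for row in self.rows
        ]


@dataclass
class ProjectedView:
    # Approach 3: the masking lives in the view's projection, so callers
    # pick the view that matches their permissions.
    base: Table
    projection: Dict[str, Callable[[Row], Any]]

    def select(self) -> List[Row]:
        return [
            {name: fn(row) for name, fn in self.projection.items()}
            for row in self.base.rows
        ]


patients = Table(
    rows=[{"id": 1, "name": "Alice Smith"}],
    masks={"name": lambda v: mask_inner(v, 1, 0)},
)

# Approach 1: a trusted user calls the masking function explicitly.
explicit = [mask_inner(r["name"], 1, 0) for r in patients.rows]

# Approach 2: the same query returns different data depending on permissions.
as_analyst = patients.select(["id", "name"], permissions=set())
as_admin = patients.select(["id", "name"], permissions={"UNMASK"})

# Approach 3: the restricted user queries a view instead of the base table.
masked_view = ProjectedView(
    patients,
    {"id": lambda r: r["id"], "name": lambda r: mask_inner(r["name"], 1, 0)},
)
via_view = masked_view.select()
```

In this sketch the query in approach 2 is identical for both users, while
in approach 3 the restricted user has to target a different object, which
is the trade-off the message describes.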

Re: [DISCUSS] CEP-20: Dynamic Data Masking

Posted by Claude Warren via dev <de...@cassandra.apache.org>.

My vote is B

On 07/09/2022 13:12, Benedict wrote:
> I’m not convinced there’s been adequate resolution over which approach 
> is adopted. I know you have expressed a preference for the table 
> schema approach, but the weight of other opinion so far appears to be 
> against this approach - even if it is broadly adopted by other 
> databases. I will note that Postgres does not adopt this approach; it
> has a more sophisticated security label approach that has not been
> proposed by anybody so far.
>
> I think extra weight should be given to the implementer’s preference, 
> so while I personally do not like the table schema approach, I am 
> happy to accept this is an industry norm, and leave the decision to you.
>
> However, we should ensure the community as a whole endorses this. I 
> think an indicative poll should be undertaken first, eg:
>
> A) We should implement the table schema approach, as proposed
> B) We should prefer the view approach, but I am not opposed to the 
> implementor selecting the table schema approach for this CEP
> C) We should NOT implement the table schema approach, and should 
> implement the view approach
> D) We should NOT implement the table schema approach, and should 
> implement some other scheme (or not implement this feature)
>
> Where my vote is B

Re: [DISCUSS] CEP-20: Dynamic Data Masking

Posted by Derek Chen-Becker <de...@chen-becker.org>.

My vote is B, but I think you should go ahead with the actual vote thread.

Cheers,

Derek

On Fri, Sep 16, 2022 at 4:05 AM Andrés de la Peña <ad...@apache.org>
wrote:

> It's been 9 days since we started the poll, and we haven't had any new
> votes since Monday. So we are still at 5 votes for A and 2 votes for B.
>
> The poll results don't seem to oppose the CEP. If no one has anything
> else to add, I'll start the actual vote thread.
>
> On Tue, 13 Sept 2022 at 15:05, Andrés de la Peña <ad...@apache.org>
> wrote:
>
>> That's 5 votes for A and 2 votes for B so far. Neither of these options
>> opposes the CEP, so I think we can probably start the vote, unless we
>> want to wait longer for the poll.
>>
>> On Mon, 12 Sept 2022 at 13:51, Benjamin Lerer <bl...@apache.org> wrote:
>>
>>> A
>>>
>>> On Wed, 7 Sept 2022 at 17:02, Jeremiah D Jordan <
>>> jeremiah.jordan@gmail.com> wrote:
>>>
>>>> A
>>>>
>>>> On Sep 7, 2022, at 8:58 AM, Benedict <be...@apache.org> wrote:
>>>>
>>>> Well, I am not convinced these changes will materially impact the
>>>> outcome, but at least we’ll have some extra fun collating the votes.
>>>>
>>>>
>>>> On 7 Sep 2022, at 14:05, Andrés de la Peña <ad...@apache.org>
>>>> wrote:
>>>>
>>>>
>>>> The poll makes sense to me. I would slightly change it to:
>>>>
>>>> A) We shouldn't prefer either approach, and I agree to the implementor
>>>> selecting the table schema approach for this CEP
>>>> B) We should prefer the view approach, but I am not opposed to the
>>>> implementor selecting the table schema approach for this CEP
>>>> C) We should NOT implement the table schema approach, and should
>>>> implement the view approach
>>>> D) We should NOT implement the view approach, and should
>>>> implement the table schema approach
>>>> E) We should NOT implement the table schema approach, and should
>>>> implement some other scheme (or not implement this feature)
>>>>
>>>> Where my vote is for A.

-- 
+---------------------------------------------------------------+
| Derek Chen-Becker                                             |
| GPG Key available at https://keybase.io/dchenbecker and       |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
+---------------------------------------------------------------+

Re: [DISCUSS] CEP-20: Dynamic Data Masking

Posted by Andrés de la Peña <ad...@apache.org>.

It's been 9 days since we started the poll, and we haven't had any new votes
since Monday. So we are still at 5 votes for A and 2 votes for B.

The poll results don't seem to oppose the CEP. If no one has anything
else to add, I'll start the actual vote thread.

On Tue, 13 Sept 2022 at 15:05, Andrés de la Peña <ad...@apache.org>
wrote:

> That's 5 votes for A and 2 votes for B so far. Neither of these options
> opposes the CEP, so I think we can probably start the vote, unless we
> want to wait longer for the poll.
>
> On Mon, 12 Sept 2022 at 13:51, Benjamin Lerer <bl...@apache.org> wrote:
>
>> A
>>
>> On Wed, 7 Sept 2022 at 17:02, Jeremiah D Jordan <
>> jeremiah.jordan@gmail.com> wrote:
>>
>>> A
>>>
>>> On Sep 7, 2022, at 8:58 AM, Benedict <be...@apache.org> wrote:
>>>
>>> Well, I am not convinced these changes will materially impact the
>>> outcome, but at least we’ll have some extra fun collating the votes.
>>>
>>>
>>> On 7 Sep 2022, at 14:05, Andrés de la Peña <ad...@apache.org> wrote:
>>>
>>>
>>> The poll makes sense to me. I would slightly change it to:
>>>
>>> A) We shouldn't prefer either approach, and I agree to the implementor
>>> selecting the table schema approach for this CEP
>>> B) We should prefer the view approach, but I am not opposed to the
>>> implementor selecting the table schema approach for this CEP
>>> C) We should NOT implement the table schema approach, and should
>>> implement the view approach
>>> D) We should NOT implement the view approach, and should implement
>>> the table schema approach
>>> E) We should NOT implement the table schema approach, and should
>>> implement some other scheme (or not implement this feature)
>>>
>>> Where my vote is for A.

Re: [DISCUSS] CEP-20: Dynamic Data Masking

Posted by Andrés de la Peña <ad...@apache.org>.

That's 5 votes for A and 2 votes for B so far. Neither of these options
opposes the CEP, so I think we can probably start the vote, unless we
want to wait longer for the poll.

On Mon, 12 Sept 2022 at 13:51, Benjamin Lerer <bl...@apache.org> wrote:

> A
>
> On Wed, 7 Sept 2022 at 17:02, Jeremiah D Jordan <je...@gmail.com>
> wrote:
>
>> A
>>
>> On Sep 7, 2022, at 8:58 AM, Benedict <be...@apache.org> wrote:
>>
>> Well, I am not convinced these changes will materially impact the
>> outcome, but at least we’ll have some extra fun collating the votes.
>>
>>
>> On 7 Sep 2022, at 14:05, Andrés de la Peña <ad...@apache.org> wrote:
>>
>>
>> The poll makes sense to me. I would slightly change it to:
>>
>> A) We shouldn't prefer either approach, and I agree to the implementor
>> selecting the table schema approach for this CEP
>> B) We should prefer the view approach, but I am not opposed to the
>> implementor selecting the table schema approach for this CEP
>> C) We should NOT implement the table schema approach, and should
>> implement the view approach
>> D) We should NOT implement the view approach, and should implement
>> the table schema approach
>> E) We should NOT implement the table schema approach, and should
>> implement some other scheme (or not implement this feature)
>>
>> Where my vote is for A.

Re: [DISCUSS] CEP-20: Dynamic Data Masking

Posted by Benjamin Lerer <bl...@apache.org>.

A

On Wed, 7 Sept 2022 at 17:02, Jeremiah D Jordan <je...@gmail.com>
wrote:

> A
>
> On Sep 7, 2022, at 8:58 AM, Benedict <be...@apache.org> wrote:
>
> Well, I am not convinced these changes will materially impact the outcome,
> but at least we’ll have some extra fun collating the votes.
>
>
> On 7 Sep 2022, at 14:05, Andrés de la Peña <ad...@apache.org> wrote:
>
>
> The poll makes sense to me. I would slightly change it to:
>
> A) We shouldn't prefer either approach, and I agree to the implementor
> selecting the table schema approach for this CEP
> B) We should prefer the view approach, but I am not opposed to the
> implementor selecting the table schema approach for this CEP
> C) We should NOT implement the table schema approach, and should implement
> the view approach
> D) We should NOT implement the view approach, and should implement
> the table schema approach
> E) We should NOT implement the table schema approach, and should implement
> some other scheme (or not implement this feature)
>
> Where my vote is for A.

Re: [DISCUSS] CEP-20: Dynamic Data Masking

Posted by Jeremiah D Jordan <je...@gmail.com>.

A

> On Sep 7, 2022, at 8:58 AM, Benedict <be...@apache.org> wrote:
> 
> Well, I am not convinced these changes will materially impact the outcome, but at least we’ll have some extra fun collating the votes.
> 
> 
>> On 7 Sep 2022, at 14:05, Andrés de la Peña <ad...@apache.org> wrote:
>> 
>> 
>> The poll makes sense to me. I would slightly change it to:
>> 
>> A) We shouldn't prefer either approach, and I agree to the implementor selecting the table schema approach for this CEP
>> B) We should prefer the view approach, but I am not opposed to the implementor selecting the table schema approach for this CEP
>> C) We should NOT implement the table schema approach, and should implement the view approach
>> D) We should NOT implement the view approach, and should implement the table schema approach
>> E) We should NOT implement the table schema approach, and should implement some other scheme (or not implement this feature)
>> 
>> Where my vote is for A.
>> 
>> 
>> On Wed, 7 Sept 2022 at 13:12, Benedict <benedict@apache.org> wrote:
>> I’m not convinced there’s been adequate resolution over which approach is adopted. I know you have expressed a preference for the table schema approach, but the weight of other opinion so far appears to be against this approach - even if it is broadly adopted by other databases. I will note that Postgres does not adopt this approach, it has a more sophisticated security label approach that has not been proposed by anybody so far.
>> 
>> I think extra weight should be given to the implementer’s preference, so while I personally do not like the table schema approach, I am happy to accept this is an industry norm, and leave the decision to you.
>> 
>> However, we should ensure the community as a whole endorses this. I think an indicative poll should be undertaken first, eg:
>> 
>> A) We should implement the table schema approach, as proposed
>> B) We should prefer the view approach, but I am not opposed to the implementor selecting the table schema approach for this CEP
>> C) We should NOT implement the table schema approach, and should implement the view approach
>> D) We should NOT implement the table schema approach, and should implement some other scheme (or not implement this feature)
>> 
>> Where my vote is B
>> 
>>> On 7 Sep 2022, at 12:50, Andrés de la Peña <adelapena@apache.org> wrote:
>>> 
>>> 
>>> If nobody has more concerns regarding the CEP I will start the vote tomorrow.
>>> 
>>> On Wed, 31 Aug 2022 at 13:18, Andrés de la Peña <adelapena@apache.org> wrote:
>>> Is there enough support here for VIEWS to be the implementation strategy for displaying masking functions?
>>> 
>>> I'm not sure that views should be "the" strategy for masking functions. We have multiple approaches here:
>>> 
>>> 1) CQL functions only. Users can decide to use the masking functions of their own will. I think most dbs allow this pattern of usage, which is quite straightforward. Obviously, it doesn't allow admins to enforce that users see only masked data. Nevertheless, it's still useful for trusted database users generating masked data that will be consumed by the end users of the application.
>>> 
>>> 2) Masking functions attached to specific columns. This way the same queries will see different data (masked or not) depending on the permissions of the user running the query. It has the advantage of not requiring changes to the queries that users with different permissions run. The downside is that users would need to query the schema if they need to know whether a column is masked, unless we change the names of the returned columns. This is the approach offered by Azure/SQL Server, PostgreSQL, IBM Db2, Oracle, MariaDB/MaxScale and Snowflake. All these databases support applying the masking function to columns on the base table, and some of them also allow applying masking to views.
>>> 
>>> 3) Masking functions as part of projected views. This way users might need to query the view appropriate for their permissions instead of the base table. This might mean changing the queries if the masking policy is changed by the admin. MySQL recommends this approach in a blog entry, although it's not part of its main documentation for data masking, and the implementation has security issues. Some of the other databases offering approach 2) as their main option also support masking on view columns.
>>> 
>>> Each approach has its own advantages and limitations, and I don't think we necessarily have to choose. The CEP proposes implementing 1) and 2), but nothing prevents us from also having 3) if we get to have projected views. However, I think that projected views are a new general-purpose feature with its own complexities, so it would deserve its own CEP, if someone is willing to work on the implementation.
>>> 
>>> 
>>> 
>>> On Wed, 31 Aug 2022 at 12:03, Claude Warren via dev <dev@cassandra.apache.org> wrote:
>>> Is there enough support here for VIEWS to be the implementation strategy for displaying masking functions?
>>> 
>>> It seems to me the view would have to store the query and apply a where clause to it, so the same PK would be in play.
>>> 
>>> It has data leaking properties.
>>> 
>>> It has more use cases as it can be used to
>>> 
>>> construct views that filter out sensitive columns
>>> apply transforms to convert units of measure
>>> Are there more thoughts along this line?
>>>

Re: [DISCUSS] CEP-20: Dynamic Data Masking

Posted by Benedict <be...@apache.org>.

Well, I am not convinced these changes will materially impact the outcome, but at least we’ll have some extra fun collating the votes.


> On 7 Sep 2022, at 14:05, Andrés de la Peña <ad...@apache.org> wrote:
> 
> 
> The poll makes sense to me. I would slightly change it to:
> 
> A) We shouldn't prefer either approach, and I agree to the implementor selecting the table schema approach for this CEP
> B) We should prefer the view approach, but I am not opposed to the implementor selecting the table schema approach for this CEP
> C) We should NOT implement the table schema approach, and should implement the view approach
> D) We should NOT implement the view approach, and should implement the table schema approach
> E) We should NOT implement the table schema approach, and should implement some other scheme (or not implement this feature)
> 
> Where my vote is for A.
> 
> 
>> On Wed, 7 Sept 2022 at 13:12, Benedict <be...@apache.org> wrote:
>> I’m not convinced there’s been adequate resolution over which approach is adopted. I know you have expressed a preference for the table schema approach, but the weight of other opinion so far appears to be against this approach - even if it is broadly adopted by other databases. I will note that Postgres does not adopt this approach, it has a more sophisticated security label approach that has not been proposed by anybody so far.
>> 
>> I think extra weight should be given to the implementer’s preference, so while I personally do not like the table schema approach, I am happy to accept this is an industry norm, and leave the decision to you.
>> 
>> However, we should ensure the community as a whole endorses this. I think an indicative poll should be undertaken first, eg:
>> 
>> A) We should implement the table schema approach, as proposed
>> B) We should prefer the view approach, but I am not opposed to the implementor selecting the table schema approach for this CEP
>> C) We should NOT implement the table schema approach, and should implement the view approach
>> D) We should NOT implement the table schema approach, and should implement some other scheme (or not implement this feature)
>> 
>> Where my vote is B
>> 
>>>> On 7 Sep 2022, at 12:50, Andrés de la Peña <ad...@apache.org> wrote:
>>>> 
>>> 
>>> If nobody has more concerns regarding the CEP I will start the vote tomorrow.
>>> 
>>> On Wed, 31 Aug 2022 at 13:18, Andrés de la Peña <ad...@apache.org> wrote:
>>>>> Is there enough support here for VIEWS to be the implementation strategy for displaying masking functions?
>>>> 
>>>> I'm not sure that views should be "the" strategy for masking functions. We have multiple approaches here:
>>>> 
>>>> 1) CQL functions only. Users can decide to use the masking functions of their own will. I think most dbs allow this pattern of usage, which is quite straightforward. Obviously, it doesn't allow admins to enforce that users see only masked data. Nevertheless, it's still useful for trusted database users generating masked data that will be consumed by the end users of the application.
>>>> 
>>>> 2) Masking functions attached to specific columns. This way the same queries will see different data (masked or not) depending on the permissions of the user running the query. It has the advantage of not requiring changes to the queries that users with different permissions run. The downside is that users would need to query the schema if they need to know whether a column is masked, unless we change the names of the returned columns. This is the approach offered by Azure/SQL Server, PostgreSQL, IBM Db2, Oracle, MariaDB/MaxScale and Snowflake. All these databases support applying the masking function to columns on the base table, and some of them also allow applying masking to views.
>>>> 
>>>> 3) Masking functions as part of projected views. This way users might need to query the view appropriate for their permissions instead of the base table. This might mean changing the queries if the masking policy is changed by the admin. MySQL recommends this approach in a blog entry, although it's not part of its main documentation for data masking, and the implementation has security issues. Some of the other databases offering approach 2) as their main option also support masking on view columns.
>>>> 
>>>> Each approach has its own advantages and limitations, and I don't think we necessarily have to choose. The CEP proposes implementing 1) and 2), but nothing prevents us from also having 3) if we get to have projected views. However, I think that projected views are a new general-purpose feature with its own complexities, so it would deserve its own CEP, if someone is willing to work on the implementation.
>>>> 
>>>> 
>>>> 
>>>>> On Wed, 31 Aug 2022 at 12:03, Claude Warren via dev <de...@cassandra.apache.org> wrote:
>>>>> Is there enough support here for VIEWS to be the implementation strategy for displaying masking functions?
>>>>> 
>>>>> It seems to me the view would have to store the query and apply a where clause to it, so the same PK would be in play.
>>>>> 
>>>>> It has data leaking properties.
>>>>> 
>>>>> It has more use cases as it can be used to
>>>>> 
>>>>> construct views that filter out sensitive columns
>>>>> apply transforms to convert units of measure
>>>>> Are there more thoughts along this line?

Re: [DISCUSS] CEP-20: Dynamic Data Masking

Posted by Berenguer Blasi <be...@gmail.com>.

A. I agree the implementor's preference is an important aspect to take 
into account.

On 7/9/22 15:23, Ekaterina Dimitrova wrote:
> A
>
> On Wed, 7 Sep 2022 at 9:05, Andrés de la Peña <ad...@apache.org> 
> wrote:
>
>     The poll makes sense to me. I would slightly change it to:
>
>     A) We shouldn't prefer either approach, and I agree to the
>     implementor selecting the table schema approach for this CEP
>     B) We should prefer the view approach, but I am not opposed to the
>     implementor selecting the table schema approach for this CEP
>     C) We should NOT implement the table schema approach, and should
>     implement the view approach
>     D) We should NOT implement the view approach, and should
>     implement the table schema approach
>     E) We should NOT implement the table schema approach, and should
>     implement some other scheme (or not implement this feature)
>
>     Where my vote is for A.
>
>
>     On Wed, 7 Sept 2022 at 13:12, Benedict <be...@apache.org> wrote:
>
>         I’m not convinced there’s been adequate resolution over which
>         approach is adopted. I know you have expressed a preference
>         for the table schema approach, but the weight of other opinion
>         so far appears to be against this approach - even if it is
>         broadly adopted by other databases. I will note that Postgres
>         does not adopt this approach, it has a more sophisticated
>         security label approach that has not been proposed by anybody
>         so far.
>
>         I think extra weight should be given to the implementer’s
>         preference, so while I personally do not like the table schema
>         approach, I am happy to accept this is an industry norm, and
>         leave the decision to you.
>
>         However, we should ensure the community as a whole endorses
>         this. I think an indicative poll should be undertaken first, eg:
>
>         A) We should implement the table schema approach, as proposed
>         B) We should prefer the view approach, but I am not opposed to
>         the implementor selecting the table schema approach for this CEP
>         C) We should NOT implement the table schema approach, and
>         should implement the view approach
>         D) We should NOT implement the table schema approach, and
>         should implement some other scheme (or not implement this feature)
>
>         Where my vote is B
>
>>         On 7 Sep 2022, at 12:50, Andrés de la Peña
>>         <ad...@apache.org> wrote:
>>
>>         
>>         If nobody has more concerns regarding the CEP I will start
>>         the vote tomorrow.
>>
>>         On Wed, 31 Aug 2022 at 13:18, Andrés de la Peña
>>         <ad...@apache.org> wrote:
>>
>>                 Is there enough support here for VIEWS to be the
>>                 implementation strategy for displaying masking functions?
>>
>>
>>             I'm not sure that views should be "the" strategy for
>>             masking functions. We have multiple approaches here:
>>
>>             1) CQL functions only. Users can decide to use the
>>             masking functions of their own will. I think most dbs
>>             allow this pattern of usage, which is quite
>>             straightforward. Obviously, it doesn't allow admins to
>>             enforce that users see only masked data.
>>             Nevertheless, it's still useful for trusted database
>>             users generating masked data that will be consumed by the
>>             end users of the application.
>>
>>             2) Masking functions attached to specific columns. This
>>             way the same queries will see different data (masked or
>>             not) depending on the permissions of the user running the
>>             query. It has the advantage of not requiring changes to
>>             the queries that users with different permissions run.
>>             The downside is that users would need to query the schema
>>             if they need to know whether a column is masked, unless
>>             we change the names of the returned columns. This is the
>>             approach offered by Azure/SQL Server, PostgreSQL, IBM
>>             Db2, Oracle, MariaDB/MaxScale and Snowflake. All these
>>             databases support applying the masking function to
>>             columns on the base table, and some of them also allow
>>             applying masking to views.
>>
>>             3) Masking functions as part of projected views. This
>>             way users might need to query the view appropriate for
>>             their permissions instead of the base table. This might
>>             mean changing the queries if the masking policy is
>>             changed by the admin. MySQL recommends this approach in a
>>             blog entry, although it's not part of its main
>>             documentation for data masking, and the implementation
>>             has security issues. Some of the other databases offering
>>             approach 2) as their main option also support masking
>>             on view columns.
>>
>>             Each approach has its own advantages and limitations, and
>>             I don't think we necessarily have to choose. The CEP
>>             proposes implementing 1) and 2), but nothing prevents us from
>>             also having 3) if we get to have projected views. However,
>>             I think that projected views are a new general-purpose
>>             feature with its own complexities, so it would deserve
>>             its own CEP, if someone is willing to work on the
>>             implementation.
>>
>>
>>
>>             On Wed, 31 Aug 2022 at 12:03, Claude Warren via dev
>>             <de...@cassandra.apache.org> wrote:
>>
>>                 Is there enough support here for VIEWS to be the
>>                 implementation strategy for displaying masking functions?
>>
>>                 It seems to me the view would have to store the query
>>                 and apply a where clause to it, so the same PK would
>>                 be in play.
>>
>>                 It has data leaking properties.
>>
>>                 It has more use cases as it can be used to
>>
>>                   * construct views that filter out sensitive columns
>>                   * apply transforms to convert units of measure
>>
>>                 Are there more thoughts along this line?
>>

Re: [DISCUSS] CEP-20: Dynamic Data Masking

Posted by Ekaterina Dimitrova <e....@gmail.com>.

A

On Wed, 7 Sep 2022 at 9:05, Andrés de la Peña <ad...@apache.org> wrote:

> The poll makes sense to me. I would slightly change it to:
>
> A) We shouldn't prefer either approach, and I agree to the implementor
> selecting the table schema approach for this CEP
> B) We should prefer the view approach, but I am not opposed to the
> implementor selecting the table schema approach for this CEP
> C) We should NOT implement the table schema approach, and should implement
> the view approach
> D) We should NOT implement the view approach, and should implement
> the table schema approach
> E) We should NOT implement the table schema approach, and should implement
> some other scheme (or not implement this feature)
>
> Where my vote is for A.
>
>
> On Wed, 7 Sept 2022 at 13:12, Benedict <be...@apache.org> wrote:
>
>> I’m not convinced there’s been adequate resolution over which approach is
>> adopted. I know you have expressed a preference for the table schema
>> approach, but the weight of other opinion so far appears to be against this
>> approach - even if it is broadly adopted by other databases. I will note
>> that Postgres does not adopt this approach, it has a more sophisticated
>> security label approach that has not been proposed by anybody so far.
>>
>> I think extra weight should be given to the implementer’s preference, so
>> while I personally do not like the table schema approach, I am happy to
>> accept this is an industry norm, and leave the decision to you.
>>
>> However, we should ensure the community as a whole endorses this. I think
>> an indicative poll should be undertaken first, eg:
>>
>> A) We should implement the table schema approach, as proposed
>> B) We should prefer the view approach, but I am not opposed to the
>> implementor selecting the table schema approach for this CEP
>> C) We should NOT implement the table schema approach, and should
>> implement the view approach
>> D) We should NOT implement the table schema approach, and should
>> implement some other scheme (or not implement this feature)
>>
>> Where my vote is B
>>
>> On 7 Sep 2022, at 12:50, Andrés de la Peña <ad...@apache.org> wrote:
>>
>> 
>> If nobody has more concerns regarding the CEP I will start the vote
>> tomorrow.
>>
>> On Wed, 31 Aug 2022 at 13:18, Andrés de la Peña <ad...@apache.org>
>> wrote:
>>
>>> Is there enough support here for VIEWS to be the implementation strategy
>>>> for displaying masking functions?
>>>
>>>
>>> I'm not sure that views should be "the" strategy for masking functions.
>>> We have multiple approaches here:
>>>
>>> 1) CQL functions only. Users can decide to use the masking functions of
>>> their own will. I think most dbs allow this pattern of usage, which is
>>> quite straightforward. Obviously, it doesn't allow admins to enforce that
>>> users see only masked data. Nevertheless, it's still useful for trusted
>>> database users generating masked data that will be consumed by the end
>>> users of the application.
>>>
>>> 2) Masking functions attached to specific columns. This way the same
>>> queries will see different data (masked or not) depending on the
>>> permissions of the user running the query. It has the advantage of not
>>> requiring changes to the queries that users with different permissions run.
>>> The downside is that users would need to query the schema if they need to
>>> know whether a column is masked, unless we change the names of the returned
>>> columns. This is the approach offered by Azure/SQL Server, PostgreSQL, IBM
>>> Db2, Oracle, MariaDB/MaxScale and Snowflake. All these databases support
>>> applying the masking function to columns on the base table, and some of
>>> them also allow applying masking to views.
>>>
>>> 3) Masking functions as part of projected views. This way users might
>>> need to query the view appropriate for their permissions instead of the
>>> base table. This might mean changing the queries if the masking policy is
>>> changed by the admin. MySQL recommends this approach in a blog entry,
>>> although it's not part of its main documentation for data masking, and the
>>> implementation has security issues. Some of the other databases offering
>>> approach 2) as their main option also support masking on view columns.
>>>
>>> Each approach has its own advantages and limitations, and I don't think
>>> we necessarily have to choose. The CEP proposes implementing 1) and 2), but
>>> nothing prevents us from also having 3) if we get to have projected views.
>>> However, I think that projected views are a new general-purpose feature with
>>> its own complexities, so it would deserve its own CEP, if someone is
>>> willing to work on the implementation.
>>>
>>>
>>>
>>> On Wed, 31 Aug 2022 at 12:03, Claude Warren via dev <
>>> dev@cassandra.apache.org> wrote:
>>>
>>>> Is there enough support here for VIEWS to be the implementation
>>>> strategy for displaying masking functions?
>>>>
>>>> It seems to me the view would have to store the query and apply a where
>>>> clause to it, so the same PK would be in play.
>>>>
>>>> It has data leaking properties.
>>>>
>>>> It has more use cases as it can be used to
>>>>
>>>>    - construct views that filter out sensitive columns
>>>>    - apply transforms to convert units of measure
>>>>
>>>> Are there more thoughts along this line?
>>>>
>>>

Re: [DISCUSS] CEP-20: Dynamic Data Masking

Posted by Andrés de la Peña <ad...@apache.org>.

The poll makes sense to me. I would slightly change it to:

A) We shouldn't prefer either approach, and I agree to the implementor
selecting the table schema approach for this CEP
B) We should prefer the view approach, but I am not opposed to the
implementor selecting the table schema approach for this CEP
C) We should NOT implement the table schema approach, and should implement
the view approach
D) We should NOT implement the view approach, and should implement
the table schema approach
E) We should NOT implement the table schema approach, and should implement
some other scheme (or not implement this feature)

Where my vote is for A.


On Wed, 7 Sept 2022 at 13:12, Benedict <be...@apache.org> wrote:

> I’m not convinced there’s been adequate resolution over which approach is
> adopted. I know you have expressed a preference for the table schema
> approach, but the weight of other opinion so far appears to be against this
> approach - even if it is broadly adopted by other databases. I will note
> that Postgres does not adopt this approach, it has a more sophisticated
> security label approach that has not been proposed by anybody so far.
>
> I think extra weight should be given to the implementer’s preference, so
> while I personally do not like the table schema approach, I am happy to
> accept this is an industry norm, and leave the decision to you.
>
> However, we should ensure the community as a whole endorses this. I think
> an indicative poll should be undertaken first, eg:
>
> A) We should implement the table schema approach, as proposed
> B) We should prefer the view approach, but I am not opposed to the
> implementor selecting the table schema approach for this CEP
> C) We should NOT implement the table schema approach, and should implement
> the view approach
> D) We should NOT implement the table schema approach, and should implement
> some other scheme (or not implement this feature)
>
> Where my vote is B
>
> On 7 Sep 2022, at 12:50, Andrés de la Peña <ad...@apache.org> wrote:
>
> 
> If nobody has more concerns regarding the CEP I will start the vote
> tomorrow.
>
> On Wed, 31 Aug 2022 at 13:18, Andrés de la Peña <ad...@apache.org>
> wrote:
>
>> Is there enough support here for VIEWS to be the implementation strategy
>>> for displaying masking functions?
>>
>>
>> I'm not sure that views should be "the" strategy for masking functions.
>> We have multiple approaches here:
>>
>> 1) CQL functions only. Users can decide to use the masking functions of
>> their own will. I think most dbs allow this pattern of usage, which is
>> quite straightforward. Obviously, it doesn't allow admins to enforce that
>> users see only masked data. Nevertheless, it's still useful for trusted
>> database users generating masked data that will be consumed by the end
>> users of the application.
>>
>> 2) Masking functions attached to specific columns. This way the same
>> queries will see different data (masked or not) depending on the
>> permissions of the user running the query. It has the advantage of not
>> requiring changes to the queries that users with different permissions run.
>> The downside is that users would need to query the schema if they need to
>> know whether a column is masked, unless we change the names of the returned
>> columns. This is the approach offered by Azure/SQL Server, PostgreSQL, IBM
>> Db2, Oracle, MariaDB/MaxScale and Snowflake. All these databases support
>> applying the masking function to columns on the base table, and some of
>> them also allow applying masking to views.
>>
>> 3) Masking functions as part of projected views. This way users might
>> need to query the view appropriate for their permissions instead of the
>> base table. This might mean changing the queries if the masking policy is
>> changed by the admin. MySQL recommends this approach in a blog entry,
>> although it's not part of its main documentation for data masking, and the
>> implementation has security issues. Some of the other databases offering
>> approach 2) as their main option also support masking on view columns.
>>
>> Each approach has its own advantages and limitations, and I don't think
>> we necessarily have to choose. The CEP proposes implementing 1) and 2), but
>> nothing prevents us from also having 3) if we get to have projected views.
>> However, I think that projected views are a new general-purpose feature with
>> its own complexities, so it would deserve its own CEP, if someone is
>> willing to work on the implementation.
>>
>>
>>
>> On Wed, 31 Aug 2022 at 12:03, Claude Warren via dev <
>> dev@cassandra.apache.org> wrote:
>>
>>> Is there enough support here for VIEWS to be the implementation strategy
>>> for displaying masking functions?
>>>
>>> It seems to me the view would have to store the query and apply a where
>>> clause to it, so the same PK would be in play.
>>>
>>> It has data leaking properties.
>>>
>>> It has more use cases as it can be used to
>>>
>>>    - construct views that filter out sensitive columns
>>>    - apply transforms to convert units of measure
>>>
>>> Are there more thoughts along this line?
>>>
>>

Re: [DISCUSS] CEP-20: Dynamic Data Masking

Posted by Benedict <be...@apache.org>.

I’m not convinced there’s been adequate resolution over which approach is adopted. I know you have expressed a preference for the table schema approach, but the weight of other opinion so far appears to be against this approach - even if it is broadly adopted by other databases. I will note that Postgres does not adopt this approach; it has a more sophisticated security label approach that has not been proposed by anybody so far.

I think extra weight should be given to the implementer’s preference, so while I personally do not like the table schema approach, I am happy to accept this is an industry norm, and leave the decision to you.

However, we should ensure the community as a whole endorses this. I think an indicative poll should be undertaken first, eg:

A) We should implement the table schema approach, as proposed
B) We should prefer the view approach, but I am not opposed to the implementor selecting the table schema approach for this CEP
C) We should NOT implement the table schema approach, and should implement the view approach
D) We should NOT implement the table schema approach, and should implement some other scheme (or not implement this feature)

Where my vote is B

> On 7 Sep 2022, at 12:50, Andrés de la Peña <ad...@apache.org> wrote:
> 
> 
> If nobody has more concerns regarding the CEP I will start the vote tomorrow.
> 
> On Wed, 31 Aug 2022 at 13:18, Andrés de la Peña <ad...@apache.org> wrote:
>>> Is there enough support here for VIEWS to be the implementation strategy for displaying masking functions?
>> 
>> I'm not sure that views should be "the" strategy for masking functions. We have multiple approaches here:
>> 
>> 1) CQL functions only. Users can decide to use the masking functions on their own will. I think most dbs allow this pattern of usage, which is quite straightforward. Obviously, it doesn't allow admins to decide enforce users seeing only masked data. Nevertheless, it's still useful for trusted database users generating masked data that will be consumed by the end users of the application.
>> 
>> 2) Masking functions attached to specific columns. This way the same queries will see different data (masked or not) depending on the permissions of the user running the query. It has the advantage of not requiring changes to the queries that users with different permissions run. The downside is that users would need to query the schema if they need to know whether a column is masked, unless we change the names of the returned columns. This is the approach offered by Azure/SQL Server, PostgreSQL, IBM Db2, Oracle, MariaDB/MaxScale and Snowflake. All these databases support applying the masking function to columns on the base table, and some of them also allow applying masking to views.
>> 
>> 3) Masking functions as part of projected views. This way, users might need to query the view appropriate for their permissions instead of the base table. This might mean changing the queries if the masking policy is changed by the admin. MySQL recommends this approach in a blog entry, although it's not part of its main documentation for data masking, and the implementation has security issues. Some of the other databases offering approach 2) as their main option also support masking on view columns.
>> 
>> Each approach has its own advantages and limitations, and I don't think we necessarily have to choose. The CEP proposes implementing 1) and 2), but nothing prevents us from also having 3) if we get to have projected views. However, I think that projected views are a new general-purpose feature with their own complexities, so they would deserve their own CEP, if someone is willing to work on the implementation.
>> 
>> 
>> 
>> On Wed, 31 Aug 2022 at 12:03, Claude Warren via dev <de...@cassandra.apache.org> wrote:
>>> Is there enough support here for VIEWS to be the implementation strategy for displaying masking functions?
>>> 
>>> It seems to me the view would have to store the query and apply a where clause to it, so the same PK would be in play.
>>> 
>>> It has data leaking properties.
>>> 
>>> It has more use cases, as it can be used to:
>>> 
>>> - construct views that filter out sensitive columns
>>> - apply transforms to convert units of measure
>>> 
>>> Are there more thoughts along this line?