Posted to users@kafka.apache.org by "Apolloni, Christian" <ch...@baloise.ch> on 2020/08/19 09:28:16 UTC

GDPR compliance

Hello,

I have some questions about implementing GDPR compliance in Kafka.

In our situation we are required to remove personal data in coordination with multiple systems. The idea is to have a central "coordinator system" that triggers the deletion process for the individual systems in a specific, controlled sequence, taking into account the various system inter-dependencies and data flows. This means, for example, that system no. 2 will receive the delete order only after system no. 1 has reported that it is done with the deletion on its side (and so forth).

One of the systems in question publishes data to Kafka topics for consumption by other systems, and part of the deletion process is to remove the relevant personal data from these Kafka topics too. This has to happen within a relatively short time after the deletion order is received, to avoid a long delay before the systems further down the chain can start their own deletion. Furthermore, we need to know when the operation is completed: only at that point can we give the "go" to the other systems.
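
To make the ordering requirement concrete, here is a minimal sketch of such a coordinator. Everything in it is hypothetical and only for illustration: the `DeleteHandler` type, the `run_deletion` function, the system names, and the stand-in handlers do not correspond to any real system or Kafka API. It only shows that the next system is triggered after the previous one has confirmed completion.

```python
from typing import Callable, List, Tuple

# Hypothetical type: a system's deletion handler takes a user id and returns
# True once that system has finished deleting the user's data.
DeleteHandler = Callable[[str], bool]


def run_deletion(user_id: str, systems: List[Tuple[str, DeleteHandler]]) -> List[str]:
    """Trigger deletion system by system, in the given order.

    The next system is only triggered after the previous one has confirmed
    completion. Returns the names of the systems that confirmed.
    """
    confirmed: List[str] = []
    for name, delete in systems:
        if not delete(user_id):
            # Stop the chain: downstream systems must not start yet.
            raise RuntimeError(f"{name} did not confirm deletion for {user_id}")
        confirmed.append(name)
    return confirmed


# Example wiring with stand-in handlers (no real systems are contacted).
calls: List[str] = []


def make_handler(name: str) -> DeleteHandler:
    def handler(user_id: str) -> bool:
        calls.append(f"{name}:{user_id}")
        return True
    return handler


order = [
    ("system-1", make_handler("system-1")),
    ("kafka-publisher", make_handler("kafka-publisher")),
    ("system-3", make_handler("system-3")),
]
result = run_deletion("user-42", order)
```

The open question in this thread is what the handler for the Kafka-publishing system would have to do before it can truthfully return `True`.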

We are unsure how to satisfy these requirements in Kafka. If anyone has ideas or suggestions, we would be very interested in your opinion. We are also interested, more generally, in experiences with implementing GDPR compliance in Kafka, especially when dealing with multiple, interconnected systems.

Kind regards,

-- 
Christian Apolloni
      
Disclaimer: The contents of this email and any attachment thereto are intended exclusively for the attention of the addressee(s). The email and any such attachment(s) may contain information that is confidential and protected on the strength of professional, official or business secrecy laws and regulations or contractual obligations. Should you have received this email by mistake, you may neither make use of nor divulge the contents of the email or of any attachment thereto. In such a case, please inform the email's sender and delete the message and all attachments without delay from your systems.
You can find our e-mail disclaimer statement in other languages under http://www.baloise.ch/email_disclaimer

Re: GDPR compliance

Posted by Jörn Franke <jo...@gmail.com>.
Be aware that deleting personal data already counts as processing! You already need user consent to process it in Kafka, even if the processing is only about deletion.

Simply do not collect it. 


Re: GDPR compliance

Posted by "Apolloni, Christian" <ch...@baloise.ch>.
> Hi all,
>
> there has been an interesting talk about this during a previous Kafka
> Summit. It talks about using crypto-shredding to 'forget' user information.
> I'm not sure if there are any slides, but it basically suggests that you'd
> encrypt user data on Kafka, and when you get a information removal request,
> the only thing you have to do is to delete the encryption key for that user.
>
> Here's the announcement of the talk:
> https://kafka-summit.org/sessions/handling-gdpr-apache-kafka-comply-without-freaking/,
> but not sure where slides or a recording can be found unfortunately.
>
> Hope it helps.
>
> BR,
> Patrick

Hi Patrick,

Thanks for your reply. We are aware of that talk; the documentation is available here:

https://www.confluent.io/kafka-summit-lon19/handling-gdpr-apache-kafka-comply-freaking-out/

That's what sparked our interest in such a solution.

Kind regards,
     
 -- 
 Christian Apolloni

Re: GDPR compliance

Posted by Christopher Smith <cb...@gmail.com>.
Yup. The crypto-shredding approach tends to be the most practical.
Basically, do payload encryption of your PI with a unique per-user key.
Throw away the per-user key, and the data is "deleted" from a CCPA
perspective.
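
A minimal sketch of the crypto-shredding idea described above, using only the Python standard library. The `KeyStore` class and the `encrypt`/`decrypt` helpers are hypothetical names made up for this illustration, and the cipher is a toy stand-in (an HMAC-SHA256 keystream XORed with the payload) so that the example runs without third-party packages. A real deployment would presumably use authenticated encryption such as AES-GCM from a vetted library, and a proper key management system instead of an in-memory dict.

```python
import hashlib
import hmac
import os
from typing import Dict, Optional


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (HMAC-SHA256 in counter mode); NOT production crypto."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return bytes(out[:length])


class KeyStore:
    """Hypothetical per-user key store; in practice a KMS or a database."""

    def __init__(self) -> None:
        self._keys: Dict[str, bytes] = {}

    def key_for(self, user_id: str) -> bytes:
        # Create the user's key on first use.
        return self._keys.setdefault(user_id, os.urandom(32))

    def get(self, user_id: str) -> Optional[bytes]:
        return self._keys.get(user_id)

    def shred(self, user_id: str) -> None:
        # "Deleting" the user's data means deleting this key.
        self._keys.pop(user_id, None)


def encrypt(store: KeyStore, user_id: str, plaintext: bytes) -> bytes:
    """Encrypt a payload with the user's key; the nonce is prepended."""
    nonce = os.urandom(16)
    stream = _keystream(store.key_for(user_id), nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def decrypt(store: KeyStore, user_id: str, payload: bytes) -> Optional[bytes]:
    """Return the plaintext, or None if the user's key has been shredded."""
    key = store.get(user_id)
    if key is None:
        return None
    nonce, body = payload[:16], payload[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


store = KeyStore()
record_value = encrypt(store, "user-42", b"Jane Doe, Basel")  # value written to the topic
before = decrypt(store, "user-42", record_value)  # readable while the key exists
store.shred("user-42")                            # the removal request
after = decrypt(store, "user-42", record_value)   # None: the payload stays in the log, unreadable
```

The encrypted bytes remain in the topic after the key is shredded; the approach relies on nobody being able to decrypt them anymore.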

The alternative is to have the relevant topic have tight retention SLAs,
which often proves to be counterproductive.

--Chris


Re: GDPR compliance

Posted by Patrick Plaatje <pp...@gmail.com>.
Hi all,

there was an interesting talk about this at a previous Kafka Summit. It
covers using crypto-shredding to 'forget' user information. I'm not sure
whether there are any slides, but it basically suggests that you encrypt
user data in Kafka, and when you get an information removal request, the
only thing you have to do is delete the encryption key for that user.

Here's the announcement of the talk:
https://kafka-summit.org/sessions/handling-gdpr-apache-kafka-comply-without-freaking/
Unfortunately, I'm not sure where slides or a recording can be found.

Hope it helps.

BR,
Patrick


-- 
Patrick Plaatje

Re: GDPR compliance

Posted by Nemeth Sandor <sa...@gmail.com>.
Hi Christian,

depending on how your Kafka topics are configured, you have two different
options:

a) if you have a non-log-compacted topic, you can set the message retention
on the topic to the desired value. In that case the message will be deleted
by Kafka after the retention period expires. (The config value is
`retention.ms`, I think.)

b) if you use Kafka as a log store with log-compacted topics having infinite
retention, then one common solution is to send a so-called tombstone record
(a record with the same key and a null value), or a replacement record with
the same key containing only GDPR-compatible data with the sensitive
information removed, and let Kafka take care of the removal using log
compaction.
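
To illustrate option b), here is a simplified in-memory model of what log compaction does with tombstones. It does not talk to Kafka; the `Record` type and the `compact` function are hypothetical names for this sketch only. In a real cluster, compaction runs asynchronously in the background on topics with `cleanup.policy=compact`, does not touch the active segment, and keeps the tombstone itself around for `delete.retention.ms` before dropping it, so the old records are not gone the moment the tombstone is written.

```python
from typing import Dict, List, Optional, Tuple

# (key, value); a value of None models a tombstone.
Record = Tuple[str, Optional[bytes]]


def compact(log: List[Record]) -> List[Record]:
    """Simplified model of log compaction: keep only the latest record per key.

    Keys whose latest record is a tombstone (value None) are dropped entirely.
    """
    latest: Dict[str, int] = {}
    for offset, (key, _value) in enumerate(log):
        latest[key] = offset  # a later offset overwrites an earlier one
    return [
        (key, value)
        for offset, (key, value) in enumerate(log)
        if latest[key] == offset and value is not None
    ]


topic_log: List[Record] = [
    ("user-1", b"alice@example.com"),
    ("user-2", b"bob@example.com"),
    ("user-1", b"alice@new.example.com"),
    ("user-2", None),  # tombstone: request removal of everything for user-2
]
compacted = compact(topic_log)
# compacted == [("user-1", b"alice@new.example.com")]
```

Note that this only works if the personal data is keyed by something like the user id, so that a single tombstone covers every record to be removed.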

Kind regards,
Sandor

