Posted to dev@beam.apache.org by Artur Khanin <ar...@akvelon.com> on 2020/11/23 17:19:41 UTC

Proposal: Beam Template-like Example to protect sensitive data

Hi Community!

Some users may want to protect their sensitive data using tokenization.
We propose creating a Beam example template that demonstrates a Beam transform for protecting sensitive data through tokenization. In our example, an external service will perform the tokenization.

At a high level, the pipeline will:

  *   support batch (GCS) and streaming (Pub/Sub) input sources
  *   tokenize sensitive data via an external REST service (we plan to use Protegrity)
  *   write the tokenized data to BigQuery or Bigtable
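
To make the tokenization step more concrete, here is a rough, library-free sketch in Python of how records could be batched and their sensitive fields swapped for tokens. Everything in it is an assumption for illustration only: the names tokenize_records, batches and fake_tokenize_batch are hypothetical, the batch size is arbitrary, and fake_tokenize_batch merely stands in for the external REST call (it is not the Protegrity API and does not come from the design doc).

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, str]


def fake_tokenize_batch(values: List[str]) -> List[str]:
    """Hypothetical stand-in for the external tokenization service.

    A real implementation would send `values` in one REST request and
    return the tokens in the same order. This is not a real vendor API.
    """
    return ["tok_" + value[::-1] for value in values]


def batches(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Group records into lists of at most `size` elements."""
    batch: List[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def tokenize_records(
    records: Iterable[Record],
    sensitive_fields: List[str],
    tokenize_batch: Callable[[List[str]], List[str]],
    batch_size: int = 2,
) -> Iterator[Record]:
    """Replace the sensitive fields of each record with tokens.

    One call to `tokenize_batch` is made per batch, so the number of
    round trips to the external service stays small.
    """
    for batch in batches(records, batch_size):
        # Remember which (record index, field) each value came from.
        positions = [
            (index, field)
            for index, record in enumerate(batch)
            for field in sensitive_fields
            if field in record
        ]
        values = [batch[index][field] for index, field in positions]
        tokens = tokenize_batch(values) if values else []

        # Copy the records so the inputs are left untouched.
        output = [dict(record) for record in batch]
        for (index, field), token in zip(positions, tokens):
            output[index][field] = token
        yield from output


rows = [
    {"name": "Ann", "ssn": "123"},
    {"name": "Bob", "ssn": "456"},
    {"name": "Cy"},
]
tokenized = list(tokenize_records(rows, ["ssn"], fake_tokenize_batch))
```

In an actual Beam pipeline, this logic would most likely live inside a DoFn applied after a batching step, with the input read from GCS or Pub/Sub and the output written to BigQuery or Bigtable; the sketch only covers the middle stage.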


I created JIRA ticket BEAM-11322<https://issues.apache.org/jira/browse/BEAM-11322> to describe this proposal and capture feedback. More details and the proposed design are available in the design doc<https://docs.google.com/document/d/1fnsUfGpCx8A_MBchPRvlm4gU0Ai5EQNSiZS1mg_A_zg/edit?usp=sharing>.

I welcome community feedback and comments on this Beam data tokenization template proposal.

Thanks,
Artur Khanin
Akvelon, Inc