Posted to commits@hudi.apache.org by "liujinhui (Jira)" <ji...@apache.org> on 2021/08/30 09:33:00 UTC

[jira] [Commented] (HUDI-2370) Supports data encryption

    [ https://issues.apache.org/jira/browse/HUDI-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17406650#comment-17406650 ] 

liujinhui commented on HUDI-2370:
---------------------------------

{code:java}
public static void writeParquet(SparkSession spark, JavaSparkContext jsc) throws IOException {

  // Enable Parquet modular encryption through the Hadoop configuration.
  jsc.hadoopConfiguration().set("parquet.crypto.factory.class",
      "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory");
  // KMS client; its master keys come from parquet.encryption.key.list below.
  jsc.hadoopConfiguration().set("parquet.encryption.kms.client.class",
      "org.apache.hudi.parquet.InMemoryKMS");
  // Master keys as <key id>:<base64-encoded key>, comma-separated.
  jsc.hadoopConfiguration().set("parquet.encryption.key.list",
      "k1:AAECAwQFBgcICQoLDA0ODw== , k2:AAECAAECAAECAAECAAECAA==");
  // k1 encrypts the footer; k2 encrypts the "rider" column.
  jsc.hadoopConfiguration().set("parquet.encryption.footer.key", "k1");
  jsc.hadoopConfiguration().set("parquet.encryption.column.keys", "k2:rider");

  // Generate one sample record and upsert it into a copy-on-write Hudi table.
  QuickstartUtils.DataGenerator dataGen = new QuickstartUtils.DataGenerator();
  List<String> inserts = convertToStringList(dataGen.generateInserts(1));
  Dataset<Row> dataset = spark.read().json(jsc.parallelize(inserts, 1));
  dataset.write().format("org.apache.hudi")
      .options(getQuickstartWriteConfigs())
      .option("hoodie.table.name", "test123")
      .option("hoodie.datasource.write.operation", "upsert")
      .option("hoodie.datasource.write.recordkey.field", "uuid")
      .option("hoodie.datasource.write.storage.type", "COPY_ON_WRITE")
      .option("hoodie.datasource.write.precombine.field", "ts")
      .option("hoodie.combine.before.upsert", "false")
      .option("hoodie.embed.timeline.server", false)
      .mode(SaveMode.Append)
      .save("hdfs://127.0.0.1:9000/hudi/test/01");
}{code}
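
The `parquet.encryption.key.list` value above is a comma-separated list of `<key id>:<base64 key>` pairs. As a rough sketch of how such a value can be sanity-checked before a write, the helper below decodes each entry and verifies that the key has a valid AES length. `KeyListSketch` and `parse` are hypothetical names used only for illustration; this is not the parsing code of Hudi or parquet-mr.

```java
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper only; not part of Hudi or parquet-mr.
final class KeyListSketch {

  // Parses "id:base64key , id:base64key" into a map of key id -> raw key bytes.
  static Map<String, byte[]> parse(String keyList) {
    Map<String, byte[]> keys = new LinkedHashMap<>();
    for (String entry : keyList.split(",")) {
      // Split on the first ':' only; whitespace around the parts is ignored.
      String[] parts = entry.trim().split(":", 2);
      if (parts.length != 2 || parts[0].trim().isEmpty()) {
        throw new IllegalArgumentException("Expected <id>:<base64 key>, got: " + entry);
      }
      String id = parts[0].trim();
      byte[] key = Base64.getDecoder().decode(parts[1].trim());
      // AES accepts only 128-, 192- or 256-bit keys.
      if (key.length != 16 && key.length != 24 && key.length != 32) {
        throw new IllegalArgumentException(
            "Key " + id + " has invalid length " + key.length + " bytes");
      }
      keys.put(id, key);
    }
    return keys;
  }
}
```

Calling `KeyListSketch.parse` on the key list from the snippet above yields two entries, `k1` and `k2`, each decoding to a 16-byte (AES-128) key.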

> Supports data encryption
> ------------------------
>
>                 Key: HUDI-2370
>                 URL: https://issues.apache.org/jira/browse/HUDI-2370
>             Project: Apache Hudi
>          Issue Type: New Feature
>            Reporter: liujinhui
>            Assignee: liujinhui
>            Priority: Major
>             Fix For: 0.10.0
>
>
> Data security is becoming increasingly important, so encryption support in Hudi would be very welcome. Proposed features:
>  1. Specify which columns to encrypt
>  2. Support footer encryption
>  3. Custom encryption client interface (provide an in-memory encryption client by default)
>  4. Specify the encryption key
>  
> When querying, you need to pass the relevant key or obtain query permission through the client's encryption interface. If that fails, no result is returned.
>  1. When querying non-encrypted fields without passing the key, the data is returned normally
>  2. When querying encrypted fields without passing the key, the data is not returned
>  3. When querying encrypted fields and passing the key, the data is returned normally
>  4. When querying all fields without passing the key, no result is returned; if the key is passed, the data is returned normally
>  
> Start with COW (copy-on-write) tables first
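
The four query cases in the issue description reduce to one rule: a query returns data if the key is passed, or if it touches no encrypted column. The sketch below models only that expected outcome. `QueryAccessSketch` and `returnsData` are hypothetical names, this is not Hudi or Parquet code, and it assumes the file footer itself is readable (with an encrypted footer, Parquet also needs the footer key to read any column).

```java
import java.util.Set;

// Hypothetical model of the expected query outcomes; not Hudi or Parquet code.
final class QueryAccessSketch {

  // Returns true when the query is expected to return data.
  static boolean returnsData(Set<String> queriedColumns,
                             Set<String> encryptedColumns,
                             boolean keyProvided) {
    if (keyProvided) {
      return true; // case 3, and case 4 when the key is passed
    }
    // Without a key, the query succeeds only if it touches no encrypted column.
    for (String column : queriedColumns) {
      if (encryptedColumns.contains(column)) {
        return false; // case 2, and case 4 when the key is not passed
      }
    }
    return true; // case 1
  }
}
```

With `rider` as the only encrypted column, querying `uuid` alone without a key is expected to return data, while querying `rider` without a key is not.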



--
This message was sent by Atlassian Jira
(v8.3.4#803005)