Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/01/18 21:02:27 UTC

[GitHub] [incubator-hudi] nsivabalan commented on a change in pull request #1248: Adding delete docs to QuickStart

nsivabalan commented on a change in pull request #1248: Adding delete docs to QuickStart
URL: https://github.com/apache/incubator-hudi/pull/1248#discussion_r368248364
 
 

 ##########
 File path: docs/quickstart.md
 ##########
 @@ -109,6 +109,57 @@ Notice that the save mode is now `Append`. In general, always use append mode un
 [Querying](#query) the data again will now show updated trips. Each write operation generates a new [commit](http://hudi.incubator.apache.org/concepts.html) 
 denoted by the timestamp. Look for changes in `_hoodie_commit_time`, `rider`, `driver` fields for the same `_hoodie_record_key`s in previous commit. 
 
+## Delete data {#deletes}
+Delete records for the HoodieKeys passed in. Let's first generate a new batch of inserts and then delete that same batch. Query to verify
+that all records are deleted.
+
+```
+val inserts = convertToStringList(dataGen.generateInserts(10))
+val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
+df.write.format("org.apache.hudi").
+    options(getQuickstartWriteConfigs).
+    option(PRECOMBINE_FIELD_OPT_KEY, "ts").
+    option(RECORDKEY_FIELD_OPT_KEY, "uuid").
+    option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
+    option(TABLE_NAME, tableName).
+    mode(Overwrite).
+    save(basePath);
+
+// Fetch the rider value for the batch of records inserted just now
+val roDeleteViewDF = spark.
+    read.
+    format("org.apache.hudi").
+    load(basePath + "/*/*/*/*")
+roDeleteViewDF.registerTempTable("hudi_ro_table")
+spark.sql("select distinct rider from hudi_ro_table").show()
+
+// replace the rider value in below query to a value from above. "rider-213" is first batch and "rider-284" is second batch.
+val ds = spark.sql("select uuid, partitionPath from hudi_ro_table where rider = 'rider-284'")
+
+// issue deletes
 
 Review comment:
   I am deleting an entire batch of inserts, so I thought I would write a new batch of inserts and then delete that whole batch.
