Posted to commits@hudi.apache.org by "nsivabalan (via GitHub)" <gi...@apache.org> on 2023/03/03 18:44:49 UTC

[GitHub] [hudi] nsivabalan commented on issue #8031: [SUPPORT] Hudi Timestamp Based Key Generator Need Assistance

nsivabalan commented on issue #8031:
URL: https://github.com/apache/hudi/issues/8031#issuecomment-1453950617

   ```
   // Run in spark-shell with the Hudi Spark bundle on the classpath
   import java.sql.Timestamp
   import org.apache.spark.sql.SaveMode._
   import spark.implicits._
   
   val df = Seq(
     (1, Timestamp.valueOf("2014-01-01 23:00:01"), "abc"),
     (1, Timestamp.valueOf("2014-11-30 12:40:32"), "abc"),
     (2, Timestamp.valueOf("2016-12-29 09:54:00"), "def"),
     (2, Timestamp.valueOf("2016-05-09 10:12:43"), "def")
   ).toDF("typeId", "eventTime", "str")
   
   df.write.format("hudi").
     option("hoodie.insert.shuffle.parallelism", "2").
     option("hoodie.upsert.shuffle.parallelism", "2").
     option("hoodie.datasource.write.precombine.field", "typeId").
     option("hoodie.datasource.write.partitionpath.field", "eventTime").
     option("hoodie.datasource.write.recordkey.field", "str").
     option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.TimestampBasedKeyGenerator").
     option("hoodie.deltastreamer.keygen.timebased.timestamp.type", "DATE_STRING").
     option("hoodie.deltastreamer.keygen.timebased.timezone", "GMT+8:00").
     // HH is the 24-hour clock; hh is the 12-hour clock (1-12) and does not fit values like 23:00:01
     option("hoodie.deltastreamer.keygen.timebased.input.dateformat", "yyyy-MM-dd HH:mm:ss").
     option("hoodie.deltastreamer.keygen.timebased.output.dateformat", "yyyy-MM-dd").
     // Keep logical timestamp values consistent across write paths (see the config docs)
     option("hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled", "true").
     option("hoodie.table.name", "hudi_tbl").
     mode(Overwrite).
     save("/tmp/hudi_tbl_trial/")
   ```
   
   Listing of the base path:
   ```
   ls -ltr /tmp/hudi_tbl_trial/
   total 0
   drwxr-xr-x  6 nsb  wheel  192 Mar  3 10:40 2016-12-30
   drwxr-xr-x  6 nsb  wheel  192 Mar  3 10:40 2014-01-02
   drwxr-xr-x  6 nsb  wheel  192 Mar  3 10:40 2016-05-10
   drwxr-xr-x  6 nsb  wheel  192 Mar  3 10:40 2014-12-01
   ```
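   Every directory in the listing is one day after the calendar date of its `eventTime`. The partition value is rendered with the `yyyy-MM-dd` output format in the `GMT+8:00` zone, so an instant late in the day can fall into the next day's partition; the exact shift also depends on the time zone of the spark-shell session, which is not stated here. A minimal sketch of that rendering step in plain JVM code (the `partitionPath` helper is hypothetical, not Hudi's implementation, and the instants are given explicitly in UTC):
   
   ```scala
   import java.time.Instant
   import java.time.format.DateTimeFormatter
   import java.util.TimeZone
   
   // Hypothetical helper (not a Hudi API): format an instant the way the
   // output.dateformat and timezone options above describe.
   def partitionPath(instant: Instant, outputFormat: String, timeZoneId: String): String = {
     // java.util.TimeZone accepts custom IDs such as "GMT+8:00"
     val zone = TimeZone.getTimeZone(timeZoneId).toZoneId
     DateTimeFormatter.ofPattern(outputFormat).withZone(zone).format(instant)
   }
   
   // 23:00:01 UTC is already 07:00:01 the next morning in GMT+8
   val late = Instant.parse("2014-01-01T23:00:01Z")
   println(partitionPath(late, "yyyy-MM-dd", "GMT+8:00")) // 2014-01-02
   
   // 15:00:01 UTC is 23:00:01 the same day in GMT+8
   val early = Instant.parse("2014-01-01T15:00:01Z")
   println(partitionPath(early, "yyyy-MM-dd", "GMT+8:00")) // 2014-01-01
   ```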
   
   
   
   If you prefer slash-encoded partition paths, set the output date format to:
   ```
   option("hoodie.deltastreamer.keygen.timebased.output.dateformat","yyyy/MM/dd")
   ```
   
   Note that the partition directories will then be three levels deep (year/month/day):
   ```
   ls -ltr /tmp/hudi_tbl_trial/
   total 0
   drwxr-xr-x  4 nsb  wheel  128 Mar  3 10:42 2014
   drwxr-xr-x  4 nsb  wheel  128 Mar  3 10:42 2016
   nsb$ ls -ltr /tmp/hudi_tbl_trial/2014/
   total 0
   drwxr-xr-x  3 nsb  wheel  96 Mar  3 10:42 01
   drwxr-xr-x  3 nsb  wheel  96 Mar  3 10:42 12
   nsb$ ls -ltr /tmp/hudi_tbl_trial/2014/01/
   total 0
   drwxr-xr-x  6 nsb  wheel  192 Mar  3 10:42 02
   nsb$ ls -ltr /tmp/hudi_tbl_trial/2014/01/02/
   total 856
   -rw-r--r--  1 nsb  wheel  434759 Mar  3 10:42 b02e5e6f-9d28-42d1-b257-3728e534d477-0_3-49-76_20230303104246958.parquet
   ```
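   The nesting comes from the `/` characters in the formatted value: each segment becomes one directory level under the base path. A small sketch of that effect (the `slashPartition` helper is hypothetical and stands in for the key generator; the instant is assumed to be in UTC):
   
   ```scala
   import java.time.Instant
   import java.time.format.DateTimeFormatter
   import java.util.TimeZone
   
   // Hypothetical helper (not a Hudi API): render an instant with a slash-encoded pattern
   def slashPartition(instant: Instant, timeZoneId: String): String = {
     val zone = TimeZone.getTimeZone(timeZoneId).toZoneId
     DateTimeFormatter.ofPattern("yyyy/MM/dd").withZone(zone).format(instant)
   }
   
   val path = slashPartition(Instant.parse("2014-01-01T23:00:01Z"), "GMT+8:00")
   // Each "/"-separated segment corresponds to one directory level under the base path
   val levels = path.split('/').toList
   println(path)          // 2014/01/02
   println(levels.length) // 3
   ```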
   
   
   My guess is that you were missing `option("hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled","true")`. See the configuration docs:
   https://hudi.apache.org/docs/configurations/#hoodiedatasourcewritekeygeneratorconsistentlogicaltimestampenabled-1
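   Going by the linked docs, with this flag left at its default (disabled) a timestamp column can reach the key generator as a `timestamp-micros` long (microseconds since the epoch) on the non-row-writer path, rather than as a timestamp, so the date formats above may not apply the way you expect. The sketch below only illustrates those two representations of one instant from the sample data; `toMicros` and `fromMicros` are hypothetical helpers, not Hudi code, and the value is assumed to be in UTC.
   
   ```scala
   import java.sql.Timestamp
   import java.time.Instant
   
   // Hypothetical helpers (not Hudi APIs): convert between a timestamp and the
   // microseconds-since-epoch long used by Avro's timestamp-micros logical type.
   def toMicros(ts: Timestamp): Long = {
     val instant = ts.toInstant
     instant.getEpochSecond * 1000000L + instant.getNano / 1000
   }
   
   def fromMicros(micros: Long): Timestamp =
     Timestamp.from(Instant.ofEpochSecond(Math.floorDiv(micros, 1000000L), Math.floorMod(micros, 1000000L) * 1000L))
   
   // The 2016-12-29 09:54:00 row from the example, taken as UTC for this sketch
   val ts = Timestamp.from(Instant.parse("2016-12-29T09:54:00Z"))
   val micros = toMicros(ts)
   println(micros)                   // 1483005240000000
   println(fromMicros(micros) == ts) // true
   ```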
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
