Posted to commits@hudi.apache.org by "nsivabalan (via GitHub)" <gi...@apache.org> on 2023/03/03 18:44:49 UTC
[GitHub] [hudi] nsivabalan commented on issue #8031: [SUPPORT] Hudi Timestamp Based Key Generator Need Assistance
nsivabalan commented on issue #8031:
URL: https://github.com/apache/hudi/issues/8031#issuecomment-1453950617
```
import java.sql.Timestamp
import spark.implicits._
val df = Seq(
(1, Timestamp.valueOf("2014-01-01 23:00:01"), "abc"),
(1, Timestamp.valueOf("2014-11-30 12:40:32"), "abc"),
(2, Timestamp.valueOf("2016-12-29 09:54:00"), "def"),
(2, Timestamp.valueOf("2016-05-09 10:12:43"), "def")
).toDF("typeId","eventTime", "str")
import org.apache.hudi.QuickstartUtils._
import scala.collection.JavaConversions._
import org.apache.spark.sql.SaveMode._
import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig._
import org.apache.hudi.common.model.HoodieRecord
df.write.format("hudi").
option("hoodie.insert.shuffle.parallelism", "2").
option("hoodie.upsert.shuffle.parallelism", "2").
option("hoodie.datasource.write.precombine.field", "typeId").
option("hoodie.datasource.write.partitionpath.field", "eventTime").
option("hoodie.datasource.write.recordkey.field", "str").
option("hoodie.datasource.write.keygenerator.class","org.apache.hudi.keygen.TimestampBasedKeyGenerator").
option("hoodie.deltastreamer.keygen.timebased.timestamp.type","DATE_STRING").
option("hoodie.deltastreamer.keygen.timebased.timezone","GMT+8:00").
option("hoodie.deltastreamer.keygen.timebased.input.dateformat","yyyy-MM-dd hh:mm:ss").
option("hoodie.deltastreamer.keygen.timebased.output.dateformat","yyyy-MM-dd").
option("hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled","true").
option("hoodie.table.name", "hudi_tbl").
mode(Overwrite).
save("/tmp/hudi_tbl_trial/")
```
Listing of the base path after the write:
```
ls -ltr /tmp/hudi_tbl_trial/
total 0
drwxr-xr-x 6 nsb wheel 192 Mar 3 10:40 2016-12-30
drwxr-xr-x 6 nsb wheel 192 Mar 3 10:40 2014-01-02
drwxr-xr-x 6 nsb wheel 192 Mar 3 10:40 2016-05-10
drwxr-xr-x 6 nsb wheel 192 Mar 3 10:40 2014-12-01
```
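The partition directories are one day later than the `eventTime` values, which is what you would expect if each timestamp is converted to GMT+8 before it is formatted as `yyyy-MM-dd`. The sketch below reproduces that arithmetic with plain `java.time`; it is not Hudi's implementation. The object name `PartitionDateSketch`, the helper `partitionFor`, and the source zone `America/Los_Angeles` are assumptions for illustration only (the session time zone is not stated in this comment).

```scala
import java.time.{LocalDateTime, ZoneId}
import java.time.format.DateTimeFormatter

object PartitionDateSketch {
  // Assumption: the time zone in which the Timestamp values were interpreted.
  // Not stated in the original comment; chosen only for illustration.
  private val sourceZone = ZoneId.of("America/Los_Angeles")

  // Corresponds to hoodie.deltastreamer.keygen.timebased.timezone = GMT+8:00.
  private val outputZone = ZoneId.of("GMT+08:00")

  private val inputFormat  = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
  private val outputFormat = DateTimeFormatter.ofPattern("yyyy-MM-dd")

  // Hypothetical helper: parse a local timestamp, shift it to the output
  // zone, and format it the way the partition path is formatted.
  def partitionFor(eventTime: String): String =
    LocalDateTime.parse(eventTime, inputFormat)
      .atZone(sourceZone)
      .withZoneSameInstant(outputZone)
      .format(outputFormat)
}
```

Under those assumptions, `PartitionDateSketch.partitionFor("2014-01-01 23:00:01")` gives `2014-01-02`, matching the directory listed above.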
If you prefer slash-separated partition paths, change the output date format:
```
option("hoodie.deltastreamer.keygen.timebased.output.dateformat","yyyy/MM/dd")
```
Note that the partition directories will then be three levels deep (year/month/day):
```
ls -ltr /tmp/hudi_tbl_trial/
total 0
drwxr-xr-x 4 nsb wheel 128 Mar 3 10:42 2014
drwxr-xr-x 4 nsb wheel 128 Mar 3 10:42 2016
nsb$ ls -ltr /tmp/hudi_tbl_trial/2014/
total 0
drwxr-xr-x 3 nsb wheel 96 Mar 3 10:42 01
drwxr-xr-x 3 nsb wheel 96 Mar 3 10:42 12
nsb$ ls -ltr /tmp/hudi_tbl_trial/2014/01/
total 0
drwxr-xr-x 6 nsb wheel 192 Mar 3 10:42 02
nsb$ ls -ltr /tmp/hudi_tbl_trial/2014/01/02/
total 856
-rw-r--r-- 1 nsb wheel 434759 Mar 3 10:42 b02e5e6f-9d28-42d1-b257-3728e534d477-0_3-49-76_20230303104246958.parquet
```
My guess is that you were missing option("hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled","true"). The config is documented here:
https://hudi.apache.org/docs/configurations/#hoodiedatasourcewritekeygeneratorconsistentlogicaltimestampenabled-1