You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/03/03 16:30:17 UTC
[GitHub] [hudi] chrischnweiss opened a new issue #4946: [SUPPORT] TimestampBasedKeyGenerator failed to parse date_string column
chrischnweiss opened a new issue #4946:
URL: https://github.com/apache/hudi/issues/4946
**Describe the problem you faced**
Hey guys,
actually I am trying to use DeltaStreamer along with a CustomKeyGenerator to use ComplexKeyGenerator and TimeBasedKeyGenerator together. Now I am struggling with DateFormat Exception and I canĀ“t figure out why. Maybe you can help me?
These are my config properties for the partitoning:
```
hoodie.datasource.write.recordkey.field=id_1,id_2
hoodie.datasource.write.partitionpath.field=timestamp_col:timestamp
hoodie.datasource.write.hive_style_partitioning=true
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator
hoodie.deltastreamer.keygen.timebased.timestamp.type=DATE_STRING
hoodie.deltastreamer.keygen.timebased.input.dateformat="yyyy-MM-dd HH:mm:ss.SSS Z"
hoodie.deltastreamer.keygen.timebased.output.dateformat="yyyy/MM/dd"
```
**Expected behavior**
I want to parse my date_string as input to partition the target dataset.
My timestamp_col looks like this: `2022-01-13 16:57:05.659 +01:00`
**Environment Description**
Hudi version :
0.10.0
Spark version :
3.1.1
Hive version :
3.1.3000
Hadoop version :
3.1.1
Storage (HDFS/S3/GCS..) :
HDFS
Running on Docker? (yes/no) :
no
**Stacktrace**
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 11) (t-worker.node.asdasd.de executor 1): org.apache.hudi.exception.HoodieKeyGeneratorException: Unable to parse input partition field :2022-01-13 16:57:05.659 +01:00
at org.apache.hudi.keygen.TimestampBasedAvroKeyGenerator.getPartitionPath(TimestampBasedAvroKeyGenerator.java:135)
at org.apache.hudi.keygen.CustomAvroKeyGenerator.getPartitionPath(CustomAvroKeyGenerator.java:89)
at org.apache.hudi.keygen.CustomKeyGenerator.getPartitionPath(CustomKeyGenerator.java:68)
at org.apache.hudi.keygen.BaseKeyGenerator.getKey(BaseKeyGenerator.java:62)
at org.apache.hudi.utilities.deltastreamer.DeltaSync.lambda$readFromSource$d62e16$1(DeltaSync.java:453)
at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:222)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1440)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1350)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1414)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1237)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Invalid format: "2022-01-13 16:57:05.659 +01:00"
at org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:945)
at org.apache.hudi.keygen.TimestampBasedAvroKeyGenerator.getPartitionPath(TimestampBasedAvroKeyGenerator.java:202)
at org.apache.hudi.keygen.TimestampBasedAvroKeyGenerator.getPartitionPath(TimestampBasedAvroKeyGenerator.java:133)
... 30 more
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] xushiyan commented on issue #4946: [SUPPORT] TimestampBasedKeyGenerator failed to parse date_string column
Posted by GitBox <gi...@apache.org>.
xushiyan commented on issue #4946:
URL: https://github.com/apache/hudi/issues/4946#issuecomment-1059916802
@chrischnweiss have you tried verifying the input datetime string works with the datetime format you set?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org