Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/06/15 03:01:09 UTC

[GitHub] [hudi] xccui opened a new issue, #5870: [SUPPORT] Issues when querying data partitioned by year with Flink

xccui opened a new issue, #5870:
URL: https://github.com/apache/hudi/issues/5870

   **Describe the problem you faced**
   
   Hello! We use Flink to write Postgres CDC data to a Hudi table. The table is partitioned by year, using the timestamp-based key generator with a custom output date format:
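   ```java
   import org.apache.flink.configuration.Configuration;
   import org.apache.hudi.configuration.FlinkOptions;
   import org.apache.hudi.keygen.constant.KeyGeneratorOptions;
   import org.apache.hudi.keygen.constant.KeyGeneratorType;

   Configuration conf = new Configuration();
   // Partition by the year (UTC) of an epoch-millisecond field
   conf.setString(FlinkOptions.PARTITION_PATH_FIELD, "timestamp_field");
   conf.setString(FlinkOptions.KEYGEN_TYPE, KeyGeneratorType.TIMESTAMP.toString());
   conf.setString(KeyGeneratorOptions.Config.TIMESTAMP_TYPE_FIELD_PROP, "EPOCHMILLISECONDS");
   conf.setString(KeyGeneratorOptions.Config.TIMESTAMP_OUTPUT_DATE_FORMAT_PROP, "yyyy");
   conf.setString(KeyGeneratorOptions.Config.TIMESTAMP_TIMEZONE_FORMAT_PROP, "UTC+0:00");
   ```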
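   To make the resulting partition layout concrete, the JDK-only sketch below approximates how these settings turn an epoch-millisecond value into a partition path that contains only the year. It is an illustration, not Hudi's key generator code: the class and the `partitionPath` helper are hypothetical, and `ZoneOffset.UTC` stands in for the `"UTC+0:00"` timezone setting.

   ```java
   import java.time.Instant;
   import java.time.ZoneOffset;
   import java.time.format.DateTimeFormatter;

   // Hypothetical, stdlib-only approximation (not Hudi's implementation) of a
   // "yyyy" partition path computed from an epoch-millisecond field in UTC.
   final class YearPartitionPathSketch {

       // Output format "yyyy", evaluated in UTC (stand-in for "UTC+0:00").
       private static final DateTimeFormatter YEAR_FORMAT =
               DateTimeFormatter.ofPattern("yyyy").withZone(ZoneOffset.UTC);

       // Hypothetical helper: maps an epoch-millisecond timestamp to its partition path.
       static String partitionPath(long epochMillis) {
           return YEAR_FORMAT.format(Instant.ofEpochMilli(epochMillis));
       }

       public static void main(String[] args) {
           // 2022-06-14T22:40:53Z expressed in epoch milliseconds
           long ts = 1655246453000L;
           System.out.println(partitionPath(ts)); // prints 2022
       }
   }
   ```

   Every record from a given year therefore lands in a directory named only by that year, such as `2022`.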
   When we query the table with Flink SQL, the query fails with a `DateTimeParseException` (see the stack trace below).
   
   After some testing, we found that the table can be queried successfully if we add the extra option `'hoodie.datasource.write.partitionpath.field'=''`.
   
   **Environment Description**
   
   * Hudi version : 0.12.0-SNAPSHOT on master
   
   * Flink version : 1.14
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   **Additional context**
   
   Not sure if it's relevant, but the option `hoodie.datasource.write.partitionpath.field` has different default values in `FlinkOptions.java` and `KeyGeneratorOptions.java`.
   
   **Stacktrace**
   
   ```
   [2022-06-14 22:40:53] Caused by: java.time.format.DateTimeParseException: Text '2022' could not be parsed at index 4
   [2022-06-14 22:40:53] 	at java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1949)
   [2022-06-14 22:40:53] 	at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1851)
   [2022-06-14 22:40:53] 	at java.time.LocalDateTime.parse(LocalDateTime.java:492)
   [2022-06-14 22:40:53] 	at java.time.LocalDateTime.parse(LocalDateTime.java:477)
   [2022-06-14 22:40:53] 	at org.apache.flink.table.filesystem.RowPartitionComputer.restorePartValueFromType(RowPartitionComputer.java:122)
   [2022-06-14 22:40:53] 	at org.apache.flink.table.filesystem.RowPartitionComputer.restorePartValueFromType(RowPartitionComputer.java:84)
   [2022-06-14 22:40:53] 	at org.apache.hudi.table.format.mor.MergeOnReadInputFormat.lambda$getReader$0(MergeOnReadInputFormat.java:302)
   [2022-06-14 22:40:53] 	at java.util.LinkedHashMap.forEach(LinkedHashMap.java:684)
   [2022-06-14 22:40:53] 	at org.apache.hudi.table.format.mor.MergeOnReadInputFormat.getReader(MergeOnReadInputFormat.java:302)
   [2022-06-14 22:40:53] 	at org.apache.hudi.table.format.mor.MergeOnReadInputFormat.getFullSchemaReader(MergeOnReadInputFormat.java:288)
   [2022-06-14 22:40:53] 	at org.apache.hudi.table.format.mor.MergeOnReadInputFormat.open(MergeOnReadInputFormat.java:205)
   [2022-06-14 22:40:53] 	at org.apache.hudi.table.format.mor.MergeOnReadInputFormat.open(MergeOnReadInputFormat.java:81)
   [2022-06-14 22:40:53] 	at org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:84)
   [2022-06-14 22:40:53] 	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110)
   [2022-06-14 22:40:53] 	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:66)
   [2022-06-14 22:40:53] 	at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:269)
   ```
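   Reading the trace, the failure appears to happen when Flink's `RowPartitionComputer.restorePartValueFromType` converts the partition directory name back into the type of the partition column. For `timestamp_field` (presumably declared as a `TIMESTAMP` column) it calls `LocalDateTime.parse`, but a `yyyy` partition path supplies only `2022`. The JDK-only sketch below reproduces the same exception message; the class and the `tryParse` helper are hypothetical, and this is an illustration of that reading, not Flink or Hudi code.

   ```java
   import java.time.LocalDateTime;
   import java.time.format.DateTimeParseException;

   // Hypothetical, stdlib-only reproduction: LocalDateTime.parse uses the ISO local
   // date-time format, so a bare year taken from a partition directory is rejected.
   final class PartitionValueParseSketch {

       // Hypothetical helper: returns the parse error message, or null if parsing succeeds.
       static String tryParse(String partitionValue) {
           try {
               LocalDateTime.parse(partitionValue);
               return null;
           } catch (DateTimeParseException e) {
               return e.getMessage();
           }
       }

       public static void main(String[] args) {
           // A "yyyy" partition path yields only the year.
           System.out.println(tryParse("2022")); // prints Text '2022' could not be parsed at index 4
           // A full ISO-8601 local date-time parses without error.
           System.out.println(tryParse("2022-06-14T22:40:53")); // prints null
       }
   }
   ```

   The parser reads the four-digit year and then expects a `-` at index 4, which is exactly where the reported exception points.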
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] emtwo commented on issue #5870: [SUPPORT] Issues when querying data partitioned by year with Flink

emtwo commented on issue #5870:
URL: https://github.com/apache/hudi/issues/5870#issuecomment-1195974259

   We are still seeing this issue. Are there perhaps additional configs that need to be set to achieve this partitioning?



[GitHub] [hudi] xccui commented on issue #5870: [SUPPORT] Issues when querying data partitioned by year with Flink

xccui commented on issue #5870:
URL: https://github.com/apache/hudi/issues/5870#issuecomment-1251447373

   Hi @danny0405, could you take a look at this issue?



[GitHub] [hudi] danny0405 commented on issue #5870: [SUPPORT] Issues when querying data partitioned by year with Flink

danny0405 commented on issue #5870:
URL: https://github.com/apache/hudi/issues/5870#issuecomment-1306549833

   0.12.1 should have solved this problem.



[GitHub] [hudi] codope commented on issue #5870: [SUPPORT] Issues when querying data partitioned by year with Flink

codope commented on issue #5870:
URL: https://github.com/apache/hudi/issues/5870#issuecomment-1156263611

   @yuzhaojing Can you please look into this issue?
   
   I don't think the different defaults in `FlinkOptions` and `KeyGeneratorOptions` matter, as they have differed in the past as well.



[GitHub] [hudi] danny0405 closed issue #5870: [SUPPORT] Issues when querying data partitioned by year with Flink

danny0405 closed issue #5870: [SUPPORT] Issues when querying data partitioned by year with Flink
URL: https://github.com/apache/hudi/issues/5870

