Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/06/15 03:01:09 UTC
[GitHub] [hudi] xccui opened a new issue, #5870: [SUPPORT] Issues when querying data partitioned by year with Flink
xccui opened a new issue, #5870:
URL: https://github.com/apache/hudi/issues/5870
**Describe the problem you faced**
Hello! We use Flink to write some PostgresCDC data to a Hudi table. The table is partitioned by a custom date format (as shown below).
```
setString(FlinkOptions.PARTITION_PATH_FIELD, "timestamp_field")
setString(FlinkOptions.KEYGEN_TYPE, KeyGeneratorType.TIMESTAMP.toString())
setString(KeyGeneratorOptions.Config.TIMESTAMP_TYPE_FIELD_PROP, "EPOCHMILLISECONDS")
setString(KeyGeneratorOptions.Config.TIMESTAMP_OUTPUT_DATE_FORMAT_PROP, "yyyy")
setString(KeyGeneratorOptions.Config.TIMESTAMP_TIMEZONE_FORMAT_PROP, "UTC+0:00")
```
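To make the effect of these settings concrete, here is a minimal sketch of the formatting they describe, written with plain `java.time` rather than Hudi's own key generator code. The class name `YearPartitionSketch` and the helper `toYearPartition` are hypothetical, and the sample timestamp is only an illustration. An epoch-millisecond value formatted with the pattern `yyyy` in UTC yields a bare year, which becomes the partition path value.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class YearPartitionSketch {

    // Hypothetical helper (not a Hudi API): formats epoch milliseconds
    // with the "yyyy" output pattern in UTC, mirroring the options above.
    static String toYearPartition(long epochMillis) {
        return DateTimeFormatter.ofPattern("yyyy")
                .withZone(ZoneOffset.UTC)
                .format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        // Illustrative timestamp only; any value in 2022 (UTC) behaves the same.
        long ts = Instant.parse("2022-06-15T03:01:09Z").toEpochMilli();
        System.out.println(toYearPartition(ts)); // prints 2022
    }
}
```

The resulting partition value is a four-digit string with no month, day, or time component.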
When querying the table with Flink SQL, we got a `DateTimeParseException` (see the stack trace below).
After some tests, we found that the table can be successfully queried with this extra option `'hoodie.datasource.write.partitionpath.field'=''`.
**Environment Description**
* Hudi version : 0.12.0-SNAPSHOT on master
* Flink version : 1.14
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no
**Additional context**
Not sure if it's relevant, but the default values are different for the option `hoodie.datasource.write.partitionpath.field` in `FlinkOptions.java` and `KeyGeneratorOptions.java`.
**Stacktrace**
```
[2022-06-14 22:40:53] Caused by: java.time.format.DateTimeParseException: Text '2022' could not be parsed at index 4
[2022-06-14 22:40:53] at java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1949)
[2022-06-14 22:40:53] at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1851)
[2022-06-14 22:40:53] at java.time.LocalDateTime.parse(LocalDateTime.java:492)
[2022-06-14 22:40:53] at java.time.LocalDateTime.parse(LocalDateTime.java:477)
[2022-06-14 22:40:53] at org.apache.flink.table.filesystem.RowPartitionComputer.restorePartValueFromType(RowPartitionComputer.java:122)
[2022-06-14 22:40:53] at org.apache.flink.table.filesystem.RowPartitionComputer.restorePartValueFromType(RowPartitionComputer.java:84)
[2022-06-14 22:40:53] at org.apache.hudi.table.format.mor.MergeOnReadInputFormat.lambda$getReader$0(MergeOnReadInputFormat.java:302)
[2022-06-14 22:40:53] at java.util.LinkedHashMap.forEach(LinkedHashMap.java:684)
[2022-06-14 22:40:53] at org.apache.hudi.table.format.mor.MergeOnReadInputFormat.getReader(MergeOnReadInputFormat.java:302)
[2022-06-14 22:40:53] at org.apache.hudi.table.format.mor.MergeOnReadInputFormat.getFullSchemaReader(MergeOnReadInputFormat.java:288)
[2022-06-14 22:40:53] at org.apache.hudi.table.format.mor.MergeOnReadInputFormat.open(MergeOnReadInputFormat.java:205)
[2022-06-14 22:40:53] at org.apache.hudi.table.format.mor.MergeOnReadInputFormat.open(MergeOnReadInputFormat.java:81)
[2022-06-14 22:40:53] at org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:84)
[2022-06-14 22:40:53] at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110)
[2022-06-14 22:40:53] at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:66)
[2022-06-14 22:40:53] at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:269)
```
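The failing frame in the trace is `LocalDateTime.parse`, reached from `RowPartitionComputer.restorePartValueFromType`. The following is a minimal standalone sketch that reproduces the same message outside Flink and Hudi. The class name `YearParseSketch` and the helper `tryParse` are hypothetical. The `Year.parse` line only illustrates that a year-only string needs a year-aware parser; it is not a description of how any Hudi release handles this.

```java
import java.time.LocalDateTime;
import java.time.Year;
import java.time.format.DateTimeParseException;

public class YearParseSketch {

    // Hypothetical helper: attempts the same single-argument
    // LocalDateTime.parse call seen in the trace and returns either
    // the parsed value or the exception message.
    static String tryParse(String partitionValue) {
        try {
            return LocalDateTime.parse(partitionValue).toString();
        } catch (DateTimeParseException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // The default ISO local date-time format expects text such as
        // 2022-01-01T00:00:00, so a bare year fails right after the digits.
        System.out.println(tryParse("2022"));

        // Illustration only: parsing the value as a year and widening it
        // to the first instant of that year succeeds.
        System.out.println(Year.parse("2022").atDay(1).atStartOfDay());
    }
}
```

The first call fails at index 4 because the ISO format expects a `-` separator after the four year digits, which matches the message in the trace.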
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] emtwo commented on issue #5870: [SUPPORT] Issues when querying data partitioned by year with Flink
Posted by GitBox <gi...@apache.org>.
emtwo commented on issue #5870:
URL: https://github.com/apache/hudi/issues/5870#issuecomment-1195974259
Still seeing this issue. Wondering if perhaps there are additional configs that should be set to achieve this partitioning?
[GitHub] [hudi] xccui commented on issue #5870: [SUPPORT] Issues when querying data partitioned by year with Flink
Posted by GitBox <gi...@apache.org>.
xccui commented on issue #5870:
URL: https://github.com/apache/hudi/issues/5870#issuecomment-1251447373
Hi @danny0405, I wonder if you could take a look at this issue.
[GitHub] [hudi] danny0405 commented on issue #5870: [SUPPORT] Issues when querying data partitioned by year with Flink
Posted by GitBox <gi...@apache.org>.
danny0405 commented on issue #5870:
URL: https://github.com/apache/hudi/issues/5870#issuecomment-1306549833
0.12.1 should have solved this problem.
[GitHub] [hudi] codope commented on issue #5870: [SUPPORT] Issues when querying data partitioned by year with Flink
Posted by GitBox <gi...@apache.org>.
codope commented on issue #5870:
URL: https://github.com/apache/hudi/issues/5870#issuecomment-1156263611
@yuzhaojing Can you please look into this issue?
I don't think different defaults in `FlinkOptions` and `KeyGeneratorOptions` matter as they have had different defaults in the past as well.
[GitHub] [hudi] danny0405 closed issue #5870: [SUPPORT] Issues when querying data partitioned by year with Flink
Posted by GitBox <gi...@apache.org>.
danny0405 closed issue #5870: [SUPPORT] Issues when querying data partitioned by year with Flink
URL: https://github.com/apache/hudi/issues/5870