Posted to commits@hudi.apache.org by "Ethan Guo (Jira)" <ji...@apache.org> on 2023/01/12 18:22:00 UTC

[jira] [Updated] (HUDI-5545) Extending support to other special characters for S3EventsMetaSelector

     [ https://issues.apache.org/jira/browse/HUDI-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-5545:
----------------------------
    Description: 
This fix addresses the following issue.

I am setting up ingestion with S3 as the source by following this [blog|https://hudi.apache.org/blog/2021/08/23/s3-events-source/]. However, the second job (S3EventsHoodieIncrSource) fails with
{{HoodieException: org.apache.hudi.exception.HoodieException: Path does not exist}}. Our investigation shows that the job fails because of encoded characters in the S3 object name (the encoding is added by SQS).
Digging into the Hudi source code, we found that Hudi decodes these characters in [S3EventsMetaSelector|https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/S3EventsMetaSelector.java#L154], but at the moment only {{=}} is handled.
FYI:
Original S3 object: {{s3://<bucket>/s3_parquet_source_data/s3-test+0+0000061344.parquet}}
Encoded S3 object: {{s3://<bucket>/s3_parquet_source_data/s3-test%2B0%2B0000061344.parquet}}
Note: the workflow ran successfully once the file name was corrected.
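
To illustrate the kind of decoding that would cover the failing case, here is a minimal sketch using the JDK's {{java.net.URLDecoder}}. It is not the code in S3EventsMetaSelector and not necessarily how the fix is implemented; the class name {{S3KeyDecodeSketch}} and the helper {{decodeObjectKey}} are hypothetical names chosen for this example. Note that {{URLDecoder}} also turns a literal {{+}} into a space, so it assumes the key arrives fully percent-encoded, as in the example above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical sketch: decodes every percent-encoded character in an S3 object key,
// rather than handling a single character such as '='.
class S3KeyDecodeSketch {

    // Hypothetical helper, not part of Hudi's API.
    static String decodeObjectKey(String encodedKey) {
        try {
            // Decodes all %XX escapes as UTF-8; a raw '+' would become a space.
            return URLDecoder.decode(encodedKey, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every JVM is required to support.
            throw new IllegalStateException("UTF-8 is not supported", e);
        }
    }

    public static void main(String[] args) {
        String encoded = "s3_parquet_source_data/s3-test%2B0%2B0000061344.parquet";
        // prints: s3_parquet_source_data/s3-test+0+0000061344.parquet
        System.out.println(decodeObjectKey(encoded));
    }
}
```

With decoding like this, the encoded key from the event resolves to the original object name, which is the path that actually exists in the bucket.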

> Extending support to other special characters for S3EventsMetaSelector
> ----------------------------------------------------------------------
>
>                 Key: HUDI-5545
>                 URL: https://issues.apache.org/jira/browse/HUDI-5545
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Ethan Guo
>            Priority: Critical
>             Fix For: 0.13.0
>


