Posted to commits@hudi.apache.org by "Rajesh Mahindra (Jira)" <ji...@apache.org> on 2022/07/26 00:08:00 UTC

[jira] [Commented] (HUDI-4459) Corrupt parquet file created when syncing huge table with 4000+ fields,using hudi cow table with bulk_insert type

    [ https://issues.apache.org/jira/browse/HUDI-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571140#comment-17571140 ] 

Rajesh Mahindra commented on HUDI-4459:
---------------------------------------

[~danny0405] can you help assign this ticket?

> Corrupt parquet file created when syncing huge table with 4000+ fields,using hudi cow table with bulk_insert type
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-4459
>                 URL: https://issues.apache.org/jira/browse/HUDI-4459
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Leo zhang
>            Assignee: Danny Chen
>            Priority: Major
>         Attachments: statements.sql, table.ddl
>
>
> I am trying to sync a huge table with 4000+ fields into Hudi, using a COW table with the bulk_insert operation type.
> The job finishes without any exception, but when I try to read data from the table, I get an empty result. The parquet file is corrupted and cannot be read correctly.
> I traced the problem and found that it is caused by SortOperator. After a record is serialized in the sorter, all the fields get out of order and are deserialized into a single field. The wrong record is then written into the parquet file, which makes the file unreadable.
> Here are the steps to reproduce the bug in the Flink sql-client:
> 1. Execute the table DDL (provided in the table.ddl file in the attachments).
> 2. Execute the insert statement (provided in the statements.sql file in the attachments).
> 3. Execute a select statement to query the Hudi table (provided in the statements.sql file in the attachments).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)