Posted to issues@orc.apache.org by "任建亭 (Jira)" <ji...@apache.org> on 2021/10/13 12:42:00 UTC

[jira] [Comment Edited] (ORC-1028) Orc file damage detection

    [ https://issues.apache.org/jira/browse/ORC-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428186#comment-17428186 ] 

任建亭 edited comment on ORC-1028 at 10/13/21, 12:41 PM:
------------------------------------------------------

We used the Hadoop distcp tool to copy the data to the EC (erasure-coded) directory. During this process, some ORC files were corrupted because checksum verification (COMPOSITE_CRC) was not performed on the copied data.
We ran OrcFiledump on the damaged files, and it reported many different exceptions, mainly the following:

(1) java.lang.IllegalArgumentException: Buffer size too small. size = 262144 needed = 1778358
(2) com.google.protobuf.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field. This could mean either than the input has been truncated or that an embedded message misreported its own length.
(3) com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type
(4) java.io.IOException: Bad compression data
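
Several of the errors above (truncated protobuf input, invalid wire type) are typical of a damaged file tail. As a cheap first pass before running a full dump, one could check that the "ORC" magic string is present both at the start of the file and at the end of the postscript, just before the final length byte. The following is only a minimal sketch under stated assumptions: the class name OrcMagicCheck and its check method are hypothetical and not part of ORC, the file is assumed to be on the local filesystem rather than HDFS, and the layout is assumed to be the one written since ORC 0.12 (older 0.11 files do not carry the trailing magic). A file that passes can still be corrupted in the middle, for example with "Bad compression data", so this does not replace reading the file with the ORC tools.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper: cheap screen for truncated or overwritten ORC files. */
public class OrcMagicCheck {
    private static final byte[] MAGIC = "ORC".getBytes(StandardCharsets.US_ASCII);

    /** Returns null if both magic markers are present, otherwise a short reason. */
    public static String check(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long len = raf.length();
            // Need at least the header magic, the trailing magic, and one length byte.
            if (len < 2 * MAGIC.length + 1) {
                return "file too short: " + len + " bytes";
            }
            byte[] head = new byte[MAGIC.length];
            raf.readFully(head);
            if (!Arrays.equals(head, MAGIC)) {
                return "missing ORC header magic";
            }
            // The last byte of the file is the postscript length; the postscript
            // ends with the magic string, so the magic sits just before that byte.
            byte[] tail = new byte[MAGIC.length];
            raf.seek(len - 1 - MAGIC.length);
            raf.readFully(tail);
            if (!Arrays.equals(tail, MAGIC)) {
                return "missing ORC magic at end of postscript";
            }
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage (assumed): java OrcMagicCheck file1.orc file2.orc ...
        for (String arg : args) {
            String problem = check(Paths.get(arg));
            System.out.println(arg + ": " + (problem == null ? "magic OK" : problem));
        }
    }
}
```

Files flagged by such a screen are almost certainly unusable; files that pass still need a full read to rule out damage inside stripes.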


was (Author: renjianting001):
We used the Hadoop distcp tool to copy the data to the EC (erasure-coded) directory. During this process, some ORC files were corrupted because checksum verification (COMPOSITE_CRC) was not performed on the copied data.
We ran OrcFiledump on the damaged files, and it reported many different exceptions, mainly the following:
com.google.protobuf.InvalidProtocolBufferException:
java.io.EOFException:
java.io.IOException:
java.lang.IllegalArgumentException:
java.lang.IllegalStateException:
java.lang.IndexOutOfBoundsException:
java.lang.NullPointerException
org.apache.hadoop.HadoopIllegalArgumentException:
org.apache.orc.FileFormatException:
org.apache.orc.UnknownFormatException:

 

> Orc file damage detection
> -------------------------
>
>                 Key: ORC-1028
>                 URL: https://issues.apache.org/jira/browse/ORC-1028
>             Project: ORC
>          Issue Type: New Feature
>          Components: Java
>            Reporter: 任建亭
>            Priority: Major
>
> On our cluster, we found a lot of corrupted ORC files. How can we quickly detect whether an ORC file is corrupted? Is there a tool available to repair damaged ORC files?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)