Posted to issues@orc.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2021/10/27 09:58:00 UTC

[jira] [Resolved] (ORC-1030) Java Tools Recover File command does not accurately find OrcFile.MAGIC

     [ https://issues.apache.org/jira/browse/ORC-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved ORC-1030.
--------------------------------
    Fix Version/s: 1.7.1
                   1.8.0
         Assignee: Yiqun Zhang
       Resolution: Fixed

This is resolved via https://github.com/apache/orc/pull/941

> Java Tools Recover File command does not accurately find OrcFile.MAGIC
> ----------------------------------------------------------------------
>
>                 Key: ORC-1030
>                 URL: https://issues.apache.org/jira/browse/ORC-1030
>             Project: ORC
>          Issue Type: Bug
>          Components: Java, tools
>    Affects Versions: 1.7.0, 1.8.0, 1.6.11
>            Reporter: Yiqun Zhang
>            Assignee: Yiqun Zhang
>            Priority: Major
>             Fix For: 1.8.0, 1.7.1
>
>
> {code:java}
>         while (remaining > 0) {
>           int toRead = (int) Math.min(DEFAULT_BLOCK_SIZE, remaining);
>           byte[] data = new byte[toRead];
>           long startPos = corruptFileLen - remaining;
>           fdis.readFully(startPos, data, 0, toRead);
>           // find all MAGIC string and see if the file is readable from there
>           int index = 0;
>           long nextFooterOffset;
>           byte[] magicBytes = OrcFile.MAGIC.getBytes(StandardCharsets.UTF_8);
>           while (index != -1) {
>             index = indexOf(data, magicBytes, index + 1);
>             if (index != -1) {
>               nextFooterOffset = startPos + index + magicBytes.length + 1;
>               if (isReadable(corruptPath, conf, nextFooterOffset)) {
>                 footerOffsets.add(nextFooterOffset);
>               }
>             }
>           }
>           System.err.println("Scanning for valid footers - startPos: " + startPos +
>               " toRead: " + toRead + " remaining: " + remaining);
>           remaining = remaining - toRead;
>         }
> {code}
> OrcFile.MAGIC can straddle the boundary between two adjacent reads. Because the current implementation only matches within a single read, such an occurrence is never found, and the corresponding recovery offset is missed.
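
One way to catch a MAGIC that straddles a read boundary is to make each scan window start a little before the current block, overlapping the previous block by pattern length minus one bytes. The sketch below illustrates that idea on an in-memory byte array. The class and method names (OverlapScan, findAll, matchesAt) are hypothetical, and this is not necessarily the approach taken in the actual fix.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not code from the ORC tools.
class OverlapScan {

  // True if pattern occurs in data starting exactly at position start.
  static boolean matchesAt(byte[] data, int start, byte[] pattern) {
    if (start < 0 || start + pattern.length > data.length) {
      return false;
    }
    for (int j = 0; j < pattern.length; j++) {
      if (data[start + j] != pattern[j]) {
        return false;
      }
    }
    return true;
  }

  // Scans file block by block (blockSize must be positive). Each window begins
  // (magic.length - 1) bytes before its block, so a match that crosses a block
  // boundary is seen whole by exactly one window.
  static List<Long> findAll(byte[] file, byte[] magic, int blockSize) {
    List<Long> hits = new ArrayList<>();
    int overlap = magic.length - 1;
    for (long pos = 0; pos < file.length; pos += blockSize) {
      int from = (int) Math.max(0, pos - overlap);
      int to = (int) Math.min(file.length, pos + blockSize);
      byte[] window = Arrays.copyOfRange(file, from, to);
      for (int s = 0; s + magic.length <= window.length; s++) {
        if (matchesAt(window, s, magic)) {
          hits.add((long) from + s); // absolute offset in the file
        }
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    byte[] file = "xxORCyyORC".getBytes(StandardCharsets.UTF_8);
    byte[] magic = "ORC".getBytes(StandardCharsets.UTF_8);
    // With a block size of 3, both occurrences cross a block boundary.
    System.out.println(findAll(file, magic, 3)); // prints [2, 7]
  }
}
```

The overlap is one byte shorter than the pattern so that a match lying entirely inside the previous block is not reported a second time.
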
> {code:java}
>   private static int indexOf(final byte[] data, final byte[] pattern, final int index) {
>     if (data == null || data.length == 0 || pattern == null || pattern.length == 0 ||
>         index > data.length || index < 0) {
>       return -1;
>     }
>     int j = 0;
>     for (int i = index; i < data.length; i++) {
>       if (pattern[j] == data[i]) {
>         j++;
>       } else {
>         j = 0;
>       }
>       if (j == pattern.length) {
>         return i - pattern.length + 1;
>       }
>     }
>     return -1;
>   }
> {code}
> This matching algorithm is wrong because i does not backtrack after a partial match fails. For example, with data = OOORC, pattern = ORC, and index = 1, it returns -1 even though the pattern occurs at index 2.
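
For illustration, a minimal sketch of a search that does backtrack: it tries every candidate start position in turn, so a failed partial match costs only that one candidate. The class name MagicSearch is hypothetical, and this is a plain naive search under that assumption, not necessarily the code merged for this issue.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not code from the ORC tools.
class MagicSearch {

  // Returns the first position >= index at which pattern occurs in data,
  // or -1 if there is none.
  static int indexOf(byte[] data, byte[] pattern, int index) {
    if (data == null || pattern == null || pattern.length == 0 ||
        index < 0 || index > data.length - pattern.length) {
      return -1;
    }
    for (int start = index; start <= data.length - pattern.length; start++) {
      int j = 0;
      // Compare from this candidate start; on a mismatch the outer loop
      // moves to start + 1, which is the missing backtracking step.
      while (j < pattern.length && data[start + j] == pattern[j]) {
        j++;
      }
      if (j == pattern.length) {
        return start;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    byte[] data = "OOORC".getBytes(StandardCharsets.UTF_8);
    byte[] pattern = "ORC".getBytes(StandardCharsets.UTF_8);
    System.out.println(indexOf(data, pattern, 1)); // prints 2
  }
}
```
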


