Posted to dev@avro.apache.org by "Stephan Warren (Jira)" <ji...@apache.org> on 2019/11/25 02:25:00 UTC

[jira] [Updated] (AVRO-2637) Avro Tool's Repair Tool Can Go Into an Infinite Loop

     [ https://issues.apache.org/jira/browse/AVRO-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Warren updated AVRO-2637:
---------------------------------
    Description: 
Certain Avro files are corrupted in such a way that the repair tool enters an infinite loop. Evidence:

Two unit tests were added:

{code:java}
  @Test
  public void testReportCorruptLoopBlock() throws Exception {
    String corruptLoopFile = "/Users/stephan/IdeaProjects/avro/lang/java/tools/src/test/resources/Report_looping.avro";
    String repairedLoopFile = "/Users/stephan/IdeaProjects/avro/lang/java/tools/src/test/resources/Report_looping-FIXED.avro";
    String output = run(new DataFileRepairTool(), "-o", "all", corruptLoopFile, repairedLoopFile);
    assertTrue(output, output.contains("Number of blocks: 5 Number of corrupt blocks: 4"));
  }
  @Test
  public void testRepariedReportCorruptLoopBlock() throws Exception {
    String repairedLoopFile = "/Users/stephan/IdeaProjects/avro/lang/java/tools/src/test/resources/Report_looping-FIXED.avro";
    String output = run(new DataFileRepairTool(), "-o", "report", repairedLoopFile, repairedLoopFile);
    assertTrue(output, output.contains("Number of blocks: 4 Number of corrupt blocks: 0"));
  }

{code}

The output looks like this:

{noformat}
Failed to read block 0. Unknown record count in block. Skipping. Reason: java.io.IOException: Invalid sync!
Failed to read block 3. Unknown record count in block. Skipping. Reason: java.io.IOException: Invalid sync!
Failed to read block 3. Unknown record count in block. Skipping. Reason: java.io.IOException: Invalid sync!
Failed to read block 3. Unknown record count in block. Skipping. Reason: java.io.IOException: Invalid sync!
...
{noformat}

The innerRecover method, with debug output added, looks like this:
{code:java}
   private int innerRecover(DataFileReader<Object> fileReader, DataFileWriter<Object> fileWriter, PrintStream out,
      PrintStream err, boolean recoverPrior, boolean recoverAfter, Schema schema, File outfile) {
    int numBlocks = 0;
    int numCorruptBlocks = 0;
    int numRecords = 0;
    int lastNumRecords = -1;
    int numCorruptRecords = 0;
    int recordsWritten = 0;
    long lastPostion = -1;
    long position = fileReader.previousSync();
    long blockSize = 0;
    long blockCount = 0;
    boolean fileWritten = false;
    try {
      while (true) {
        try {
          if (!fileReader.hasNext()) {
            out.println("File Summary: ");
            out.println("  Number of blocks: " + numBlocks + " Number of corrupt blocks: " + numCorruptBlocks);
            out.println("  Number of records: " + numRecords + " Number of corrupt records: " + numCorruptRecords);
            if (recoverAfter || recoverPrior) {
              out.println("  Number of records written " + recordsWritten);
            }
            out.println();
            return 0;
          }
          position = fileReader.previousSync();
          blockCount = fileReader.getBlockCount();
          blockSize = fileReader.getBlockSize();
          numRecords += blockCount;
          long blockRemaining = blockCount;
          numBlocks++;
          boolean lastRecordWasBad = false;
          long badRecordsInBlock = 0;
          err.println("Details Prior: numblocks: "+numBlocks +
            ", blockRemaining: "+ blockRemaining +
            ", lastRecordWasBad: " + lastRecordWasBad +
            ", numCorruptRecords: " + numCorruptRecords +
            ", badRecordsInBloc: " + badRecordsInBlock);
          while (blockRemaining > 0) {
            try {
              Object datum = fileReader.next();
              if ((recoverPrior && numCorruptBlocks == 0) || (recoverAfter && numCorruptBlocks > 0)) {
                if (!fileWritten) {
                  try {
                    fileWriter.create(schema, outfile);
                    fileWritten = true;
                  } catch (Exception e) {
                    e.printStackTrace(err);
                    return 1;
                  }
                }
                try {
                  fileWriter.append(datum);
                  recordsWritten++;
                } catch (Exception e) {
                  e.printStackTrace(err);
                  throw e;
                }
              }
              blockRemaining--;
              lastRecordWasBad = false;
//              err.println("Details #1: blockRemaining: "+ blockRemaining +
//                ", lastRecordWasBad: " + lastRecordWasBad +
//                ", numCorruptRecords: " + numCorruptRecords +
//                ", badRecordsInBloc: " + badRecordsInBlock);
            } catch (Exception e) {
              long pos = blockCount - blockRemaining;
              if (badRecordsInBlock == 0) {
                // first corrupt record
                numCorruptBlocks++;
                err.println("Corrupt block: " + numBlocks + " Records in block: " + blockCount
                    + " uncompressed block size: " + blockSize);
                err.println("Corrupt record at position: " + (pos));
              } else {
                // second bad record in block, if consecutive skip block.
                err.println("Corrupt record at position: " + (pos));
                if (lastRecordWasBad) {
                  // consecutive bad record
                  err.println(
                      "Second consecutive bad record in block: " + numBlocks + ". Skipping remainder of block. ");
                  numCorruptRecords += blockRemaining;
                  badRecordsInBlock += blockRemaining;
                  try {
                    fileReader.sync(position);
                  } catch (Exception e2) {
                    err.println("failed to sync to sync marker, aborting");
                    e2.printStackTrace(err);
                    return 1;
                  }
                  break;
                }
              }
              blockRemaining--;
              lastRecordWasBad = true;
              numCorruptRecords++;
              badRecordsInBlock++;
              err.println("Details #2: blockRemaining: "+ blockRemaining +
                ", lastRecordWasBad: " + lastRecordWasBad +
                ", numCorruptRecords: " + numCorruptRecords +
                ", badRecordsInBloc: " + badRecordsInBlock);
            }
          }
          err.println("Details After: blockRemaining: "+ blockRemaining +
            ", lastRecordWasBad: " + lastRecordWasBad +
            ", numCorruptRecords: " + numCorruptRecords +
            ", badRecordsInBloc: " + badRecordsInBlock);

          if (badRecordsInBlock != 0) {
            err.println("** Number of unrecoverable records in block: " + (badRecordsInBlock));
          }
          position = fileReader.previousSync();
        } catch (Exception e) {

//          if(lastNumRecords == numRecords) {
//          if(lastPostion == position) {
          if(false) {
              position++;
          }
          else {
            lastNumRecords = numRecords;
            lastPostion = position;
            err.println("Failed to read block " + numBlocks + ". Unknown record " + "count in block.  Skipping. Reason: "
              + e.getMessage());

            numCorruptBlocks++;

            err.printf(
              "    int numBlocks = %d;\n" +
                "    int numCorruptBlocks = %d;\n" +
                "    int numRecords = %d;\n" +
                "    int numCorruptRecords = %d;\n" +
                "    int recordsWritten = %d;\n" +
                "    long position = %d / 0x%04x;\n" +
                "    long blockSize = %d / 0x%04x;\n" +
                "    long blockCount = %d / 0x%04x\n",
              numBlocks,
              numCorruptBlocks,
              numRecords,
              numCorruptRecords,
              recordsWritten,
              position, position,
              blockSize, blockSize,
              blockCount, blockCount);
            try {
              fileReader.sync(position);
            } catch (Exception e2) {
              err.println("failed to sync to sync marker, aborting");
              e2.printStackTrace(err);
              return 1;
            }
          }
        }
      }
    } finally {
      if (fileWritten) {
        try {
          fileWriter.close();
        } catch (Exception e) {
          e.printStackTrace(err);
          return 1;
        }
      }
    }
  }
{code}


With the above code, we can see a pattern emerge:


{noformat}
Failed to read block 0. Unknown record count in block.  Skipping. Reason: java.io.IOException: Invalid sync!
    int numBlocks = 0;
    int numCorruptBlocks = 1;
    int numRecords = 0;
    int numCorruptRecords = 0;
    int recordsWritten = 0;
    long position = 6504 / 0x1968;
    long blockSize = 0 / 0x0000;
    long blockCount = 0 / 0x0000
Details Prior: numblocks: 1, blockRemaining: 464, lastRecordWasBad: false, numCorruptRecords: 0, badRecordsInBloc: 0
Details After: blockRemaining: 0, lastRecordWasBad: false, numCorruptRecords: 0, badRecordsInBloc: 0
Details Prior: numblocks: 2, blockRemaining: 464, lastRecordWasBad: false, numCorruptRecords: 0, badRecordsInBloc: 0
Details After: blockRemaining: 0, lastRecordWasBad: false, numCorruptRecords: 0, badRecordsInBloc: 0
Details Prior: numblocks: 3, blockRemaining: 464, lastRecordWasBad: false, numCorruptRecords: 0, badRecordsInBloc: 0
Details After: blockRemaining: 0, lastRecordWasBad: false, numCorruptRecords: 0, badRecordsInBloc: 0
Failed to read block 3. Unknown record count in block.  Skipping. Reason: java.io.IOException: Invalid sync!
    int numBlocks = 3;
    int numCorruptBlocks = 2;
    int numRecords = 1392;
    int numCorruptRecords = 0;
    int recordsWritten = 1392;
    long position = 248627 / 0x3cb33;
    long blockSize = 17611 / 0x44cb;
    long blockCount = 464 / 0x01d0
Failed to read block 3. Unknown record count in block.  Skipping. Reason: java.io.IOException: Invalid sync!
    int numBlocks = 3;
    int numCorruptBlocks = 3;
    int numRecords = 1392;
    int numCorruptRecords = 0;
    int recordsWritten = 1392;
    long position = 248627 / 0x3cb33;
    long blockSize = 17611 / 0x44cb;
    long blockCount = 464 / 0x01d0
Failed to read block 3. Unknown record count in block.  Skipping. Reason: java.io.IOException: Invalid sync!
    int numBlocks = 3;
    int numCorruptBlocks = 4;
    int numRecords = 1392;
    int numCorruptRecords = 0;
    int recordsWritten = 1392;
    long position = 248627 / 0x3cb33;
    long blockSize = 17611 / 0x44cb;
    long blockCount = 464 / 0x01d0
Failed to read block 3. Unknown record count in block.  Skipping. Reason: java.io.IOException: Invalid sync!
    int numBlocks = 3;
    int numCorruptBlocks = 5;
    int numRecords = 1392;
    int numCorruptRecords = 0;
    int recordsWritten = 1392;
    long position = 248627 / 0x3cb33;
    long blockSize = 17611 / 0x44cb;
    long blockCount = 464 / 0x01d0

{noformat}

While this is probably not the correct or an acceptable fix, the following change seems to let the repair run to completion with the code above:
{code:java}
          if(lastPostion == position) {
//          if(false) {
{code}

All the unit tests associated with the TestDataFileRepairTool class seem to pass, including the two hacked-in tests above.
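
To illustrate why retrying from an unchanged position can never terminate, here is a minimal toy model in plain Java. It is not Avro code: ToyReader, scan, and the block offsets are hypothetical stand-ins, and the guard is a variation on the change above (it steps the position forward and then re-syncs in the same pass), so treat it as a sketch of the idea rather than a proposed patch.

```java
import java.io.IOException;

/** Toy model, not Avro code: every name here is hypothetical. */
final class SyncLoopSketch {

  /** Stand-in for a block reader: blocks start at fixed offsets, some are unreadable. */
  static final class ToyReader {
    private final long[] blockStarts; // ascending offsets of block starts
    private final boolean[] corrupt;  // corrupt[i] == true means block i cannot be read
    private final long fileLength;
    private int cursor = 0;           // index of the next block to read

    ToyReader(long[] blockStarts, boolean[] corrupt, long fileLength) {
      this.blockStarts = blockStarts;
      this.corrupt = corrupt;
      this.fileLength = fileLength;
    }

    boolean hasNext() {
      return cursor < blockStarts.length;
    }

    /** Offset of the next block, or the file length at end of file. */
    long tell() {
      return hasNext() ? blockStarts[cursor] : fileLength;
    }

    /** Reads one block and moves past it; throws without moving if it is corrupt. */
    void readBlock() throws IOException {
      if (corrupt[cursor]) {
        throw new IOException("Invalid sync!");
      }
      cursor++;
    }

    /** Moves to the first block that starts at or after the given offset. */
    void sync(long position) {
      cursor = 0;
      while (cursor < blockStarts.length && blockStarts[cursor] < position) {
        cursor++;
      }
    }
  }

  /** Returns the number of failed reads, or -1 if the iteration budget runs out. */
  static int scan(ToyReader reader, boolean guard, int maxIterations) {
    long position = reader.tell();
    long lastPosition = -1;
    int failures = 0;
    for (int i = 0; i < maxIterations; i++) {
      try {
        if (!reader.hasNext()) {
          return failures; // reached end of file
        }
        reader.readBlock();
        position = reader.tell();
      } catch (IOException e) {
        failures++;
        if (guard && lastPosition == position) {
          position++; // same offset failed twice: step past it before re-syncing
        }
        lastPosition = position;
        reader.sync(position);
      }
    }
    return -1; // budget exhausted without reaching end of file
  }

  public static void main(String[] args) {
    long[] starts = {0, 100, 200, 300};
    boolean[] corrupt = {false, false, true, false};
    System.out.println(scan(new ToyReader(starts, corrupt, 400), false, 50)); // prints -1
    System.out.println(scan(new ToyReader(starts, corrupt, 400), true, 50));  // prints 2
  }
}
```

Without the guard, every retry syncs back to offset 200 and fails again, just as the position stays at 248627 in the log above. With the guard, the second failure at the same offset moves the position forward by one, so the next sync lands on the following block. Note that the corrupt block is counted once per failed attempt in this model.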

I cannot provide the proprietary file, but I'd be happy to address questions.


> Avro Tool's Repair Tool Can Go Into an Infinite Loop
> ----------------------------------------------------
>
>                 Key: AVRO-2637
>                 URL: https://issues.apache.org/jira/browse/AVRO-2637
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 1.9.1
>            Reporter: Stephan Warren
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)