Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2019/07/30 19:41:59 UTC

[GitHub] [incubator-iceberg] arina-ielchiieva opened a new issue #330: Orphan manifest file when performing delete in transaction

URL: https://github.com/apache/incubator-iceberg/issues/330
 
 
   Steps to reproduce:
   
   1. Create an Iceberg table and add two files:
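   ```
       // Imports used by all snippets in this issue
       import static org.apache.iceberg.types.Types.NestedField.required;
   
       import org.apache.iceberg.DataFile;
       import org.apache.iceberg.DataFiles;
       import org.apache.iceberg.ManifestFile;
       import org.apache.iceberg.PartitionSpec;
       import org.apache.iceberg.Schema;
       import org.apache.iceberg.Snapshot;
       import org.apache.iceberg.Table;
       import org.apache.iceberg.Tables;
       import org.apache.iceberg.Transaction;
       import org.apache.iceberg.hadoop.HadoopTables;
       import org.apache.iceberg.types.Types;
   
       Tables tables = new HadoopTables();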
   
       Schema schema = new Schema(
         required(1, "id", Types.IntegerType.get()),
         required(2, "date", Types.StringType.get())
       );
   
       PartitionSpec spec = PartitionSpec.builderFor(schema)
         .identity("id")
         .build();
   
       Table table = tables.create(schema, spec, "iceberg-table");
   
       DataFile file_a = DataFiles.builder(spec)
         .withPath("iceberg-table/data-a.parquet")
         .withFileSizeInBytes(0)
         .withPartitionPath("id=0")
         .withRecordCount(2)
         .build();
   
       DataFile file_b = DataFiles.builder(spec)
         .withPath("iceberg-table/data-b.parquet")
         .withFileSizeInBytes(0)
         .withPartitionPath("id=1")
         .withRecordCount(2)
         .build();
   
       table.newAppend()
         .appendFile(file_a)
         .commit();
   
       table.newAppend()
         .appendFile(file_b)
         .commit();
   ```
   Each append operation produces a new snapshot, written as a manifest list (`snap-*.avro`), and a corresponding manifest file (`*-m0.avro`):
   ```
   iceberg-table/metadata/0f29e64b-b181-4cd5-b812-9dd339470b72-m0.avro
   iceberg-table/metadata/08740186-305d-4d95-b79a-97cce9cbeb18-m0.avro
   iceberg-table/metadata/snap-1085365639864830772-1-0f29e64b-b181-4cd5-b812-9dd339470b72.avro
   iceberg-table/metadata/snap-1154652068483978044-1-08740186-305d-4d95-b79a-97cce9cbeb18.avro
   ```
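   The two kinds of file in this listing can be told apart by name: manifests end in `-m<N>.avro`, while manifest lists (one per snapshot) are named `snap-<snapshot-id>-<n>-<uuid>.avro` and share their UUID with the manifest written by the same operation. The sketch below classifies paths by those patterns; the class `MetadataFileNames` is hypothetical, and the patterns are inferred from the names above rather than taken from an Iceberg API.
   
   ```
   import java.util.regex.Matcher;
   import java.util.regex.Pattern;
   
   // Hypothetical helper that classifies Iceberg metadata file names.
   // The patterns are inferred from the listings in this issue, not from an Iceberg API.
   final class MetadataFileNames {
     // <uuid>-m<index>.avro, e.g. 0f29e64b-b181-4cd5-b812-9dd339470b72-m0.avro
     private static final Pattern MANIFEST =
         Pattern.compile("([0-9a-f-]{36})-m(\\d+)\\.avro");
     // snap-<snapshot-id>-<n>-<uuid>.avro, one manifest list per snapshot
     private static final Pattern MANIFEST_LIST =
         Pattern.compile("snap-(\\d+)-(\\d+)-([0-9a-f-]{36})\\.avro");
   
     static String describe(String path) {
       // Only the file name matters; strip the directory part.
       String name = path.substring(path.lastIndexOf('/') + 1);
       Matcher list = MANIFEST_LIST.matcher(name);
       if (list.matches()) {
         return "manifest list, snapshot " + list.group(1) + ", uuid " + list.group(3);
       }
       Matcher manifest = MANIFEST.matcher(name);
       if (manifest.matches()) {
         return "manifest " + manifest.group(2) + ", uuid " + manifest.group(1);
       }
       return "unrecognized";
     }
   
     public static void main(String[] args) {
       String[] paths = {
         "iceberg-table/metadata/0f29e64b-b181-4cd5-b812-9dd339470b72-m0.avro",
         "iceberg-table/metadata/snap-1085365639864830772-1-0f29e64b-b181-4cd5-b812-9dd339470b72.avro"
       };
       for (String path : paths) {
         System.out.println(describe(path));
       }
     }
   }
   ```
   
   Applied to the files written by the delete below, the same helper reports the UUID `dc0ad7f2-1256-4449-99dd-a6d201413794` for both `-m0` and `-m3` and for the manifest list, which suggests all three were written for the same commit.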
   2. Delete one file in a transaction:
   ```
       Transaction newTransaction = table.newTransaction();
       newTransaction.newDelete()
         .deleteFile(file_a)
         .commit();
       newTransaction.commitTransaction();
   ```
   After the delete operation, a new snapshot is created and two manifest files are written:
   ```
   iceberg-table/metadata/dc0ad7f2-1256-4449-99dd-a6d201413794-m0.avro
   iceberg-table/metadata/dc0ad7f2-1256-4449-99dd-a6d201413794-m3.avro
   iceberg-table/metadata/snap-3924283385560426818-2-dc0ad7f2-1256-4449-99dd-a6d201413794.avro
   ```
   However, when listing the manifest files of the current snapshot, only one manifest file is shown:
   ```
       Snapshot snapshot = table.currentSnapshot();
       snapshot.manifests().stream()
         .map(ManifestFile::path)
         .forEach(System.out::println);
   ```
   Result: 
   ```
   iceberg-table/metadata/dc0ad7f2-1256-4449-99dd-a6d201413794-m3.avro
   ```
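   The orphaned manifest is the set difference between the manifests written by the delete and the manifests the snapshot references. A minimal sketch, with paths copied from the listings above; the class `UnreferencedManifests` is hypothetical, and in a real check the inputs would come from listing the metadata directory and from `Snapshot.manifests()` for every snapshot in `table.snapshots()`.
   
   ```
   import java.util.LinkedHashSet;
   import java.util.List;
   import java.util.Set;
   
   // Minimal sketch: manifests that were written but that no snapshot references.
   // Hypothetical helper; `written` and `referenced` are plain path lists here.
   final class UnreferencedManifests {
     static Set<String> unreferenced(List<String> written, List<String> referenced) {
       Set<String> result = new LinkedHashSet<>(written); // keeps listing order
       result.removeAll(referenced);
       return result;
     }
   
     public static void main(String[] args) {
       // Paths copied from the listings above.
       List<String> written = List.of(
           "iceberg-table/metadata/dc0ad7f2-1256-4449-99dd-a6d201413794-m0.avro",
           "iceberg-table/metadata/dc0ad7f2-1256-4449-99dd-a6d201413794-m3.avro");
       List<String> referenced = List.of(
           "iceberg-table/metadata/dc0ad7f2-1256-4449-99dd-a6d201413794-m3.avro");
       // Prints the -m0 manifest, which the current snapshot does not point to.
       System.out.println(unreferenced(written, referenced));
     }
   }
   ```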
   
   If the delete is performed outside a transaction, only one manifest file is created:
   ```
       table.newDelete()
         .deleteFile(file_a)
         .commit();
   ```
   Result:
   ```
   iceberg-table/metadata/11368504-6738-47e0-a865-1733d972d156-m1.avro
   iceberg-table/metadata/snap-943059452884103391-1-11368504-6738-47e0-a865-1733d972d156.avro
   ```
   
   The issue was found when orphaned manifest files were not deleted during snapshot expiration.
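   A simplified model shows why such a file survives: assuming expiration only deletes manifests that were referenced by an expired snapshot and are no longer referenced by a retained one, a manifest that no snapshot ever referenced is never a candidate. This is a hypothetical sketch (`ExpirationModel` is not Iceberg's `ExpireSnapshots` implementation); the snapshot contents are assumed, and the manifest names abbreviate the paths above.
   
   ```
   import java.util.LinkedHashSet;
   import java.util.List;
   import java.util.Map;
   import java.util.Set;
   import java.util.TreeMap;
   
   // Hypothetical, simplified model of manifest cleanup during snapshot expiration.
   // It only shows why a manifest that no snapshot references is never selected.
   final class ExpirationModel {
     // Manifests referenced by expired snapshots and by no retained snapshot.
     static Set<String> manifestsToDelete(Map<Long, List<String>> snapshots, Set<Long> expired) {
       Set<String> candidates = new LinkedHashSet<>();
       Set<String> stillReferenced = new LinkedHashSet<>();
       for (Map.Entry<Long, List<String>> e : snapshots.entrySet()) {
         if (expired.contains(e.getKey())) {
           candidates.addAll(e.getValue());
         } else {
           stillReferenced.addAll(e.getValue());
         }
       }
       candidates.removeAll(stillReferenced);
       return candidates;
     }
   
     public static void main(String[] args) {
       // Assumed snapshot contents, in commit order; names abbreviate the paths above.
       Map<Long, List<String>> snapshots = new TreeMap<>();
       snapshots.put(1L, List.of("0f29e64b-m0"));                // append file_a
       snapshots.put(2L, List.of("0f29e64b-m0", "08740186-m0")); // append file_b
       snapshots.put(3L, List.of("dc0ad7f2-m3"));                // delete in a transaction
       // "dc0ad7f2-m0" exists on disk but appears in no snapshot, so it is never a candidate.
       System.out.println(manifestsToDelete(snapshots, Set.of(1L, 2L)));
     }
   }
   ```
   
   Running `main` prints `[0f29e64b-m0, 08740186-m0]`: the two append manifests are cleaned up, while `dc0ad7f2-m0` stays on disk.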
