Posted to dev@iceberg.apache.org by Taher Koitawala <ta...@gmail.com> on 2022/09/04 12:26:27 UTC

Delete a specific iceberg PositionDelete file and add new one

Hi All,
         I need your help with deleting a position delete file that was
committed earlier and writing a new one in its place.

         The use case: we produce a position delete file and commit it to
the table using rowDelta.addDeletes(posDeleteFile). Let us say this is
snapshot 1. When we commit snapshot 2, I want to remove posDeleteFile
from snapshot 1 and add a new file to snapshot 2.

To plan for files I am using

CloseableIterable<CombinedScanTask> tasks = table.newScan()
                .useSnapshot(table.currentSnapshot().snapshotId())
                .planTasks();

// A CombinedScanTask groups FileScanTasks; delete files are listed
// per FileScanTask, so flatten the tasks first.
Set<DeleteFile> deleteFiles = StreamSupport.stream(tasks.spliterator(), false)
                .flatMap(task -> task.files().stream())
                .flatMap(fileTask -> fileTask.deletes().stream())
                .collect(Collectors.toSet());

However, this gives me all the delete files. My specific use case is to keep
rolling the delete file: on each new snapshot, I want to remove the delete
file added in the previous snapshot and add a new delete file to the current
snapshot.
The current snapshot should then read only the newly committed file and not
the ones from the previous snapshot.

Also, can you tell me how to delete the physical file, along with the
snapshot and manifest entries, for the older delete file?

Regards,
Taher Koitawala

Re: Delete a specific iceberg PositionDelete file and add new one

Posted by Yufei Gu <fl...@gmail.com>.
Hi Taher,

Snapshots are immutable. It is not good practice to manually delete a
file (data file or delete file) from a snapshot. Can you let us know why you
want to delete the delete file in snapshot 1?
If you want to delete more rows, you can add a new position delete file in
snapshot 2. In that case, the data file will have multiple delete files,
and the Iceberg reader consolidates all of them at read time.
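
To make the read-time consolidation concrete, here is a minimal sketch in
plain Java. It is a simplified model and not the Iceberg API: the class
PositionDeleteMergeSketch and its methods are hypothetical, and each position
delete file is reduced to the list of row positions it marks as deleted in
one data file. The point is only that the positions from every applicable
delete file are unioned before rows are filtered, so an older delete file
does not need to be removed for a newer one to take effect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified model of merging position deletes at read time.
// This is NOT Iceberg code; names and types are illustrative assumptions.
final class PositionDeleteMergeSketch {

    // Union the deleted row positions from every delete file that applies
    // to a single data file. Each inner list stands in for one delete file.
    static Set<Long> mergeDeletes(List<List<Long>> deleteFiles) {
        Set<Long> deleted = new TreeSet<>();
        for (List<Long> positions : deleteFiles) {
            deleted.addAll(positions);
        }
        return deleted;
    }

    // Keep only the rows whose position is absent from the merged set.
    static List<String> liveRows(List<String> rows, Set<Long> deleted) {
        List<String> live = new ArrayList<>();
        for (int pos = 0; pos < rows.size(); pos++) {
            if (!deleted.contains((long) pos)) {
                live.add(rows.get(pos));
            }
        }
        return live;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("a", "b", "c", "d", "e");
        List<Long> addedInSnapshot1 = List.of(1L);      // first delete file
        List<Long> addedInSnapshot2 = List.of(1L, 3L);  // second delete file
        Set<Long> deleted = mergeDeletes(List.of(addedInSnapshot1, addedInSnapshot2));
        System.out.println(liveRows(rows, deleted)); // prints [a, c, e]
    }
}
```

In this model, overlapping positions across the two delete files are
harmless because the merged set deduplicates them.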

Best,

Yufei

`This is not a contribution`


On Wed, Sep 7, 2022 at 4:14 AM Taher Koitawala <ta...@gmail.com> wrote:

> Any ideas on this?

Re: Delete a specific iceberg PositionDelete file and add new one

Posted by Taher Koitawala <ta...@gmail.com>.
Any ideas on this?
