Posted to dev@iceberg.apache.org by Taher Koitawala <ta...@gmail.com> on 2022/09/04 12:26:27 UTC
Delete a specific iceberg PositionDelete file and add new one
Hi All,
I need your help with deleting a position delete file that has already
been committed and writing a new one in its place.
The use case: we produce a position delete file and commit it to the
table using rowDelta.addDeletes(posDeleteFile). Say this is snapshot 1.
When we commit snapshot 2, I want to delete posDeleteFile from snapshot 1
and add a new file to snapshot 2.
To plan for files I am using
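CloseableIterator<CombinedScanTask> iterator = table.newScan()
    .useSnapshot(table.currentSnapshot().snapshotId())
    .planTasks().iterator();
// Each CombinedScanTask groups FileScanTasks; delete files hang off those.
Set<DeleteFile> deleteFiles = StreamSupport.stream(
        Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED),
        false)
    .flatMap(task -> task.files().stream())
    .flatMap(fileTask -> fileTask.deletes().stream())
    .collect(Collectors.toSet());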
However, this gives me all the delete files. My specific use case is to
keep rolling the delete file: on each new snapshot, I want to remove the
delete file added in the previous snapshot and add a new delete file to the
current snapshot.
The current snapshot should then read only the newly committed file and not
the ones from the previous snapshot.
Also, can you tell me how to delete the physical file, along with the
snapshot and manifest entries, for the older delete file?
Regards,
Taher Koitawala
Re: Delete a specific iceberg PositionDelete file and add new one
Posted by Yufei Gu <fl...@gmail.com>.
Hi Taher,
Snapshots are immutable. It is not good practice to manually delete a
file (data file or delete file) from a snapshot. Can you let us know why
you want to delete the delete file in snapshot 1?
If you want to delete more rows, you can add a new position delete file in
snapshot 2. In that case, the data file will have multiple delete files,
and the Iceberg reader consolidates all of them at read time.
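To illustrate what consolidating delete files at read time means, here is a minimal sketch in plain Java. It is a toy model, not Iceberg's reader code: the class name PositionDeleteMergeSketch and the methods mergeDeletes and readLiveRows are hypothetical, and each position delete file is represented simply as a set of deleted row positions for one data file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of read-time merging of position deletes; not Iceberg's actual API. */
public class PositionDeleteMergeSketch {

    /** Union the deleted row positions from every delete file that applies to one data file. */
    static Set<Long> mergeDeletes(List<Set<Long>> deleteFiles) {
        Set<Long> merged = new TreeSet<>();
        for (Set<Long> positions : deleteFiles) {
            merged.addAll(positions);
        }
        return merged;
    }

    /** Return the rows of a data file whose positions are not in any delete file. */
    static List<String> readLiveRows(List<String> rows, List<Set<Long>> deleteFiles) {
        Set<Long> deleted = mergeDeletes(deleteFiles);
        List<String> live = new ArrayList<>();
        for (int pos = 0; pos < rows.size(); pos++) {
            if (!deleted.contains((long) pos)) {
                live.add(rows.get(pos));
            }
        }
        return live;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("a", "b", "c", "d", "e");
        Set<Long> fromSnapshot1 = Set.of(1L); // delete file committed in snapshot 1
        Set<Long> fromSnapshot2 = Set.of(3L); // delete file added in snapshot 2
        // Both delete files apply when reading snapshot 2.
        System.out.println(readLiveRows(rows, List.of(fromSnapshot1, fromSnapshot2)));
        // prints [a, c, e]
    }
}
```

Under this model, the delete file from snapshot 1 does not have to be removed for snapshot 2 to see additional deletes; the positions from both files are simply unioned when the data file is read.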
Best,
Yufei
`This is not a contribution`
On Wed, Sep 7, 2022 at 4:14 AM Taher Koitawala <ta...@gmail.com> wrote:
> Any ideas on this?
Re: Delete a specific iceberg PositionDelete file and add new one
Posted by Taher Koitawala <ta...@gmail.com>.
Any ideas on this?