Posted to dev@iceberg.apache.org by Vivekanand Vellanki <vi...@dremio.com> on 2020/11/26 12:54:54 UTC

Does data file deletion always rewrite the manifest file

Hi,

I am trying to understand how data file deletion is handled when the
transaction commits.

From this line of code
<https://github.com/apache/iceberg/blob/e69e52146d27956221ea4df4ad0baf2af7c827cd/core/src/main/java/org/apache/iceberg/ManifestFilterManager.java#L372>,
it looks like the manifest file containing the deleted data file is
rewritten and a new manifest file is created as part of the transaction
that deletes data files.

This suggests that deleted data files can be ignored during scan
planning.

Is this behavior guaranteed by the specification, or is it an
implementation detail that could change in the future?

Thanks
Vivek

Re: Does data file deletion always rewrite the manifest file

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Hi Vivekanand,

You're right that a manifest with a file that is deleted will be rewritten
and replaced. Scan planning will ignore any deleted data file in a
manifest. Whether a file is deleted is controlled by the manifest entry's
status, which can be added, existing, or deleted. Using those three
values, we can easily recover the changes in a given manifest, and the
changes from a snapshot because we know the manifests that were created for
a given snapshot. And yes, this is part of the specification.
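
As a rough illustration of the status-based behavior described above, here
is a minimal Java sketch. It is not the Iceberg API: the class
ManifestStatusSketch, its Entry type, and the methods liveFiles and
rewriteForDelete are hypothetical names made up for this example. It also
assumes, as a simplification, that a rewrite carries live entries forward as
existing and does not carry forward entries already marked deleted.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of manifest entries; not the Iceberg API.
final class ManifestStatusSketch {
    // The three entry statuses named in the reply above.
    enum Status { EXISTING, ADDED, DELETED }

    // Hypothetical manifest entry: a status plus the data file path it tracks.
    static final class Entry {
        final Status status;
        final String path;

        Entry(Status status, String path) {
            this.status = status;
            this.path = path;
        }
    }

    // Scan planning: keep every entry that is not marked DELETED.
    static List<String> liveFiles(List<Entry> manifest) {
        List<String> live = new ArrayList<>();
        for (Entry e : manifest) {
            if (e.status != Status.DELETED) {
                live.add(e.path);
            }
        }
        return live;
    }

    // Build the replacement manifest for a commit that deletes pathToDelete.
    static List<Entry> rewriteForDelete(List<Entry> manifest, String pathToDelete) {
        List<Entry> rewritten = new ArrayList<>();
        for (Entry e : manifest) {
            if (e.status == Status.DELETED) {
                continue; // assumption: older DELETED entries are not carried forward
            }
            if (e.path.equals(pathToDelete)) {
                rewritten.add(new Entry(Status.DELETED, e.path));
            } else {
                rewritten.add(new Entry(Status.EXISTING, e.path));
            }
        }
        return rewritten;
    }

    public static void main(String[] args) {
        List<Entry> manifest = new ArrayList<>();
        manifest.add(new Entry(Status.ADDED, "a.parquet"));
        manifest.add(new Entry(Status.ADDED, "b.parquet"));

        List<Entry> rewritten = rewriteForDelete(manifest, "a.parquet");
        System.out.println(liveFiles(rewritten)); // prints [b.parquet]
    }
}
```

In this sketch the deleted file stays in the rewritten manifest with status
DELETED, so a reader can tell what the commit removed, while liveFiles skips
it in the same way scan planning is described as doing.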


-- 
Ryan Blue
Software Engineer
Netflix