Posted to dev@iceberg.apache.org by Peter Vary <pv...@cloudera.com.INVALID> on 2022/05/05 13:58:59 UTC

Positional delete with vs without the delete row values

Hi Team,

We are working on integrating Iceberg V2 tables with Hive, and enabling delete and update operations.
The delete is implemented by Marton and the first version is already merged: https://issues.apache.org/jira/browse/HIVE-26102
The update statement is still in progress: https://issues.apache.org/jira/browse/HIVE-26136
The edges are a bit rough for the time being, so don’t use this in production :D

During the implementation we found that implementing deletes was quite straightforward with the Iceberg positional deletes, and without much effort we were able to provide the row values too. OTOH for updates we need to sort the delete files and the data files differently. ATM we have only a single result table, so we ended up implementing our own writer, which is very similar to https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/io/SortedPosDeleteWriter.java, to do the sorting of the delete records for us. The problem with the SortedPosDeleteWriter is that as the record size grows, the number of records we can keep in memory decreases. So we ended up with our own writer which keeps only the minimal data in memory and writes only positional deletes without the actual row values. See: https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergBufferedDeleteWriter.java
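For reference, the buffering idea can be sketched like this (simplified, hypothetical types — not the actual HiveIcebergBufferedDeleteWriter, just the shape of it): only (file path, position) pairs are kept in memory, so the per-delete footprint is constant no matter how wide the rows are, and the deletes are emitted sorted by (path, position) as positional delete files require.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: buffer only (file path, row position) pairs instead of full rows.
// Memory per delete is constant regardless of the row width.
public class BufferedPositionDeletes {
  // file path -> positions deleted in that file (TreeMap keeps paths sorted)
  private final Map<String, List<Long>> buffer = new TreeMap<>();

  public void delete(String filePath, long position) {
    buffer.computeIfAbsent(filePath, k -> new ArrayList<>()).add(position);
  }

  // On close, emit deletes sorted by (file path, position), which is the
  // order positional delete files must be written in.
  public List<String> flush() {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, List<Long>> e : buffer.entrySet()) {
      Collections.sort(e.getValue());
      for (long pos : e.getValue()) {
        out.add(e.getKey() + ":" + pos);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    BufferedPositionDeletes w = new BufferedPositionDeletes();
    w.delete("data-2.parquet", 7);
    w.delete("data-1.parquet", 42);
    w.delete("data-1.parquet", 3);
    // prints [data-1.parquet:3, data-1.parquet:42, data-2.parquet:7]
    System.out.println(w.flush());
  }
}
```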

The question is:
- What is the experience of the community? When is it beneficial to have the row values in the positional delete files in production?

My feeling is:
   1. The row data is best used when there is a filter in the query and we can filter out whole delete files when running the query.
   2. There could be a slight improvement when we can skip RowGroups/Stripes based on the filter.
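To make point 1 concrete: if min/max statistics for the deleted rows are available for a delete file, a query with a filter can skip the whole delete file when the value ranges cannot overlap. A hypothetical range check (not Iceberg's actual evaluator) would look like:

```java
// Sketch: prune a whole delete file using its min/max statistics for a
// filtered column. Hypothetical example, not Iceberg's metrics evaluator.
public class DeleteFilePruning {
  // Can rows with col in [fileMin, fileMax] match a predicate col in [qLo, qHi]?
  static boolean mustReadDeleteFile(long fileMin, long fileMax, long qLo, long qHi) {
    return fileMax >= qLo && fileMin <= qHi;
  }

  public static void main(String[] args) {
    // Query: WHERE id BETWEEN 100 AND 200
    System.out.println(mustReadDeleteFile(0, 50, 100, 200));    // prints false -> skip file
    System.out.println(mustReadDeleteFile(150, 300, 100, 200)); // prints true  -> must read
  }
}
```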

For the 1st point we just need to collect the statistics during the delete, but we do not have to actually persist the data.
Would it be viable to create delete files whose statistics cannot be recalculated directly from the files themselves?
Would the community accept these files?
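The statistics-without-persistence idea could look roughly like this (hypothetical shape, not the actual Iceberg writer API): while writing position-only deletes, keep running min/max per column from the deleted rows, so the manifest can still carry bounds even though the rows themselves are never written to the file.

```java
// Sketch: collect column bounds from deleted rows without persisting the rows.
// Single long column for simplicity; real tables would track one such
// accumulator per column.
public class StatsCollectingDeleteWriter {
  private long min = Long.MAX_VALUE;
  private long max = Long.MIN_VALUE;

  // Called for each deleted row; only the statistic survives, not the value.
  void observe(long columnValue) {
    min = Math.min(min, columnValue);
    max = Math.max(max, columnValue);
  }

  long lowerBound() { return min; }
  long upperBound() { return max; }

  public static void main(String[] args) {
    StatsCollectingDeleteWriter s = new StatsCollectingDeleteWriter();
    for (long v : new long[] {42, 7, 99}) { s.observe(v); }
    System.out.println(s.lowerBound() + ".." + s.upperBound()); // prints 7..99
  }
}
```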

OTOH we have significant downsides for positional deletes with row values:
   1. The delete file size increases significantly.
   2. We should keep a smaller RowGroup/Stripe size to accommodate the bigger amount of raw data, so we have to read more footers, adding IO overhead.
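A back-of-envelope for point 1, with made-up numbers purely for illustration: a position-only delete entry is roughly a file path reference (heavily amortized by dictionary/RLE encoding) plus one encoded long, while a delete with row values additionally carries the full serialized row.

```java
// Rough size comparison with assumed, illustrative numbers.
public class DeleteSizeEstimate {
  public static void main(String[] args) {
    long deletes = 1_000_000L;
    long posOnlyBytes = deletes * 10;            // ~path ref + encoded position
    long avgRowBytes = 200;                      // assumed average serialized row
    long withRowsBytes = deletes * (10 + avgRowBytes);
    System.out.println("position-only: ~" + posOnlyBytes / 1_000_000 + " MB");   // ~10 MB
    System.out.println("with rows:     ~" + withRowsBytes / 1_000_000 + " MB");  // ~210 MB
  }
}
```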

So my feeling is that, generally speaking, positional deletes without the actual row data would be more performant than positional deletes with row data.

Am I missing something? Is there a use case where using positional deletes with row values is significantly more effective?

Thanks,
Peter


Re: Positional delete with vs without the delete row values

Posted by Jack Ye <ye...@gmail.com>.
I think there is not much technical issue for Trino to support writing
position delete files with row data, because the old rows can be provided
in the page scanned to the update/delete node as additional channels. The
tradeoff is basically the CDC capability vs efficiency and delete file
size. My recollection is that CDC reads require this information, but I have
been busy moving home recently and need to start catching up with the
conversations. Yufei might have a better idea for this part.

The statistics collected are persisted in the manifests for pruning
purposes, so I think there is benefit in collecting those statistics even
if they are not persisted in the underlying files.

-Jack

On Mon, May 9, 2022 at 1:35 AM Piotr Findeisen <pi...@starburstdata.com>
wrote:


Re: Positional delete with vs without the delete row values

Posted by Piotr Findeisen <pi...@starburstdata.com>.
Hi Peter,

FWIW, Trino Iceberg connector writes deletion files with just positions,
without row data. cc @Alexander Jo <al...@starburstdata.com>

> For the 1st point we just need to collect the statistics during the
> delete, but we do not have to actually persist the data.

I would be wary of creating ORC/Parquet files with statistics that do not
match actual file contents.


> Do I miss something? Is there a use-case when using positional deletes
> with row values is significantly more effective?

I recall some mention of CDC use-case -- producing CDC events from changes
to a table. But I think I recall someone mentioning this usually ends up
needing to join with actual data files anyway.
@Ryan Blue <bl...@tabular.io> will know better, but in the meantime you can
probably also dig the topic up in the mailing list.

Best
PF






On Thu, May 5, 2022 at 3:59 PM Peter Vary <pv...@cloudera.com.invalid>
wrote:
