Posted to common-dev@hadoop.apache.org by Wei-Chiu Chuang <we...@apache.org> on 2022/10/08 00:27:50 UTC

[DISCUSS] Supporting partial file rewrite/compose

There were a number of discussions that happened during ApacheCon. In the
spirit of the Apache Way, I am taking the conversation online, sharing with
the larger community and also capturing requirements. Credits to Owen who
started this discussion.

There are a number of scenarios where users want to partially rewrite file
blocks, and it would make sense to create a file system API to make these
operations efficient.

1. Apache Iceberg or other evolvable table format.
These table formats need to update table schema. The underlying files are
rewritten but only a subset of blocks are changed. It would be much more
efficient if a new file can be composed using some of the existing file
blocks.

2. GDPR compliance "the right to erasure"
Files must be rewritten to remove a person's data upon request. Again, this
could be efficient because only a small set of file blocks needs to be updated.

3. In-place erasure coding conversion.
I had a proposal to support atomically rewriting replicated files into
erasure coded files. This can be the building block to support auto-tiering.

Thoughts? What would be a good FS interface to support these requirements?

For Ozone folks, Ritesh opened a jira: HDDS-7297
<https://issues.apache.org/jira/browse/HDDS-7297> but I figured a larger
conversation should happen so that we can take other FS implementations
into consideration.
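To anchor the discussion, here is a rough strawman of what such an
interface could look like. Everything below is hypothetical (the names, and
the in-memory "filesystem" standing in for a real one); the idea is simply
to compose a new file from byte ranges of existing files, so that unchanged
blocks are referenced rather than copied:

```java
import java.util.*;

public class ComposeSketch {
    /** One contiguous byte range of an existing source file. */
    record Extent(String sourcePath, long offset, long length) {}

    // Toy in-memory "filesystem"; a real implementation would reference
    // the source blocks instead of copying bytes.
    static final Map<String, byte[]> fs = new HashMap<>();

    /** Create targetPath as the concatenation of the given extents. */
    static void compose(String targetPath, List<Extent> extents) {
        var out = new java.io.ByteArrayOutputStream();
        for (Extent e : extents) {
            byte[] src = fs.get(e.sourcePath());
            out.write(src, (int) e.offset(), (int) e.length());
        }
        fs.put(targetPath, out.toByteArray());
    }

    public static void main(String[] args) {
        fs.put("/data/f1", "AAAABBBBCCCC".getBytes());
        fs.put("/patch",   "XXXX".getBytes());
        // Rewrite only the middle range; reuse the head and tail of /data/f1.
        compose("/data/f2", List.of(
                new Extent("/data/f1", 0, 4),
                new Extent("/patch",   0, 4),
                new Extent("/data/f1", 8, 4)));
        System.out.println(new String(fs.get("/data/f2"))); // prints AAAAXXXXCCCC
    }
}
```

An atomic compose along these lines would cover all three scenarios above:
only the rewritten ranges cost I/O, and the rest of the file is reused.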

Thanks,
Weichiu

Re: [DISCUSS] Supporting partial file rewrite/compose

Posted by Ayush Saxena <ay...@gmail.com>.
Hi Wei-Chiu,

I think this thread got lost, or the discussion moved somewhere else; if
so, please loop me in as well. Just guessing why everyone is so quiet :-)

1. Apache Iceberg or other evolvable table format.


The use case you mention for Iceberg may not be the same one I have in
mind, but I still want to rewrite the actual data files. If I can save on
rewriting the complete files in Iceberg's copy-on-write mode, I feel that
would improve read performance, since I could ditch the merge-on-read mode,
and write performance wouldn't suffer because I didn't rewrite the entire
file, just removed some data from the actual data files rather than
maintaining it in a delete file. I `feel` the cost should be more or less
the same as writing a delete file.

Maybe another use case could be Hive ACID tables: this could help with
compactions, those delete delta files, and the like. Not going deep into
that, but maybe...

2. GDPR compliance "the right to erasure"

That is pretty much what my use case looks like as well: delete a record or
a set of records, where the records are stored in a table.

From the HDFS point of view I think this isn't trivial, but I still feel it
is doable. *Do you have pointers on whether this is possible with the object
stores as well?* That is where my interest lies.

3. In-place erasure coding conversion.

Regarding the in-place conversion of replicated files to erasure coding: if
I remember correctly, there was a branch created for it and some patches
were committed, playing with a header or so; I have only a faint memory of
it. The one design issue I remember was that if someone is reading a
replicated file while it gets converted into an erasure-coded file, and for
some reason refetches a block, the read would fail. Some changes in
DFSInputStream to handle this, or switching to DFSStripedInputStream in
such situations, might have solved it. I guess the folks chasing it left it
midway for some reason, and I feel no more than a week of effort remains,
if I remember correctly. I could be wrong..

Thoughts? What would be a good FS interface to support these requirements?


OK, I might be biased because of the one use case that comes to my mind,
but: the file name, the indexes of the rows, and optionally the row data
itself, as a guard against deleting the wrong data; we can keep that third
parameter optional. Maybe an object like RowDeletes, which takes the
starting and ending index in the file and, optionally, the row data. Just
for reference on where this is coming from, the code is at [1]

[1]
https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/deletes/PositionDelete.java#L34-L38
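To make the parameter-object idea above concrete, here is a hypothetical
sketch (RowDeletes and all field names are made up for illustration,
loosely modelled on Iceberg's PositionDelete linked at [1]):

```java
import java.util.Optional;

// Hypothetical parameter object for a record-level delete API: the file,
// an inclusive range of row positions to drop, and optionally the expected
// row data as a safety check against deleting the wrong rows.
public class RowDeletes {
    final String fileName;
    final long startRow;                 // inclusive
    final long endRow;                   // inclusive
    final Optional<byte[]> expectedRow;  // the optional third parameter

    RowDeletes(String fileName, long startRow, long endRow, byte[] expectedRow) {
        this.fileName = fileName;
        this.startRow = startRow;
        this.endRow = endRow;
        this.expectedRow = Optional.ofNullable(expectedRow);
    }
}
```

A delete API could then take a list of RowDeletes per file and rewrite only
the blocks those row ranges fall into.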

BTW Thanx for sharing the details!!!

-Ayush


On Sat, 8 Oct 2022 at 05:58, Wei-Chiu Chuang <we...@apache.org> wrote:

> There were a number of discussions that happened during ApacheCon. In the
> spirit of the Apache Way, I am taking the conversation online, sharing with
> the larger community and also capturing requirements. Credits to Owen who
> started this discussion.
>
> There are a number of scenarios where users want to partially rewrite file
> blocks, and it would make sense to create a file system API to make these
> operations efficient.
>
> 1. Apache Iceberg or other evolvable table format.
> These table formats need to update table schema. The underlying files are
> rewritten but only a subset of blocks are changed. It would be much more
> efficient if a new file can be composed using some of the existing file
> blocks.
>
> 2. GDPR compliance "the right to erasure"
> Files must be rewritten to remove a person's data upon request. Again, this
> could be efficient because only a small set of file blocks needs to be updated.
>
> 3. In-place erasure coding conversion.
> I had a proposal to support atomically rewriting replicated files into
> erasure coded files. This can be the building block to support
> auto-tiering.
>
> Thoughts? What would be a good FS interface to support these requirements?
>
> For Ozone folks, Ritesh opened a jira: HDDS-7297
> <https://issues.apache.org/jira/browse/HDDS-7297> but I figured a larger
> conversation should happen so that we can take other FS implementations
> into consideration.
>
> Thanks,
> Weichiu
>
