Posted to dev@accumulo.apache.org by Jonathan Allison-Pacheco <jp...@gmail.com> on 2023/02/27 19:44:53 UTC

Manually initiating compaction on correct tablet(s)

Hi, I'm working on a project where we need to initiate a compaction
after deleting an entry or set of entries. The TableOperations.compact
method takes `start` and `end` parameters, which select every tablet
that overlaps the row range between them, excluding the start row
itself; that is, (start, end].

This is a bit confusing to me: if I need to run a compaction for a
single row, the method suggests I would have to compact the tablets
between the row before it and the row itself. Is that interpretation
correct, and if so, is that a proper way of using this method?

If not, what is the intended way of performing this type of task?

Thank you,
Jonathan Allison-Pacheco
jpachecofs@gmail.com

Re: Manually initiating compaction on correct tablet(s)

Posted by Jonathan Allison-Pacheco <jp...@gmail.com>.
Ok, I think I've understood it from messing around a bit on my own with the
guidance here. My issue was thinking I needed to know the exact ID of the
row prior to the one I need compacted, which in my project's case is not
possible, but through some scan operations I've found a way that will work.

Thank you for the assistance!

On Mon, Feb 27, 2023 at 4:53 PM Ed Coleman <ed...@apache.org> wrote:

> Compactions work on splits / tablets so you will need to compact the
> tablet that contains the row.
>
> Tablets correspond to split points, and the rows between the split points
> are allocated to a single tablet. During a compaction the tablet will
> process all of the rows in that tablet and write a single file containing
> the rows for that tablet that pass the compaction filters and that do not
> contain a deletion marker.
>
> With a sample table (id=3, name=ns1.tbl2) that has split points at aaa, bbb,
> ccc, and ddd, plus a final tablet holding the remainder, I can examine the
> compaction counters by scanning the metadata table.
>
> scan -t accumulo.metadata -b 3; -e 3~ -c srv:compact
> 3;aaa srv:compact []    2
> 3;bbb srv:compact []    2
> 3;ccc srv:compact []    2
> 3;ddd srv:compact []    2
> 3< srv:compact []       2
>
> Then, running a compaction for split ccc using:
>
> compact -w -t ns1.tbl2 -b bbb~ -e ccc
>
> results in:
>
> scan -t accumulo.metadata -b 3; -e 3~ -c srv:compact
> 3;aaa srv:compact []    2
> 3;bbb srv:compact []    2
> 3;ccc srv:compact []    3
> 3;ddd srv:compact []    2
> 3< srv:compact []       2
>
> The begin row of the compaction (bbb~) sorts after the split point bbb,
> so the range begins in the tablet that follows it. Rescanning the metadata
> table this way may help if you want to confirm that you compacted the
> correct range.

Re: Manually initiating compaction on correct tablet(s)

Posted by Ed Coleman <ed...@apache.org>.
Compactions work on splits / tablets so you will need to compact the tablet that contains the row. 

Tablets correspond to split points, and the rows between the split points are allocated to a single tablet. During a compaction the tablet will process all of the rows in that tablet and write a single file containing the rows for that tablet that pass the compaction filters and that do not contain a deletion marker.
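
To make the split-point picture concrete, here is a minimal, self-contained Java sketch of how a row maps to a tablet. It does not call Accumulo at all: the class name `TabletRanges` and its methods are hypothetical, and it assumes ASCII rows so that Java string order matches the table's sort order. In a real client the split points would presumably come from something like `TableOperations.listSplits`, and the two values computed here are what one would pass as `start` and `end` to `TableOperations.compact`, given the (start, end] semantics described in the question.

```java
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper: models how sorted split points divide rows into tablets. */
final class TabletRanges {

    /** End row of the tablet holding {@code row}: the first split point >= row, or null for the last tablet. */
    static String endRowFor(TreeSet<String> splits, String row) {
        return splits.ceiling(row);
    }

    /** End row of the preceding tablet: the last split point < row, or null for the first tablet. */
    static String prevEndRowFor(TreeSet<String> splits, String row) {
        return splits.lower(row);
    }

    /** Renders the tablet's extent as (prevEndRow, endRow], using -inf / +inf for open ends. */
    static String describe(TreeSet<String> splits, String row) {
        String prev = prevEndRowFor(splits, row);
        String end = endRowFor(splits, row);
        return "(" + (prev == null ? "-inf" : prev) + ", " + (end == null ? "+inf)" : end + "]");
    }

    public static void main(String[] args) {
        // Same split points as the sample table in this message.
        TreeSet<String> splits = new TreeSet<>(List.of("aaa", "bbb", "ccc", "ddd"));
        for (String row : List.of("a", "bcd", "ccc", "zzz")) {
            System.out.println(row + " -> " + describe(splits, row));
        }
    }
}
```

Note that the exclusive lower bound is a split point, not an existing row in the table, so there should be no need to look up the row that immediately precedes the one being compacted.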

With a sample table (id=3, name=ns1.tbl2) that has split points at aaa, bbb, ccc, and ddd, plus a final tablet holding the remainder, I can examine the compaction counters by scanning the metadata table.

scan -t accumulo.metadata -b 3; -e 3~ -c srv:compact
3;aaa srv:compact []	2
3;bbb srv:compact []	2
3;ccc srv:compact []	2
3;ddd srv:compact []	2
3< srv:compact []	2

Then, running a compaction for split ccc using:

compact -w -t ns1.tbl2 -b bbb~ -e ccc

results in:

scan -t accumulo.metadata -b 3; -e 3~ -c srv:compact
3;aaa srv:compact []	2
3;bbb srv:compact []	2
3;ccc srv:compact []	3
3;ddd srv:compact []	2
3< srv:compact []	2
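
Comparing the two scans by eye works for five tablets. For a larger table, the comparison could be scripted. The following is only a sketch that parses shell output pasted as text, in the same format as the listings above; the class `CompactCounters` and its methods are hypothetical names and are not part of Accumulo.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: diffs srv:compact values between two metadata scans. */
final class CompactCounters {

    /** Parses lines such as "3;ccc srv:compact []    3" into metadata row -> value. */
    static Map<String, Long> parse(List<String> scanLines) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : scanLines) {
            String[] fields = line.trim().split("\\s+");
            // Expect exactly: metadata row, column, visibility, value. Skip everything else,
            // including the echoed "scan ..." command line.
            if (fields.length != 4 || !fields[1].equals("srv:compact")) {
                continue;
            }
            counters.put(fields[0], Long.parseLong(fields[3]));
        }
        return counters;
    }

    /** Returns the metadata rows whose value in {@code after} differs from {@code before}. */
    static List<String> changed(Map<String, Long> before, Map<String, Long> after) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            if (!e.getValue().equals(before.get(e.getKey()))) {
                rows.add(e.getKey());
            }
        }
        return rows;
    }
}
```

Fed the two listings from this message, `changed` would report only the `3;ccc` row, which matches the tablet the compaction was aimed at.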

The begin row of the compaction (bbb~) sorts after the split point bbb, so the range begins in the tablet that follows it. Rescanning the metadata table this way may help if you want to confirm that you compacted the correct range.
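
The ordering claim can be checked without a cluster. This sketch assumes rows are compared as unsigned bytes in lexicographic order, which is my understanding of how row keys sort; `RowOrder` and its methods are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: checks row ordering and (start, end] membership. */
final class RowOrder {

    /** Compares two rows byte by byte, treating each byte as unsigned. */
    static int compareRows(String a, String b) {
        return Arrays.compareUnsigned(
            a.getBytes(StandardCharsets.UTF_8),
            b.getBytes(StandardCharsets.UTF_8));
    }

    /** True when row lies in (start, end]: start is exclusive, end is inclusive. */
    static boolean inRange(String row, String start, String end) {
        return compareRows(row, start) > 0 && compareRows(row, end) <= 0;
    }

    public static void main(String[] args) {
        // bbb~ sorts after the split point bbb and before the split point ccc.
        System.out.println(compareRows("bbb", "bbb~") < 0);
        System.out.println(compareRows("bbb~", "ccc") < 0);
        // The split point bbb itself is outside (bbb~, ccc]; ccc is inside.
        System.out.println(inRange("bbb", "bbb~", "ccc"));
        System.out.println(inRange("ccc", "bbb~", "ccc"));
    }
}
```

Because `bbb~` falls strictly between the two split points, a range starting there touches only the tablet ending at ccc. Since compaction works on whole tablets, rows in that tablet that sort before `bbb~` (for example `bbb!`) are still rewritten along with the rest.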
