
Posted to user@hbase.apache.org by "Wilson, Huon (Data61, Eveleigh)" <Hu...@data61.csiro.au> on 2019/11/11 05:00:23 UTC

Deleting a (contiguous) subset of the columns in a row

We've got a data model where columns have a logical association, and this is encoded into the column qualifiers by having each group be a contiguous range of qualifiers. For instance, columns with first byte 0x00, 0x01, 0x02 or 0x03 form group A and columns with first byte 0x04 or 0x05 form group B.

We'd like to efficiently delete just group A from a row while leaving everything in group B. This currently seems to require two steps: read the row to find the column qualifiers that exist in group A (we can use a ColumnRangeFilter to at least ignore everything in group B), and then issue a Delete after calling .addColumns for each of those qualifiers.
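As a rough illustration of the selection step, the sketch below models the row as a sorted qualifier-to-value map (the shape that HBase's Result.getFamilyMap returns) and picks out group A as the half-open qualifier range from {0x00} up to, but excluding, {0x04}. The class and method names are hypothetical, the ordering is assumed to match HBase's unsigned byte ordering, and the code uses only the JDK (Java 9 or later for Arrays.compareUnsigned). In a real client, each selected qualifier would then be passed to Delete.addColumns(family, qualifier).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper, not part of HBase: selects the qualifiers of one group
// from the sorted qualifier map that a read of the row would return.
class GroupQualifiers {
    // Group A from the post: first byte 0x00..0x03, expressed as the
    // half-open qualifier range [{0x00}, {0x04}).
    static final byte[] GROUP_A_START = {0x00};
    static final byte[] GROUP_A_STOP = {0x04};

    // Unsigned lexicographic order (assumed to match how HBase sorts qualifiers).
    static final Comparator<byte[]> UNSIGNED_ORDER = Arrays::compareUnsigned;

    // Returns the qualifiers q with start <= q < stop, in sorted order.
    static List<byte[]> inRange(NavigableMap<byte[], byte[]> row, byte[] start, byte[] stop) {
        return new ArrayList<>(row.subMap(start, true, stop, false).keySet());
    }

    public static void main(String[] args) {
        // Stand-in for the result of reading one row (qualifier -> value).
        NavigableMap<byte[], byte[]> row = new TreeMap<>(UNSIGNED_ORDER);
        row.put(new byte[] {0x01, 0x10}, new byte[] {1});
        row.put(new byte[] {0x03, 0x7f}, new byte[] {2});
        row.put(new byte[] {0x04, 0x00}, new byte[] {3});
        row.put(new byte[] {0x05, 0x01}, new byte[] {4});

        // Each of these would go into the Delete via addColumns(family, qualifier).
        for (byte[] qualifier : inRange(row, GROUP_A_START, GROUP_A_STOP)) {
            System.out.println(Arrays.toString(qualifier));
        }
    }
}
```

Expressing the group as a half-open range means the same bounds can be reused for the ColumnRangeFilter on the read, so only group A cells are returned in the first place.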

Is there a better way to do this? For instance, a similar way to apply filters to a delete?

---
Huon Wilson
CSIRO | Data61
https://www.data61.csiro.au

Re: Deleting a (contiguous) subset of the columns in a row

Posted by "Wilson, Huon (Data61, Eveleigh)" <Hu...@data61.csiro.au>.

I think separate column families may not be the best fit for us, because we currently read group A and group B columns together in many instances. It would also double the number of column families we have, and we already have several. It's definitely something we can keep in mind, though.

However, we can compute all possible group A columns in advance, so that's a very interesting idea. We might give it a go.

Thanks!

Huon

________________________________
From: Wellington Chevreuil <we...@gmail.com>
Sent: Monday, 11 November 2019 9:45 PM
To: Hbase-User <us...@hbase.apache.org>
Subject: Re: Deleting a (contiguous) subset of the columns in a row

I don't think there is an easier way to do this without redefining your
table layout so that these two groups are split into separate column
families, applying this "classification" logic at insertion time to
determine which column family a given cell should go to.

Another possibility, if you are able to calculate the possible column
qualifier values in advance, is to add every possible qualifier that should
get deleted to the "Delete" operation using the "Delete.addColumns" method:
https://hbase.apache.org/1.2/apidocs/org/apache/hadoop/hbase/client/Delete.html#addColumns(byte[],%20byte[])


Re: Deleting a (contiguous) subset of the columns in a row

Posted by Wellington Chevreuil <we...@gmail.com>.

I don't think there is an easier way to do this without redefining your
table layout so that these two groups are split into separate column
families, applying this "classification" logic at insertion time to
determine which column family a given cell should go to.

Another possibility, if you are able to calculate the possible column
qualifier values in advance, is to add every possible qualifier that should
get deleted to the "Delete" operation using the "Delete.addColumns" method:
https://hbase.apache.org/1.2/apidocs/org/apache/hadoop/hbase/client/Delete.html#addColumns(byte[],%20byte[])
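A minimal sketch of the precomputation, under an assumption the thread does not state: each qualifier is taken to be a one-byte group prefix followed by one of a known, finite set of suffixes. The class name, method name, and suffix values are hypothetical, and the code uses only the JDK. Each generated qualifier would be handed to Delete.addColumns(family, qualifier) on a single Delete for the row, with no prior read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: enumerates every qualifier that
// could exist in a group, given its prefix-byte range and the known suffixes.
class GroupACandidates {
    // Builds prefix + suffix for every prefix in [firstPrefix, lastPrefix]
    // (compared as unsigned bytes) and every suffix in the given list.
    static List<byte[]> allQualifiers(byte firstPrefix, byte lastPrefix, List<byte[]> suffixes) {
        List<byte[]> out = new ArrayList<>();
        for (int p = firstPrefix & 0xFF; p <= (lastPrefix & 0xFF); p++) {
            for (byte[] suffix : suffixes) {
                byte[] qualifier = new byte[1 + suffix.length];
                qualifier[0] = (byte) p;
                System.arraycopy(suffix, 0, qualifier, 1, suffix.length);
                out.add(qualifier);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed suffix set, for illustration only.
        List<byte[]> suffixes = Arrays.asList(new byte[] {0x10}, new byte[] {0x20, 0x21});

        // Group A from the thread: first byte 0x00 through 0x03.
        for (byte[] qualifier : allQualifiers((byte) 0x00, (byte) 0x03, suffixes)) {
            // In a real client: delete.addColumns(family, qualifier);
            System.out.println(Arrays.toString(qualifier));
        }
    }
}
```

This trades the extra read for a larger Delete, so it is likely only attractive when the set of possible group A qualifiers is small.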
