Posted to user@hbase.apache.org by Devaraja Swami <de...@gmail.com> on 2014/12/24 22:34:31 UTC

setFilter for Delete operations?

Are there any plans for including a Filter for Delete?
Currently, the only way seems to be via checkAndDelete in HTable/Table.
This is helpful but does not cover all use cases.

For example, I use column qualifier prefixes as a sort of poor man's 2nd level
of indexing (i.e., three levels of indexing: row key --> column
qualifier prefix --> column qualifier suffix). This works well for Get and
Scan, since I can use a column qualifier prefix filter for the 2nd indexing
level.
However, I am unable to specify that an entire set of column qualifiers
sharing the same prefix should be deleted without first doing a Get to
identify all the full qualifier values with that prefix, and then adding
those qualifiers to the Delete. This extra round trip is obviously highly
inefficient.
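For concreteness, the workaround looks roughly like this (a sketch against the
client API; the table/row/family/prefix names are placeholders, it needs a live
cluster to run, and addColumns is deleteColumns on older clients):

```java
import java.io.IOException;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.ColumnPrefixFilter;

// Delete every column in one row whose qualifier starts with the given prefix.
static void deleteByQualifierPrefix(Table table, byte[] row, byte[] family,
                                    byte[] prefix) throws IOException {
  // Step 1: an extra RPC round trip just to learn the matching qualifiers.
  Get get = new Get(row);
  get.addFamily(family);
  get.setFilter(new ColumnPrefixFilter(prefix));
  Result result = table.get(get);

  // Step 2: enumerate the qualifiers client-side and attach each one to the Delete.
  Delete delete = new Delete(row);
  for (Cell cell : result.rawCells()) {
    delete.addColumns(family, CellUtil.cloneQualifier(cell));
  }
  table.delete(delete);
}
```

That is two RPCs plus a full materialization of the matching cells on the
client, for what is logically a single server-side operation.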

checkAndDelete doesn't help here since it does not support prefix tests.
Moreover, I cannot just add a new column family for every unique column
qualifier prefix I need in my data model. In general, using just one column
family per table seems to be most efficient.
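For reference, checkAndDelete only takes a fully specified qualifier and an
equality guard, so there is nothing to hang a prefix test on; the call being
discussed is roughly the following fragment (all names are placeholders):

```java
// Atomically applies `delete` only if row/family/qualifier currently equals
// `expectedValue`. The guard is exact equality on one cell -- no Filter,
// no prefix test.
Delete delete = new Delete(row);
delete.addColumns(family, qualifier);
boolean applied = table.checkAndDelete(row, family, qualifier, expectedValue, delete);
```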

I can think of other use cases where one would need to delete many
columns that match one of the available HBase filters, but whose exact
column qualifier values are not known to the client at deletion time.

All these use cases could be covered by allowing Delete to support a
setFilter method, exactly as Get and Scan do.
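To make the request concrete, the client code would look like this hypothetical
fragment (Delete.setFilter does not exist; ColumnPrefixFilter is real and
already works on Get and Scan):

```java
// HYPOTHETICAL -- Delete.setFilter() is the feature being requested here;
// it does not compile against any released client.
Delete delete = new Delete(row);
delete.setFilter(new ColumnPrefixFilter(prefix));
table.delete(delete);  // one RPC; the server drops all columns matching the filter
```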

Re: setFilter for Delete operations?

Posted by Devaraja Swami <de...@gmail.com>.
Thanks, Ted. I can work around my problem by changing other aspects of my
application. Worst case, I can use the BulkDeleteEndpoint and batch up my
deletes as you suggested.
It's just that the lack of filter support in Delete often forces me to
adjust my data model and data access approach.
I understand that including Filter in the delete path is non-trivial. I
just hoped to bring this need to the attention of the core committers so
that hopefully it can be implemented sooner rather than later, maybe in
some 1.x release ;-)
Thanks, overall, for the suggestion of BulkDeleteEndpoint. I didn't know
about it before.

On Wed, Dec 24, 2014 at 8:39 PM, Ted Yu <yu...@gmail.com> wrote:

> bq. Using a scan for just one known row
>
> Can you batch some deletions in one invocation of the endpoint ?
>
> Supporting filter in the delete path requires a non-trivial amount of work.
> So for the time being, please use BulkDeleteEndpoint.
>
> Cheers

Re: setFilter for Delete operations?

Posted by Ted Yu <yu...@gmail.com>.
bq. Using a scan for just one known row

Can you batch some deletions in one invocation of the endpoint ?

Supporting filter in the delete path requires a non-trivial amount of work.
So for the time being, please use BulkDeleteEndpoint.

Cheers

On Wed, Dec 24, 2014 at 6:23 PM, Devaraja Swami <de...@gmail.com>
wrote:

> Thanks for your reply, Ted. I looked into the coprocessor example you
> provided. It will definitely address my specific need. However, two aspects
> of this approach seem less than ideal to me:
> 1. Being a coprocessor service, I believe the endpoint needs to be
> pre-installed on the region servers. This is not possible in typical cases
> where the user does not have influence over the HBase installation or
> administrators.
> 2. In my use case, I already know the row key for which I need the
> specified column qualifier prefixes to be deleted. Using a scan for just
> one known row, as in the coprocessor example, appears to be a bit of an
> overkill...
>
> Overall, the coprocessor approach seems somewhat like using a hammer to
> push in a pushpin. Specifying a filter from the client side is much easier
> and more straightforward, IMHO.

Re: setFilter for Delete operations?

Posted by Devaraja Swami <de...@gmail.com>.
Thanks for your reply, Ted. I looked into the coprocessor example you
provided. It will definitely address my specific need. However, two aspects
of this approach seem less than ideal to me:
1. Being a coprocessor service, I believe the endpoint needs to be
pre-installed on the region servers. This is not possible in typical cases
where the user has no influence over the HBase installation or its
administrators.
2. In my use case, I already know the row key whose columns with the
specified qualifier prefixes should be deleted. Using a scan for just
one known row, as in the coprocessor example, seems like a bit of overkill...

Overall, the coprocessor approach seems somewhat like using a hammer to
push in a pushpin. Specifying a filter from the client side is much easier
and more straightforward, IMHO.


On Wed, Dec 24, 2014 at 2:01 PM, Ted Yu <yu...@gmail.com> wrote:

> Have you looked
> at
> hbase-examples/src/main/java/org/apache/hadoop/hbase/coprocessor/example/BulkDeleteEndpoint.java
> to see if it fits your need ?
>
> Cheers

Re: setFilter for Delete operations?

Posted by Ted Yu <yu...@gmail.com>.
Have you looked at
hbase-examples/src/main/java/org/apache/hadoop/hbase/coprocessor/example/BulkDeleteEndpoint.java
to see if it fits your need?

Cheers

On Wed, Dec 24, 2014 at 1:34 PM, Devaraja Swami <de...@gmail.com>
wrote:
