Posted to dev@phoenix.apache.org by "James Taylor (JIRA)" <ji...@apache.org> on 2015/01/24 20:43:34 UTC

[jira] [Commented] (PHOENIX-1590) Add an Asynchronous/Deferred Delete Option

    [ https://issues.apache.org/jira/browse/PHOENIX-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290794#comment-14290794 ] 

James Taylor commented on PHOENIX-1590:
---------------------------------------

Thinking about this a bit more, I think I like your original idea better, [~jfernando_sfdc]: supporting a deferred delete of the data on a DROP VIEW command (if the view is updatable). Supporting it in general on DELETE makes the semantics effectively eventually consistent and is very non-standard. It would still be non-standard on the DROP statement, but that matters less, because DDL commands frequently carry vendor-specific options.

Not sure of the best syntax; maybe one of these?
{code}
DROP VIEW foo DEFERRED DELETE
DROP VIEW foo DELETE ALL
DROP VIEW foo INCLUDE DATA
{code}

Not sure how best to handle corner cases. In particular, what happens if you create another VIEW with the same or an overlapping WHERE clause before the data has actually been deleted? If the VIEW is tenant-specific and, by convention, we name the VIEW the same as the TABLE, then attempts to CREATE a new VIEW would fail until the data is actually deleted. That might be OK for our usage, but there are a lot of ifs there. :-)
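To make the same-name case concrete, here is a rough Java sketch of how CREATE VIEW could be rejected while a view of the same name is still waiting for its data to be deleted. Everything in it is hypothetical: the in-memory map stands in for SYSTEM.CATALOG, the DEFERRED_DELETE status is assumed, and createView, dropViewDeferred and onDataPurged are made-up names, not Phoenix APIs. It only covers the same-name case, not overlapping WHERE clauses.

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// HYPOTHETICAL sketch: an in-memory stand-in for SYSTEM.CATALOG, not a Phoenix API.
public class ViewNameGuardSketch {

    // Assumed status values; DEFERRED_DELETE marks a dropped view whose data is not yet purged.
    enum ViewStatus { ACTIVE, DEFERRED_DELETE }

    // Tenant-specific views are keyed by (tenant id, view name).
    private final Map<List<String>, ViewStatus> catalog = new HashMap<>();

    void createView(String tenantId, String viewName) {
        List<String> k = List.of(tenantId, viewName);
        ViewStatus existing = catalog.get(k);
        if (existing == ViewStatus.DEFERRED_DELETE) {
            // The old view's rows may still be on disk, so a new view could "see" them.
            throw new IllegalStateException(viewName + " is still awaiting its deferred delete");
        }
        if (existing != null) {
            throw new IllegalStateException(viewName + " already exists");
        }
        catalog.put(k, ViewStatus.ACTIVE);
    }

    // DROP VIEW with the proposed option: keep the catalog entry and just flip its status.
    void dropViewDeferred(String tenantId, String viewName) {
        List<String> k = List.of(tenantId, viewName);
        if (catalog.get(k) != ViewStatus.ACTIVE) {
            throw new IllegalStateException("no active view " + viewName);
        }
        catalog.put(k, ViewStatus.DEFERRED_DELETE);
    }

    // Called once compaction is known to have physically removed the view's rows.
    void onDataPurged(String tenantId, String viewName) {
        catalog.remove(List.of(tenantId, viewName), ViewStatus.DEFERRED_DELETE);
    }

    public static void main(String[] args) {
        ViewNameGuardSketch cat = new ViewNameGuardSketch();
        cat.createView("tenant1", "FOO");
        cat.dropViewDeferred("tenant1", "FOO");
        try {
            cat.createView("tenant1", "FOO");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        cat.onDataPurged("tenant1", "FOO");
        cat.createView("tenant1", "FOO");
        System.out.println("re-created after purge");
    }
}
{code}

Keying on (tenant id, view name) means another tenant can still create a view with the same name in the meantime; only the tenant whose data is pending deletion is blocked.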

Also, implementation-wise, I'm not sure how best to track this. One way might be to mark the status of the view and its indexes with a new value of DEFERRED_DELETE. Then, at compaction time, we'd query the SYSTEM.CATALOG table to see whether the table being compacted has views in a DEFERRED_DELETE state, collect up those views' WHERE clauses, and generate a filter that could be used to evaluate whether a row is included in one of them. We'd take advantage of HBASE-12859 to know when we could remove the VIEW from the SYSTEM.CATALOG.
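A minimal sketch of the compaction-time check, under two assumptions: each deferred-delete view's WHERE clause reduces to a constant leading row-key prefix (e.g. the tenant id plus the next PK column, as in the use case from the description), and variable-length row-key columns are separated by a zero byte. ViewInfo, shouldDrop and rowKey are hypothetical names, not Phoenix or HBase APIs. A real version would presumably run inside a coprocessor hook around compaction and evaluate the actual WHERE clauses rather than plain prefixes.

{code:java}
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// HYPOTHETICAL sketch: these classes are illustrations, not Phoenix or HBase APIs.
public class DeferredDeleteFilterSketch {

    // Assumed separator between variable-length row-key columns.
    static final byte SEPARATOR = 0;

    enum ViewStatus { ACTIVE, DEFERRED_DELETE }

    // Stand-in for a view's SYSTEM.CATALOG metadata; assumes its WHERE clause
    // reduces to a constant row-key prefix.
    static final class ViewInfo {
        final String name;
        final ViewStatus status;
        final byte[] rowKeyPrefix;

        ViewInfo(String name, ViewStatus status, byte[] rowKeyPrefix) {
            this.name = name;
            this.status = status;
            this.rowKeyPrefix = rowKeyPrefix;
        }
    }

    private final List<byte[]> deletedPrefixes = new ArrayList<>();

    // "Collect up the view WHERE clauses": keep only views marked DEFERRED_DELETE.
    DeferredDeleteFilterSketch(List<ViewInfo> viewsOnTable) {
        for (ViewInfo v : viewsOnTable) {
            if (v.status == ViewStatus.DEFERRED_DELETE) {
                deletedPrefixes.add(v.rowKeyPrefix);
            }
        }
    }

    // Evaluated per row during compaction: true means the row is not rewritten.
    boolean shouldDrop(byte[] rowKey) {
        for (byte[] p : deletedPrefixes) {
            if (rowKey.length >= p.length
                    && Arrays.equals(rowKey, 0, p.length, p, 0, p.length)) {
                return true;
            }
        }
        return false;
    }

    // Joins components with SEPARATOR. A trailing separator on a prefix keeps "a"
    // from matching "ab" (assumes more PK columns follow the prefix columns).
    static byte[] rowKey(boolean trailingSeparator, String... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                out.write(SEPARATOR);
            }
            byte[] b = parts[i].getBytes(StandardCharsets.UTF_8);
            out.write(b, 0, b.length);
        }
        if (trailingSeparator) {
            out.write(SEPARATOR);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        List<ViewInfo> views = List.of(
                new ViewInfo("FOO", ViewStatus.DEFERRED_DELETE, rowKey(true, "tenant1", "a")),
                new ViewInfo("BAR", ViewStatus.ACTIVE, rowKey(true, "tenant2", "a")));
        DeferredDeleteFilterSketch filter = new DeferredDeleteFilterSketch(views);
        System.out.println(filter.shouldDrop(rowKey(false, "tenant1", "a", "row1")));  // true
        System.out.println(filter.shouldDrop(rowKey(false, "tenant1", "ab", "row1"))); // false
        System.out.println(filter.shouldDrop(rowKey(false, "tenant2", "a", "row1")));  // false
    }
}
{code}

Restricting the check to prefix comparisons keeps the per-row cost low, which presumably matters because it would run for every row rewritten during compaction. Arbitrary WHERE clauses would need a general expression evaluator instead.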

Thoughts, [~lhofhansl], [~jfernando_sfdc]? 

> Add an Asynchronous/Deferred Delete Option
> ------------------------------------------
>
>                 Key: PHOENIX-1590
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1590
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Jan Fernando
>
> For use cases where we need to delete very large amounts of data from Phoenix tables, running a synchronous delete can be problematic. To guarantee that the delete completes, handle failure scenarios, and ensure it doesn't put too much load on the HBase cluster and crowd out other running queries, we need to build tooling around these longer-running delete operations: chunking them up, retrying in the event of failures, and throttling delete load if the Region Servers get hot.
> It would be really great if Phoenix offered a way to invoke a resilient delete that was processed asynchronously and placed minimal load on the cluster.
> One implementation idea that has been mentioned is to introduce a DEFERRED keyword to the DELETE operation, with such a delete removing the data at compaction time.
> For our use cases, ideally, we would like to set delete filters that are based on the first 2 elements of the row key (a multi-tenant id and the next item).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)