
Posted to issues@phoenix.apache.org by "Lars Hofhansl (JIRA)" <ji...@apache.org> on 2018/10/24 17:52:00 UTC

[jira] [Commented] (PHOENIX-4344) MapReduce Delete Support

    [ https://issues.apache.org/jira/browse/PHOENIX-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662617#comment-16662617 ] 

Lars Hofhansl commented on PHOENIX-4344:
----------------------------------------

We just had a discussion about this. Could we do the following?
 # Create the input splits as we do now. No change there.
 # In the map function, upon the _first row_, issue the equivalent of DELETE FROM <table> WHERE <pk> >= split_start AND <pk> < split_end AND <whatever select predicate was specified>.
 # Finish the map task after the first row.
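
As a rough sketch of step 2, the per-split statement could be assembled along these lines. The class and method names are hypothetical and not part of Phoenix. The sketch assumes a single-column primary key and that the split's start (inclusive) and end (exclusive) keys are bound to the two '?' placeholders later; it does not handle the first and last splits, whose boundaries are open-ended.

```java
// Hypothetical helper, not an existing Phoenix class: builds the range DELETE
// that a map task would issue once, upon seeing the first row of its split.
public final class RangeDeleteSketch {

    private RangeDeleteSketch() {
    }

    // Returns a DELETE restricted to [split_start, split_end) on the primary key.
    // The two '?' placeholders stand for the split boundaries; the optional
    // select predicate is appended in parentheses so its own AND/OR operators
    // cannot change the meaning of the range restriction.
    public static String buildRangeDelete(String table, String pkColumn, String selectPredicate) {
        StringBuilder sql = new StringBuilder()
                .append("DELETE FROM ").append(table)
                .append(" WHERE ").append(pkColumn).append(" >= ?")
                .append(" AND ").append(pkColumn).append(" < ?");
        if (selectPredicate != null && !selectPredicate.trim().isEmpty()) {
            sql.append(" AND (").append(selectPredicate.trim()).append(")");
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // Table, column, and predicate are made-up example values.
        System.out.println(buildRangeDelete("EVENTS", "EVENT_ID", "STATUS = 'EXPIRED'"));
    }
}
```

With the example values in main, this yields DELETE FROM EVENTS WHERE EVENT_ID >= ? AND EVENT_ID < ? AND (STATUS = 'EXPIRED'). A real mapper would also need a flag so the statement is issued only for the first row it sees.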

Phoenix can then push the DELETE down into the region, which should be an order of magnitude or two faster than issuing point deletes.

A nice side effect is that if there's no data in a region, the map function is never called for it, so we won't issue any work at all.

I think that's what James was saying in the first comment.

[~gjacoby], [~jisaac]

> MapReduce Delete Support
> ------------------------
>
>                 Key: PHOENIX-4344
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4344
>             Project: Phoenix
>          Issue Type: New Feature
>    Affects Versions: 4.12.0
>            Reporter: Geoffrey Jacoby
>            Assignee: Geoffrey Jacoby
>            Priority: Major
>
> Phoenix already has the ability to use MapReduce for asynchronous handling of long-running SELECTs. It would be really useful to have this capability for long-running DELETEs, particularly of tables with indexes where using HBase's own MapReduce integration would be prohibitively complicated. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)